Posted

Dear users,

 

as you may have noticed - we are up and running again!

The site is fully functional & we are very sorry for the downtime over the weekend!

Background

On Saturday evening we had an issue in the backend of the web components, which left Eurobricks not browsable. As I was in a bit of a rush that evening, I decided to "give the whole server a restart" as the simple fix. Little did I know that this decision would lead to the server not restarting cleanly but shutting down, with me having no access to start it again.

Conclusion

To conclude, Murphy taught us a lesson in "what can go wrong, will go wrong" - this time it was security, in the form of multi-factor authentication. We are working on closing those gaps so that we can react more swiftly.

 

Best regards

Posted

Some further details (for technically interested people):
 

  1. We found an issue with the overall server setup: the maximum memory allocations of the various background services (database, search engine, etc.) left the server at its upper memory limit. Basically, too much memory was given to the background processes, and the PHP workers (which generate the pages delivered to users' browsers) could not get the memory they needed. This led to a so-called "segmentation fault", and as a consequence the site became unavailable on Saturday.
    We addressed this by setting less aggressive memory allocations to leave more RAM free. This should fix the source of the issue that I have been seeing over the last month already.
  2. On Saturday evening I wanted to address the issue with a server reboot, as a quick fix to get all memory allocations back into a clean state. Unfortunately the server did not come back online after my restart attempt. This morning we found out that the server was in a shut-down state. From my sysadmin experience this is rather odd, as I am 100% sure I gave a restart command, not a shutdown. I am not sure whether the memory situation described above also had some weird effect on the operating system or on other tasks in the restart process. I am somewhat relieved that it was not an issue with our server and operating system setup, as today we could revive the server with a restart from the hoster's web interface. I will keep this in mind for the future, though, as it may be an early indicator of hardware degradation (motherboard or similar). If this happens again, we will contact the hoster to help us troubleshoot the matter.
  3. Jim and I found that I did not have "the right access" to the hoster's web interface to fix the situation myself - multi-factor authentication, so security for the win, I reckon... We fixed that as well, so I can now act on my own while Jim can enjoy his vacation trips without needing a mobile connection. Lesson learned here :pir-huzzah1:
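
For those who like to see the arithmetic behind point 1, here is a minimal sketch of that kind of memory budget check. All numbers, service names, and the `MemoryBudget` helper are made up for illustration; they are not the real Eurobricks settings, and a real server needs additional room for the operating system and caches.

```python
# Hypothetical memory budget check. Every figure below is an assumption
# chosen for illustration, not an actual server setting.
from dataclasses import dataclass


@dataclass
class MemoryBudget:
    total_ram_mb: int          # physical RAM of the server
    service_limits_mb: dict    # configured maximum per background service
    php_workers: int           # number of PHP worker processes
    php_worker_limit_mb: int   # maximum memory per PHP worker

    def committed_mb(self) -> int:
        """Worst case: every service and every worker uses its full limit."""
        services = sum(self.service_limits_mb.values())
        workers = self.php_workers * self.php_worker_limit_mb
        return services + workers

    def headroom_mb(self) -> int:
        """RAM left over in the worst case; negative means overcommitted."""
        return self.total_ram_mb - self.committed_mb()


# "Aggressive" limits: background services alone claim most of the RAM.
before = MemoryBudget(
    total_ram_mb=32768,
    service_limits_mb={"database": 20480, "search_engine": 8192},
    php_workers=40,
    php_worker_limit_mb=256,
)

# Less aggressive limits: same workers, smaller service allocations.
after = MemoryBudget(
    total_ram_mb=32768,
    service_limits_mb={"database": 12288, "search_engine": 4096},
    php_workers=40,
    php_worker_limit_mb=256,
)

for label, budget in (("before", before), ("after", after)):
    print(f"{label}: committed {budget.committed_mb()} MB, "
          f"headroom {budget.headroom_mb()} MB")
```

With these made-up figures the "before" setup promises more memory than the machine has, so the PHP workers lose out once the background services fill their limits, while the "after" setup leaves spare RAM even in the worst case.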

I hope this gives some background on why it took us so long to get the site running again.
