Downtime due to DigitalOcean server maintenance + related complications

Major incident · Status Pages, Admin Site
2019-07-03 16:00 CET · 1 hour, 15 minutes

Updates

Retroactive

At around 12:00 UTC, DigitalOcean performed unannounced maintenance on our server because of issues with its parent host. After at least half an hour, they brought the server back up.

Right after our droplet booted up, we noticed further problems. Fearing a corrupted disk, we ran checks that took at least 20 minutes.

After verifying that the disk was OK, we continued debugging until we found the culprit: the third-party metrics pulling service. We created a hotfix release with this service disabled, and once it was deployed our systems were fully operational again.

We have drawn several lessons from this incident that we will use to improve our systems and prevent similar outages in the future. We already have a fix for the third-party metrics pulling service, which we will deploy soon.

Apologies for the inconvenience.

November 4, 2021 · 16:24 CET