What happened?
At 16:36 on 21/03/24, an abnormally large load caused our download service to go down. Because a number of other key services rely on this service, its failure caused a cascade of errors that eventually overloaded our database, taking down all services at around 16:56.
Upon investigating the issue, we immediately triggered a failover of the database, which restored all services at 17:16.
What are we doing about this?
We are currently migrating our download service to a new service that is more scalable, fault-tolerant and optimised. Under large download loads, this new service will be able to scale up to meet demand, and if it does fail, the failure won’t have the same knock-on effect on other services. This change is scheduled to take place within the next month.