Issue Summary: On Feb 11, 2025, our Communications Mining platform returned user-facing errors because the primary database in our US cluster became overloaded. The database could not handle the volume of incoming requests, which led to service disruptions.
Resolution: We immediately scaled our database nodes vertically, increasing their CPU and memory capacity. We also identified and removed several sources of load to stabilise the system while our investigation continues. On Feb 13, while we were restoring these workloads, there was a brief service disruption; affected requests should have succeeded on retry.
Next Steps: We recognise the ongoing challenges with our primary database and are actively working on migrating to a more robust solution. We apologise for any inconvenience caused and appreciate your understanding as we work towards a more reliable service.