On February 19, from 12:20 pm UTC to 2:20 pm UTC, some customers in the U.S. region experienced delays in real-time data updates on Orchestrator pages such as Jobs and Queues.
Our investigation identified significant lag in data synchronization between the primary and read-only replica databases in the U.S. region. The lag occurred while a database maintenance runbook was running. These runbooks are scheduled to run daily at the time of lowest regional traffic. However, a bug had previously prevented their execution, resulting in a backlog of data awaiting cleanup.
The synchronization delays were not initially detected by our alerting system and were brought to our attention by customer reports.
We resolved the problem by stopping the database maintenance runbook, which cleared the synchronization delay immediately.
To prevent future occurrences, we have implemented measures to monitor and detect data synchronization issues proactively. Additionally, we are developing improvements to minimize the performance impact of these maintenance runbooks.