Between 2025-07-10 07:07 AM UTC and 2025-07-16 08:43 AM UTC, some API calls in the EU and US regions failed with a 503 error. Customers may have experienced this issue intermittently during that time, but refreshing the page or trying again resolved it.
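Because retrying resolved the failures, API clients can absorb short-lived 503 responses with a bounded retry and backoff. The following is a minimal sketch, not part of our SDK: `call_with_retry`, its parameters, and the `send` callable are hypothetical names used only for illustration.

```python
import time
from typing import Callable, Tuple


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-503 status or attempts run out.

    send is a hypothetical callable returning (status_code, body).
    The delay doubles after each failed attempt (0.5s, 1.0s, ...).
    """
    status, body = send()
    for attempt in range(1, attempts):
        if status != 503:
            break
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = send()
    return status, body
```

Retries like this are only safe for idempotent requests; a non-idempotent call should be retried only if the API provides a way to deduplicate it.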
The incident was caused by a faulty code change in a recent deployment. The issue originated during the initialization of background jobs. Intermittently, a database call was made after the required service context had already been disposed, resulting in a "connection is closed" error. The exception was not handled and propagated, causing the pods to crash and restart. Requests routed to these restarting pods received a 503 status.
Once the issue was identified, it was isolated and fixed, and a hotfix was deployed across all regions.
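The failure mode described above can be illustrated with a simplified sketch. This is not our production code: `ServiceContext`, `service_context`, `init_job_unsafe`, `init_job_safe`, and `run_job_init` are hypothetical names, and the "database" is a stand-in that only tracks whether the context has been disposed.

```python
from contextlib import contextmanager


class ConnectionClosedError(RuntimeError):
    """Raised when a query is issued on a disposed context."""


class ServiceContext:
    """Stand-in for a service context that owns a database connection."""

    def __init__(self) -> None:
        self.closed = False

    def query(self, sql: str) -> str:
        if self.closed:
            raise ConnectionClosedError("connection is closed")
        return f"ok: {sql}"

    def dispose(self) -> None:
        self.closed = True


@contextmanager
def service_context():
    ctx = ServiceContext()
    try:
        yield ctx
    finally:
        ctx.dispose()  # the context is disposed when the block exits


def init_job_unsafe() -> str:
    # The database call is deferred and runs after the context is disposed.
    with service_context() as ctx:
        pending = lambda: ctx.query("SELECT 1")
    return pending()  # raises ConnectionClosedError


def init_job_safe() -> str:
    # The database call completes while the context is still alive.
    with service_context() as ctx:
        return ctx.query("SELECT 1")


def run_job_init(init) -> str:
    # Handle the failure so it cannot crash the whole process.
    try:
        return init()
    except ConnectionClosedError as exc:
        return f"job init failed: {exc}"
```

The sketch shows two independent safeguards: keeping the database call inside the lifetime of the context, and handling the exception at the job boundary so that a single failed initialization does not take down the pod.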
To prevent similar incidents in the future, and to detect such issues earlier, we are implementing the following measures:
- Enhancing alerting mechanisms, including alerts for pod health and restarts across all environments
- Introducing a more rigorous code review process to catch such issues early in the development cycle
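As an illustration of the restart alerting mentioned above, the check below compares two snapshots of per-pod restart counts and flags pods that restarted too often in between. It is a sketch under stated assumptions rather than our actual alerting rule: `pods_to_alert`, the snapshot dictionaries, and the default threshold of 3 are all hypothetical.

```python
from typing import Dict, List


def pods_to_alert(
    previous: Dict[str, int],
    current: Dict[str, int],
    threshold: int = 3,
) -> List[str]:
    """Return pods whose restart count grew by at least `threshold`.

    previous and current map pod name to cumulative restart count.
    Pods missing from the earlier snapshot are treated as having 0 restarts.
    """
    alerts = []
    for pod, count in current.items():
        delta = count - previous.get(pod, 0)
        if delta >= threshold:
            alerts.append(pod)
    return sorted(alerts)
```

Alerting on the increase between snapshots, instead of the cumulative count, keeps long-running pods with a few historical restarts from triggering false alarms.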
Posted Jul 30, 2025 - 07:52 UTC
Investigating