Integration Service Outage in the EU Region

Incident Report for UiPath

Postmortem

Customer impact

Between April 3, 2025 at 13:32 UTC and April 4, 2025 at 01:52 UTC, a subset of customers in the EU region experienced degraded service when accessing certain Integration Service metadata APIs, specifically Connections, Connectors, and Triggers. This caused intermittent instability in the UI for affected users. Customers in all other regions, and other services that depend on Integration Service, were not affected.

Root cause

The incident was triggered by a single database query from one tenant that became highly resource-intensive under increased traffic. Executions of this query held a significant number of database connections, saturating the connection pool. As a result, requests from other customers could not obtain a database connection, leading to intermittent timeouts and degraded performance when accessing the metadata APIs.
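The saturation mechanism described above can be illustrated with a small Python simulation. This is a generic sketch, not UiPath's implementation: the report does not publish the query, the pool size, or any timeout settings, so `ConnectionPool`, `other_tenant_gets_connection`, the pool size, and the timeout values are all hypothetical. A fixed-size pool is modelled as a semaphore, one tenant's long-running queries hold connections, and a short request from another tenant waits for a free connection with a timeout.

```python
import threading


class ConnectionPool:
    """Hypothetical fixed-size pool; each semaphore slot models one DB connection."""

    def __init__(self, size: int) -> None:
        self._slots = threading.BoundedSemaphore(size)

    def acquire(self, timeout_s: float) -> bool:
        # Block for at most timeout_s waiting for a free connection.
        return self._slots.acquire(timeout=timeout_s)

    def release(self) -> None:
        self._slots.release()


def other_tenant_gets_connection(pool_size: int, long_running_queries: int) -> bool:
    """Return True if a short request from another tenant obtains a connection
    while one tenant's long-running queries are in flight."""
    pool = ConnectionPool(pool_size)
    finished = threading.Event()      # set when the heavy queries "complete"
    holding = threading.Semaphore(0)  # counts heavy queries that hold a connection

    def heavy_query() -> None:
        if pool.acquire(timeout_s=5.0):
            holding.release()         # report that a connection is now held
            finished.wait()           # keep the connection for the whole query
            pool.release()

    workers = [threading.Thread(target=heavy_query) for _ in range(long_running_queries)]
    for w in workers:
        w.start()
    # Wait until the heavy queries actually hold their connections.
    for _ in range(min(long_running_queries, pool_size)):
        holding.acquire()

    # The other tenant's request, with a short acquisition timeout.
    got_connection = pool.acquire(timeout_s=0.2)
    if got_connection:
        pool.release()

    finished.set()                    # let the heavy queries finish and release
    for w in workers:
        w.join()
    return got_connection
```

With `other_tenant_gets_connection(4, 4)` the heavy tenant holds every connection, so the other request times out and the call returns `False`; with `other_tenant_gets_connection(4, 2)` two connections remain free and it returns `True`.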

Detection

The issue was not detected by our service-specific alerting systems. It was first identified by our Site Reliability team, which monitors Platform Gateway status globally.

Response

Once the cause was identified, we promptly disabled the offending process and stabilized the system by releasing the saturated database connections. This restored the availability and performance of the affected services.

Follow-up

To prevent similar incidents in the future, we are taking the following steps:

Optimizing the specific query that contributed to the database connection pool saturation, ensuring it performs efficiently under high traffic conditions.
Tuning database connection pool configurations to improve resilience and reduce the likelihood of contention during traffic spikes.
Enhancing monitoring and alerting mechanisms to proactively detect unusual query patterns and potential resource exhaustion scenarios before they impact customers.
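The report does not say which pool settings or alerts will change. One common pattern for the tuning and monitoring steps above, shown here as a hypothetical Python sketch, combines a per-tenant cap on concurrent connections (so a single tenant cannot drain the shared pool) with a utilization threshold that records an alert before the pool is exhausted. `TenantAwarePool`, `per_tenant_limit`, and `alert_ratio` are illustrative names, not UiPath configuration.

```python
import threading
from collections import defaultdict


class TenantAwarePool:
    """Hypothetical pool wrapper: caps connections per tenant and flags high utilization."""

    def __init__(self, pool_size: int, per_tenant_limit: int, alert_ratio: float = 0.8) -> None:
        self._slots = threading.BoundedSemaphore(pool_size)
        self._pool_size = pool_size
        self._per_tenant_limit = per_tenant_limit
        self._alert_ratio = alert_ratio
        self._lock = threading.Lock()
        self._per_tenant = defaultdict(int)  # connections held or being requested, per tenant
        self._held = 0                       # connections currently checked out
        self.alerts: list[str] = []

    def acquire(self, tenant: str, timeout_s: float) -> bool:
        with self._lock:
            if self._per_tenant[tenant] >= self._per_tenant_limit:
                return False                 # fail fast: this tenant is at its cap
            self._per_tenant[tenant] += 1    # reserve a slot in the tenant's quota
        if not self._slots.acquire(timeout=timeout_s):
            with self._lock:
                self._per_tenant[tenant] -= 1  # return the quota on timeout
            return False
        with self._lock:
            self._held += 1
            # Record an alert whenever a checkout leaves utilization at or above the threshold.
            if self._held >= self._alert_ratio * self._pool_size:
                self.alerts.append(f"pool utilization {self._held}/{self._pool_size}")
        return True

    def release(self, tenant: str) -> None:
        with self._lock:
            self._per_tenant[tenant] -= 1
            self._held -= 1
        self._slots.release()
```

For example, with `TenantAwarePool(pool_size=10, per_tenant_limit=4)`, six requests from `"tenant-a"` return `[True, True, True, True, False, False]`, and a request from `"tenant-b"` still succeeds because the shared pool has capacity left. Rejecting over-limit requests immediately, rather than letting them queue, keeps one tenant's traffic from turning into timeouts for everyone else.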

Posted Apr 10, 2025 - 16:52 UTC

Resolved

This incident has been resolved.

Posted Apr 04, 2025 - 01:52 UTC

Update

Most functionality has been restored, but performance remains degraded. We are working to determine the cause.

Posted Apr 03, 2025 - 23:29 UTC

Update

We are continuing to monitor for any further issues.

Posted Apr 03, 2025 - 22:27 UTC

Update

We are continuing to monitor the situation as functionality returns to normal.

Posted Apr 03, 2025 - 21:25 UTC

Monitoring

A fix is being rolled out, and functionality should gradually return to normal. We are monitoring the situation as the fix takes effect.

Posted Apr 03, 2025 - 20:23 UTC

Update

We discovered additional complications while deploying the fix and are addressing them now.

Posted Apr 03, 2025 - 18:58 UTC

Update

We are continuing to roll out a fix for the issue.

Posted Apr 03, 2025 - 17:43 UTC

Update

Our engineering team has identified a fix and is currently implementing it to restore normal service.

Posted Apr 03, 2025 - 16:04 UTC

Identified

A small subset of Integration Service customers in the EU region is affected by this issue. Our team has identified the cause and is actively working on a resolution to restore normal service as soon as possible.

Posted Apr 03, 2025 - 14:28 UTC

Investigating

We are currently investigating this issue.

Posted Apr 03, 2025 - 13:32 UTC

This incident affected: Integration Service.