Integration Service partial outage for Event Triggers in Europe

Incident Report for UiPath

Postmortem

Customer Impact

Between 2025-06-19 05:00 UTC and 2025-06-19 20:30 UTC, customers in the EU region experienced degraded triggers in Integration Service: triggers either did not fire or fired with delays of several hours.

 

Root Cause

The incident was caused by a sudden spike in the size of an Azure queue used by Integration Service to process trigger events. This spike originated from a single ServiceNow connection that was generating a high volume of events, which were being added to the queue every minute.

The final step in trigger execution, sending a notification to fire the trigger, was failing due to a timeout. Each failed attempt caused the same event to be re-queued, so the affected events were retried repeatedly while new events kept arriving. Over time, this led to significant queue congestion, which in turn delayed or degraded the processing of other trigger events.
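
The feedback loop described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the Integration Service implementation: the function name, the arrival rate, and the per-minute processing capacity are all hypothetical values chosen for illustration. It shows that when the final notification step fails and events are put back on the queue, depth grows by the full arrival rate every minute, even though the workers stay busy.

```python
from collections import deque


def simulate_queue_depth(minutes, arrivals_per_minute, capacity_per_minute, notify_succeeds):
    """Return the queue depth at the end of each simulated minute.

    Hypothetical model: events whose final notification step fails are
    appended back to the queue, mirroring re-queue-on-timeout behaviour.
    """
    queue = deque()
    depths = []
    next_id = 0
    for _ in range(minutes):
        # New events from the connection arrive every minute.
        for _ in range(arrivals_per_minute):
            queue.append(next_id)
            next_id += 1
        # Workers process up to capacity_per_minute events.
        for _ in range(min(capacity_per_minute, len(queue))):
            event = queue.popleft()
            if not notify_succeeds:
                # Timeout on the last step: the same event goes back on the queue.
                queue.append(event)
        depths.append(len(queue))
    return depths


healthy = simulate_queue_depth(5, 100, 150, notify_succeeds=True)
failing = simulate_queue_depth(5, 100, 150, notify_succeeds=False)
print(healthy)  # [0, 0, 0, 0, 0]
print(failing)  # [100, 200, 300, 400, 500]
```

In the failing case no event ever leaves the queue, so any other trigger events sharing that queue wait behind an ever-growing backlog.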

Detection

The issue was not detected by our service-specific alerting systems. It was first brought to our attention through our help channel, where internal users and customers reported that triggers were not working for them.

 

Response

Once the cause was identified, the offending connection and its corresponding trigger were disabled. The system was then stabilized by clearing the queued events for that connection and increasing the event processing timeout, which ensured that messages were processed rather than re-queued. These actions restored the availability and performance of the triggers.

 

Follow-Up Actions

To prevent similar incidents in the future, we are taking the following steps:

  • Refining the retry mechanism for events. We need to configure the retry timeout and policy so that event processing delays do not cause the queues to pile up.
  • Introducing logical partitioning of events and deferred queuing mechanisms so that a high volume of events from a single connection or tenant does not impact other triggers.
  • Enhancing monitoring and alerting mechanisms to proactively detect unusual or degraded trigger performance.
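
The first two follow-up items can be sketched together as below. This is a minimal illustration under stated assumptions, not the planned design: `PartitionedQueue`, `Event`, `MAX_ATTEMPTS`, and the dead-letter list are hypothetical names, and retry backoff is omitted for brevity. The sketch keeps one FIFO per connection, drains the partitions round-robin, and caps the number of attempts so that a failing event is set aside instead of being re-queued indefinitely.

```python
from collections import OrderedDict, deque
from dataclasses import dataclass

MAX_ATTEMPTS = 3  # hypothetical retry cap, chosen for illustration


@dataclass
class Event:
    connection_id: str
    payload: str
    attempts: int = 0


class PartitionedQueue:
    """One FIFO per connection, drained round-robin across connections."""

    def __init__(self):
        self._partitions = OrderedDict()
        self.dead_letter = []  # events that exhausted their attempts

    def put(self, event):
        self._partitions.setdefault(event.connection_id, deque()).append(event)

    def get(self):
        """Pop the next event, rotating across connections; None if empty."""
        if not self._partitions:
            return None
        connection_id, events = next(iter(self._partitions.items()))
        event = events.popleft()
        # Move this connection to the back so other connections get a turn.
        del self._partitions[connection_id]
        if events:
            self._partitions[connection_id] = events
        return event

    def process_one(self, handler):
        """Process one event; return False when the queue is empty."""
        event = self.get()
        if event is None:
            return False
        try:
            handler(event)
        except TimeoutError:
            event.attempts += 1
            if event.attempts >= MAX_ATTEMPTS:
                self.dead_letter.append(event)  # stop retrying
            else:
                self.put(event)  # bounded retry
        return True


# Usage: a noisy connection whose notifications time out, plus a quiet one.
q = PartitionedQueue()
for i in range(5):
    q.put(Event("noisy", f"n{i}"))
q.put(Event("quiet", "q0"))

delivered = []


def handler(event):
    if event.connection_id == "noisy":
        raise TimeoutError("notification timed out")
    delivered.append(event.payload)


q.process_one(handler)  # first noisy event fails and is re-queued
q.process_one(handler)  # the quiet connection is served next
delivered_after_two_steps = list(delivered)

while q.process_one(handler):
    pass
```

With a single shared FIFO, the quiet connection's event would wait behind every noisy event and its retries. Here it is delivered on the second step, and the queue eventually drains because each failing event is retried at most `MAX_ATTEMPTS` times.
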
Posted Jun 23, 2025 - 05:58 UTC

Resolved

The issue affecting trigger execution in the Integration Service has been successfully resolved. The fix has been fully implemented and verified, and all systems are now operating as expected. We will continue to monitor closely to ensure continued stability. Thank you for your patience throughout the incident.
Posted Jun 19, 2025 - 20:35 UTC

Update

The implementation is still in progress, and we are seeing signs of improvement. Another update will be shared shortly.
Posted Jun 19, 2025 - 19:54 UTC

Update

The fix for the issue affecting trigger execution in the Integration Service is still in progress. Our team is actively monitoring the implementation. We'll provide another update as soon as there are further developments. Thank you for your continued patience and understanding.
Posted Jun 19, 2025 - 19:04 UTC

Update

We are still actively working on the fix. Our team is making steady progress and remains focused on resolving the issue as quickly as possible. We'll provide another update soon.
Posted Jun 19, 2025 - 18:09 UTC

Update

Our team is continuing to work on the fix for the issue. While the root cause has been identified, resolution efforts are still in progress. We'll share another update as soon as more information is available. Thank you for your continued patience.
Posted Jun 19, 2025 - 17:15 UTC

Identified

Our team has identified the root cause of the issue and is actively working on implementing a fix. We are closely monitoring the situation and will provide updates as progress is made. We appreciate your patience and understanding.
Posted Jun 19, 2025 - 16:18 UTC

Investigating

We are currently experiencing a partial outage affecting trigger execution in the Integration Service for users in the Europe region. Our team is investigating and working on a resolution. Thank you for your patience.
Posted Jun 19, 2025 - 15:23 UTC
This incident affected: Integration Service.