Orchestrator services for the Delayed Enterprise Ring (GxP) are facing a major outage, and dependent services are experiencing degraded functionality as a result

Incident Report for UiPath

Postmortem

Customer impact

On March 28, 2025, between 10:55 and 12:18 UTC, the Orchestrator service was down in the delayed Europe and U.S. regions. No traffic was routed to Orchestrator during this timeframe. All other services that rely on Orchestrator were affected.

Root cause

Nginx is an open-source web server and reverse proxy. UiPath uses it to route traffic. We were preparing to update our version of Nginx to fix a critical issue.

UiPath’s Nginx configuration is heavily customized. Unfortunately, these custom settings were missing from the update due to a bug in our deployment pipeline. As a result, the deployment completed successfully but left Nginx in an unhealthy state, and it could not route traffic to Orchestrator.
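A pre-deployment check for this kind of failure could look roughly like the sketch below. It is an illustration only, not UiPath's pipeline: it assumes the rendered configuration is available as text with one directive per line, and the function name and the directive names in the usage note are hypothetical.

```python
def find_missing_settings(rendered_config: str, required_settings: list[str]) -> list[str]:
    """Return the required settings that are absent from the rendered config.

    Assumption: one directive per line, with the directive name as the
    first whitespace-separated token and '#' starting a comment.
    """
    present = set()
    for line in rendered_config.splitlines():
        # Drop comments so a commented-out directive does not count as present.
        active = line.split("#", 1)[0].strip()
        if active:
            present.add(active.split()[0])
    return [name for name in required_settings if name not in present]
```

A pipeline step could call this with the list of settings the customization is expected to produce and stop the deployment when the returned list is not empty. For example, `find_missing_settings(rendered, ["proxy_read_timeout", "client_max_body_size"])` would use two directive names chosen purely for illustration.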

Detection

Our alerts immediately detected that traffic was not routing correctly. An engineer was paged and began investigating the outage.

The logs showed that Nginx was running, but it was unhealthy. This quickly pointed the engineer to the misconfiguration.

Response

We first fixed the settings by hand and restarted the deployment. The update worked, but traffic was still not reaching our service.

After additional investigation, we determined that we also needed to manually delete and recreate the misconfigured objects. Once this was done, traffic began routing correctly. Customers were able to use Orchestrator again.

Follow-up

  • We are still working on the full internal postmortem. Once it is completed, we will identify additional action items and add them here.
  • We will update the Nginx deployment to make it easier to review the changes. This should prevent custom configuration from going missing.
  • We will review how Nginx deployments are tested in non-production environments. This should ensure that these sorts of mistakes are caught before they are visible to customers.
  • When deploying Nginx into a region, we will first send a small percentage of traffic to the new version. This should allow us to quickly detect any issues that do make it to production.
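The last item describes a canary-style rollout. A minimal sketch of the idea follows, assuming each request carries some stable identifier. The function names, the "canary" and "stable" labels, and the error-rate threshold are all hypothetical and are not taken from UiPath's deployment tooling.

```python
import hashlib


def choose_backend(request_id: str, canary_percent: float) -> str:
    """Deterministically send about canary_percent of requests to the new version."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the request to one of 10,000 buckets (0..9999) for 0.01% granularity.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "canary" if bucket < canary_percent * 100 else "stable"


def should_roll_back(canary_errors: int, canary_requests: int, max_error_rate: float) -> bool:
    """Flag a rollback when the canary's error rate exceeds the allowed maximum."""
    if canary_requests == 0:
        return False  # no canary traffic yet, so nothing to judge
    return canary_errors / canary_requests > max_error_rate
```

Hashing the identifier rather than picking at random keeps a given request on the same version for the whole rollout, which makes failures easier to trace back to the new deployment.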
Posted Mar 30, 2025 - 01:57 UTC

Resolved

This incident has been resolved. Thank you for your patience.
Posted Mar 28, 2025 - 12:38 UTC

Monitoring

A fix has been implemented and we are actively monitoring the situation.
Posted Mar 28, 2025 - 12:26 UTC

Investigating

We are actively investigating the issue to restore services as quickly as possible.
Posted Mar 28, 2025 - 11:38 UTC
This incident affected: Orchestrator.