Licensing Issue Causing Outage
Incident Report for UiPath
Postmortem

Background context

UiPath Automation Cloud stores the source of truth for all licenses on a centralized server. As our pricing strategy changes, new SKUs are created and added to licenses.

Some of these operations use a maintenance service to modify licenses, for example to update the license structure to match new SKUs.

After changes are applied, their effects are propagated to the commercial cloud scale units as entitlements granted or revoked.
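To make the SKU-to-entitlement relationship concrete, here is a minimal sketch of how a license change could be turned into the granted and revoked entitlements that get propagated. The SKU names, the catalog, and the function names below are hypothetical and only illustrate the idea; they are not UiPath's actual data model.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical catalog: the entitlements each SKU confers on a license.
SKU_ENTITLEMENTS: Dict[str, FrozenSet[str]] = {
    "SKU_PLATFORM": frozenset({"Orchestrator"}),
    "SKU_DU": frozenset({"DocumentUnderstanding", "AICenter"}),
    "SKU_PM": frozenset({"ProcessMining"}),
    "SKU_TM": frozenset({"TestManager"}),
}


def entitlements_for(skus: Iterable[str], catalog: Dict[str, FrozenSet[str]]) -> Set[str]:
    """Union of the entitlements conferred by every SKU on a license."""
    result: Set[str] = set()
    for sku in skus:
        result |= catalog.get(sku, frozenset())
    return result


def entitlement_changes(
    old_skus: Iterable[str],
    new_skus: Iterable[str],
    catalog: Dict[str, FrozenSet[str]],
) -> Tuple[Set[str], Set[str]]:
    """Return (granted, revoked) entitlements caused by changing a license's SKUs."""
    old = entitlements_for(old_skus, catalog)
    new = entitlements_for(new_skus, catalog)
    return new - old, old - new


# A maintenance update meant to add a SKU that also drops existing ones.
old_skus = {"SKU_PLATFORM", "SKU_DU", "SKU_PM"}
new_skus = {"SKU_PLATFORM", "SKU_TM"}
granted, revoked = entitlement_changes(old_skus, new_skus, SKU_ENTITLEMENTS)
print(sorted(granted))  # ['TestManager']
print(sorted(revoked))  # ['AICenter', 'DocumentUnderstanding', 'ProcessMining']
```

Because entitlements are computed from the union of all SKUs, an entitlement is only revoked when no remaining SKU still confers it; removing SKUs by mistake, as in this incident, shows up directly as revocations.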

Customer impact

During a maintenance operation, some existing SKUs were accidentally removed from licenses. This caused some entitlements to be revoked.

As a result, some customers lost access to Automation Hub, AI Center, Document Understanding, Communications Mining, Process Mining, and Test Manager between 2024-07-12 13:43 UTC and 2024-07-13 00:09 UTC.

A few other capabilities were also lost: real-time monitoring and dashboards in Insights, and the disconnected proxy in Orchestrator.

The impact was limited to customer access to the affected services and capabilities. No resources were lost, and running workflows experienced no downtime.

Root cause

A bug in our licensing maintenance tool caused the update operation to remove some SKUs from licenses. As a result, customers lost access to some services and capabilities.

Detection

A few customers reported lost capabilities, and UiPath on-call engineers were notified. They confirmed the scope of the outage and updated status.uipath.com.

Response

Our platform team re-enabled affected services, restoring access.

The licensing team re-added the missing data to licenses and restored the remaining capabilities that were lost.

Follow up

To prevent this from happening again, we are changing how we perform license updates.

  • Introduce a detection and prevention mechanism for unexpected entitlement changes by rolling out entitlement changes in a delayed and controlled way. Changes produced by a maintenance operation will first accumulate so they can be observed as a whole. If unexpected changes appear, such as a mass revocation of entitlements, propagation is blocked.
  • Introduce a faster rollback capability at the source of license changes. This will work by taking snapshots of changes and providing a way to revert changes made by the maintenance tool.
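The two follow-up items can be illustrated together with a minimal sketch: staged entitlement changes are held behind a gate that blocks propagation when too many licenses would lose entitlements, and the license store is snapshotted before maintenance so it can be reverted. Every class, field, and threshold here (LicenseStore, PropagationGate, max_revoking_fraction) is hypothetical and assumed for illustration; it is not a description of UiPath's implementation. For brevity, the example treats each removed SKU as a revoked entitlement.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class EntitlementChange:
    license_id: str
    granted: Set[str]
    revoked: Set[str]


@dataclass
class LicenseStore:
    """Hypothetical source of truth: license_id -> set of SKUs."""
    licenses: Dict[str, Set[str]]
    snapshots: List[Dict[str, Set[str]]] = field(default_factory=list)

    def snapshot(self) -> int:
        """Record a deep copy of all licenses and return its index for rollback."""
        self.snapshots.append(deepcopy(self.licenses))
        return len(self.snapshots) - 1

    def rollback(self, snapshot_id: int) -> None:
        """Restore every license to the state captured by snapshot()."""
        self.licenses = deepcopy(self.snapshots[snapshot_id])


class PropagationGate:
    """Accumulates entitlement changes and holds them until they pass a check."""

    def __init__(self, max_revoking_fraction: float) -> None:
        self.max_revoking_fraction = max_revoking_fraction
        self.pending: List[EntitlementChange] = []

    def stage(self, change: EntitlementChange) -> None:
        self.pending.append(change)

    def looks_safe(self) -> bool:
        """True if the share of staged changes that revoke something is within the limit."""
        if not self.pending:
            return True
        revoking = sum(1 for c in self.pending if c.revoked)
        return revoking / len(self.pending) <= self.max_revoking_fraction

    def release(self) -> List[EntitlementChange]:
        """Hand staged changes to propagation, or raise and keep them staged."""
        if not self.looks_safe():
            raise RuntimeError("mass revocation detected; propagation blocked")
        released, self.pending = self.pending, []
        return released


store = LicenseStore({"lic-1": {"SKU_PLATFORM", "SKU_DU"}, "lic-2": {"SKU_PLATFORM", "SKU_PM"}})
snap = store.snapshot()
gate = PropagationGate(max_revoking_fraction=0.1)

# A faulty maintenance run drops every non-platform SKU from every license.
for lic_id, skus in list(store.licenses.items()):
    removed = skus - {"SKU_PLATFORM"}
    store.licenses[lic_id] = skus - removed
    gate.stage(EntitlementChange(lic_id, granted=set(), revoked=removed))

try:
    gate.release()
except RuntimeError:
    store.rollback(snap)  # nothing was propagated; restore the source of truth

print(sorted(store.licenses["lic-1"]))  # ['SKU_DU', 'SKU_PLATFORM']
```

A fraction-based threshold is only one possible heuristic; a real gate might also compare against historical change volumes or require manual approval before release.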
Posted Jul 22, 2024 - 19:11 UTC

Resolved
Due to a bug in a service used for maintenance, some customers' licensing data was updated incorrectly. As a result, the impacted customers lost access to some services. The issue was mitigated by correcting the data and restoring access to the services. A detailed analysis will be provided soon.
Posted Jul 12, 2024 - 23:46 UTC
Monitoring
A fix has been implemented and services should be fully restored. Please reach out to support if you encounter any issues.
Posted Jul 12, 2024 - 22:49 UTC
Update
We are continuing with the fix in batches and have confirmed some customers have restored functionality. Impacted customers will gradually be restored as we proceed.
Posted Jul 12, 2024 - 21:49 UTC
Update
We have identified a fix and are applying it in batches. Customers should start seeing relief as each batch completes.
Posted Jul 12, 2024 - 20:45 UTC
Update
We are still continuing to test the fix for the issue.
Posted Jul 12, 2024 - 20:29 UTC
Update
We are continuing to work on a fix for the issue and are currently testing the solution.
Posted Jul 12, 2024 - 19:26 UTC
Update
We are continuing to work on the mitigation steps.
Posted Jul 12, 2024 - 18:44 UTC
Identified
The issue has been identified and a fix is being implemented.
Posted Jul 12, 2024 - 18:15 UTC
Update
We are continuing to investigate this issue.
Posted Jul 12, 2024 - 18:07 UTC
Update
We are continuing to investigate this issue.
Posted Jul 12, 2024 - 17:53 UTC
Investigating
We are currently investigating this issue.
Posted Jul 12, 2024 - 17:52 UTC
This incident affected: Automation Hub, AI Center, Data Service, Document Understanding, Insights, Process Mining, Test Manager, and Communications Mining.