Business Apps issues related to deployment, publishing, and execution of Apps.
Incident Report for UiPath
Postmortem

Background context

UiPath replicates authorization data from Orchestrator and Data Service into a centralized authorization store. This store is a building block for a unified administration experience across UiPath services, and it is used in the deployment, publishing, and listing of UiPath Apps.

Customer impact

67 organizations in Europe and 58 organizations in our community environment experienced an increase in access denials to Apps due to this incident. These customers saw authorization errors while deploying, publishing, or executing UiPath Apps.

Please note that the regions indicate where the organization is hosted, not where the tenant is hosted.

Root cause

During the incident, we found that the authorization data was not correctly synchronized between Orchestrator, Data Service, and the central Authorization service. This resulted in an incomplete permission set for the UiPath Apps use cases.

We believe that the data divergence was caused by a race condition within the sync process, specifically in a tool that is designed to proactively auto-heal any issues between the various UiPath services.
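
The report does not describe the exact interleaving, so the following is only a minimal sketch of one way such a race can leave the central store with an incomplete permission set. All names here (`live_sync`, `heal_from_snapshot`, the `Apps.View` and `Apps.Publish` permission strings) are hypothetical and are not taken from UiPath's services. The steps run sequentially to make the problematic ordering deterministic.

```python
# Hypothetical model of a stale-snapshot race; not UiPath's actual code.

def live_sync(central: dict, user: str, permission: str) -> None:
    """Replicate a single newly granted permission to the central store."""
    central.setdefault(user, set()).add(permission)


def heal_from_snapshot(central: dict, snapshot: dict) -> None:
    """Auto-heal step: overwrite central entries with a source snapshot."""
    for user, permissions in snapshot.items():
        central[user] = set(permissions)


source = {"alice": {"Apps.View"}}
central = {"alice": {"Apps.View"}}

# 1. The heal tool reads the source of truth.
snapshot = {user: set(perms) for user, perms in source.items()}

# 2. Before the heal tool writes, a new grant lands and is replicated.
source["alice"].add("Apps.Publish")
live_sync(central, "alice", "Apps.Publish")

# 3. The heal tool writes its now-stale snapshot and drops the new grant.
heal_from_snapshot(central, snapshot)

# Permissions present in the source but lost in the central store.
missing = source["alice"] - central["alice"]
print(missing)
```

In this sketch the central store ends up without the permission granted in step 2, even though both writers behaved correctly in isolation. Versioning each write or serializing the heal step against live replication are common ways to close this kind of window.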

Detection

Unfortunately, UiPath did not self-detect this issue. It was reported by customers, and we posted it on the UiPath status page once more than one customer had reported it.

Response

The team used telemetry to identify which organizations had elevated authorization failures for Apps publishing, and we remediated the issue by forcing a sync of all the permission data to the centralized authorization store.
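
As an illustration of the telemetry step, the sketch below flags organizations whose share of denied requests is well above an expected baseline. The event shape (`(org_id, status_code)` tuples), the use of HTTP 403 as the denial signal, and the threshold values are all assumptions made for this example; the report does not say how the real query was written.

```python
from collections import Counter


def orgs_with_elevated_failures(events, baseline_rate=0.05, factor=3.0,
                                min_requests=20):
    """Return org IDs whose denial rate exceeds baseline_rate * factor.

    events: iterable of (org_id, status_code) tuples (assumed shape).
    Organizations with fewer than min_requests events are skipped to
    avoid flagging on very small samples.
    """
    totals = Counter()
    denials = Counter()
    for org_id, status in events:
        totals[org_id] += 1
        if status == 403:  # assumed marker for an authorization denial
            denials[org_id] += 1

    flagged = []
    for org_id, total in totals.items():
        if total < min_requests:
            continue
        if denials[org_id] / total > baseline_rate * factor:
            flagged.append(org_id)
    return sorted(flagged)


sample_events = (
    [("org-a", 403)] * 10 + [("org-a", 200)] * 10 + [("org-b", 200)] * 30
)
flagged_orgs = orgs_with_elevated_failures(sample_events)
print(flagged_orgs)
```

A list produced this way could then drive a targeted forced sync, or confirm after a full sync that denial rates have returned to the baseline.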

Follow up

To ensure this doesn't happen again, we plan to make the following improvements to the service:

  1. Improve detection by adding the telemetry and monitoring needed to immediately catch any data de-synchronization and find its root cause.
  2. Review the sync architecture and make it more robust so that it prevents this kind of divergence.
  3. Invest in better stress testing of the synchronization process to expose race conditions in our architecture.
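
One possible shape for the detection check in item 1 is a periodic comparison of the permission sets held by a source service and by the centralized store. The sketch below is an assumption about how such a check could look, not a description of UiPath's monitoring; the function name `find_divergence` and the dictionary layout (principal mapped to a set of permission strings) are hypothetical.

```python
def find_divergence(source: dict, central: dict) -> dict:
    """Compare per-principal permission sets and report any differences.

    Returns a mapping of principal -> {"missing": ..., "extra": ...},
    where "missing" holds permissions absent from the central store and
    "extra" holds permissions that exist only in the central store.
    """
    report = {}
    for principal in source.keys() | central.keys():
        expected = source.get(principal, set())
        actual = central.get(principal, set())
        missing = expected - actual
        extra = actual - expected
        if missing or extra:
            report[principal] = {"missing": missing, "extra": extra}
    return report


source_permissions = {
    "alice": {"Apps.View", "Apps.Publish"},
    "bob": {"Apps.View"},
}
central_permissions = {
    "alice": {"Apps.View"},
    "bob": {"Apps.View"},
}
divergence = find_divergence(source_permissions, central_permissions)
print(divergence)
```

A non-empty report could raise an alert before customers see access denials, and the same comparison could serve as the pass/fail condition in the stress tests described in item 3.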
Posted Aug 21, 2024 - 19:56 UTC

Resolved
This incident has been resolved.
Posted Aug 08, 2024 - 16:20 UTC
Monitoring
We have mitigated the issue and are closely monitoring it.
Posted Aug 08, 2024 - 07:09 UTC
Update
Our team is actively applying the mitigation. Apps continue to be unblocked; however, users of the Solutions builder flows are still not able to browse and select Entities from the Data Service.
Posted Aug 08, 2024 - 00:23 UTC
Update
We are continuing to work on the full mitigation.
Posted Aug 07, 2024 - 21:17 UTC
Update
We are continuing to work on a fix for the Solutions builder flows.
Posted Aug 07, 2024 - 19:04 UTC
Identified
A partial mitigation has been applied. Apps deployment, publishing, and execution are now unblocked. Customers who are using the Solutions builder flows are still not able to browse and select Entities from the Data Service. We are working on a solution to mitigate this.
Posted Aug 07, 2024 - 19:01 UTC
Update
Our team is continuing to investigate the issue. A partial mitigation has been identified and we will provide updates when available.
Posted Aug 07, 2024 - 18:14 UTC
Investigating
Our team is investigating issues related to users' ability to deploy, publish, and execute Apps.
Posted Aug 07, 2024 - 17:22 UTC
This incident affected: Apps and Solutions Management.