Automation Cloud - Users facing difficulty logging in.
Incident Report for UiPath
Postmortem

Background Context

UiPath uses feature flags to control the rollout of features separately from the rollout of new versions of our code. This gives us more control for canarying, testing, dogfooding, slow rollouts, and more.
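
As a rough illustration of the idea, not a description of our actual flag system, the sketch below gates a feature on a flag that is configured separately from the code. The flag name, the `enabled` and `rollout_percent` fields, and the `is_enabled` helper are all hypothetical.

```python
import hashlib

# Hypothetical flag definitions; in practice these would live in a config
# store that can be changed without deploying new code.
FLAGS = {
    "new-login-page": {"enabled": True, "rollout_percent": 25},
}


def bucket(flag_name: str, org_id: str) -> int:
    """Map an organization to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag_name}:{org_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_name: str, org_id: str, flags: dict = FLAGS) -> bool:
    """Return True if the flag is on for this organization."""
    flag = flags.get(flag_name)
    if flag is None or not flag["enabled"]:
        return False  # unknown or disabled flags fall back to "off"
    return bucket(flag_name, org_id) < flag["rollout_percent"]
```

Because the bucket is derived from a hash, a given organization gets the same answer on every request, and raising `rollout_percent` only ever adds organizations to the enabled set.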

Currently, we are migrating from one feature flag system to another.

Customer impact

Starting at 22:23:49 UTC on Tuesday, Jun 25, 2024, customers across all regions were unable to log into the UiPath Automation Cloud Portal. They received the error message “Cannot find organization info.” The issue was resolved 28 minutes later.

Root cause

The feature flag migration was first deployed successfully in alpha, then staging, then community, and onwards through each of our regions. After the migration was fully rolled out, a cleanup was performed on the feature flag configuration files. During this cleanup, a typo was accidentally introduced. Our automatic validation did not catch the problem before deployment.
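
This report does not describe the typo or the existing validation in detail, so the following is only a sketch of one class of check that tends to catch such mistakes: strict validation that rejects unknown or missing fields rather than ignoring them. The JSON layout, the field names, and `validate_flag_config` are assumptions made for illustration.

```python
import json

# Hypothetical schema: every flag must have exactly these fields and types.
REQUIRED_FIELDS = {"enabled": bool, "rollout_percent": int}


def validate_flag_config(text: str) -> list[str]:
    """Return a list of errors; an empty list means the config passed."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(config, dict):
        return ["top level must be an object mapping flag names to settings"]

    errors = []
    for name, flag in config.items():
        if not isinstance(flag, dict):
            errors.append(f"{name}: settings must be an object")
            continue
        # Reject misspelled keys instead of silently ignoring them.
        for key in flag:
            if key not in REQUIRED_FIELDS:
                errors.append(f"{name}: unknown field '{key}'")
        for key, expected in REQUIRED_FIELDS.items():
            if key not in flag:
                errors.append(f"{name}: missing field '{key}'")
            elif type(flag[key]) is not expected:
                errors.append(f"{name}: '{key}' must be {expected.__name__}")
        percent = flag.get("rollout_percent")
        if type(percent) is int and not 0 <= percent <= 100:
            errors.append(f"{name}: 'rollout_percent' must be between 0 and 100")
    return errors
```

With this kind of check, a misspelled key such as `enabeld` produces both an "unknown field" and a "missing field" error, so the change can be blocked before it is checked in.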

Detection

The issue was detected by our automatic monitoring within minutes. The on-call engineer was notified and immediately began troubleshooting.

Response

The invalid feature flag config change was rolled back. The rollback was propagated to all production regions within 10 minutes.

Follow-ups

  1. Improve automated validation of feature flag configuration. This should prevent similar issues by catching them before they are checked in.
  2. Improve feature flag config rollout to follow the same region-by-region process as code changes with a short pause after each region. This should reduce the blast radius of any errors so they can be caught and fixed in a lower environment.
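
The second follow-up can be sketched as a loop that deploys to one environment at a time, pauses, checks health, and stops at the first failure. This is a minimal sketch under assumptions, not our deployment tooling: `staged_rollout` and the `deploy`, `is_healthy`, and `rollback` callbacks are hypothetical, and the pause length is arbitrary.

```python
import time
from typing import Callable, Sequence


def staged_rollout(
    stages: Sequence[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
    pause_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Deploy stage by stage; return True if every stage stayed healthy."""
    completed = []
    for stage in stages:
        deploy(stage)
        completed.append(stage)
        sleep(pause_seconds)  # give monitoring time to surface problems
        if not is_healthy(stage):
            # Undo the change everywhere it has been applied, newest first,
            # and never touch the stages that come later in the list.
            for done in reversed(completed):
                rollback(done)
            return False
    return True
```

Ordering the stages from lowest to highest risk, for example alpha, staging, community, and then each production region, means a bad config is stopped at the first unhealthy stage instead of reaching all regions at once.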
Posted Jun 26, 2024 - 22:16 UTC

Resolved
This incident has been resolved.
Posted Jun 25, 2024 - 23:17 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jun 25, 2024 - 23:08 UTC
Update
We are continuing to investigate this issue.
Posted Jun 25, 2024 - 23:01 UTC
Investigating
We are currently investigating this issue.
Posted Jun 25, 2024 - 23:00 UTC
This incident affected: Automation Cloud.