Customers in the Europe region might experience degraded performance for Document Understanding Recommendations
Incident Report for UiPath
Postmortem

Customer impact

Between December 13, 2024, at 16:00 UTC and December 14, 2024, at 08:30 UTC, some customers with tenants hosted in the U.S. and E.U. regions may have experienced errors or increased latency while using the generative AI capabilities in Document Understanding. Customers in the delayed rings were not affected.

Background context

Document Understanding uses Azure OpenAI GPT models to power features that rely on large language models (LLMs). UiPath partners with Azure to reserve dedicated capacity for these AI services. However, this capacity is limited, and obtaining additional resources can take time.

Root cause

Shortly before the incident began, usage of the generative AI features in Document Understanding increased sharply, exhausting our Azure OpenAI capacity. Azure had no additional quota available in these regions, so requests were heavily throttled. This affected customers across these regions.
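To illustrate why a demand spike turns into throttling, the sketch below models a fixed quota expressed in tokens per minute (TPM), the unit in which Azure OpenAI quota is allocated. It is a simplified, hypothetical model: the class `TokenQuota`, the helper `simulate`, the one-minute fixed window, and all numbers are assumptions for illustration, not Azure's actual throttling logic or figures from this incident.

```python
from dataclasses import dataclass


@dataclass
class TokenQuota:
    """Hypothetical, simplified tokens-per-minute (TPM) quota.

    Illustration only: this is not how Azure OpenAI implements throttling.
    """
    tokens_per_minute: int
    window_start: float = 0.0
    used: int = 0

    def try_consume(self, tokens: int, now: float) -> bool:
        """Return True if the request fits in the current one-minute window."""
        # Start a fresh window once 60 seconds have elapsed.
        if now - self.window_start >= 60.0:
            self.window_start = now
            self.used = 0
        if self.used + tokens > self.tokens_per_minute:
            return False  # a real service would reject this, e.g. with HTTP 429
        self.used += tokens
        return True


def simulate(quota_tpm: int, request_tokens: int, requests: int) -> tuple[int, int]:
    """Send `requests` equal-sized requests in the same minute; return (accepted, throttled)."""
    quota = TokenQuota(tokens_per_minute=quota_tpm)
    accepted = sum(quota.try_consume(request_tokens, now=0.0) for _ in range(requests))
    return accepted, requests - accepted


if __name__ == "__main__":
    print(simulate(50_000, 1_000, 10))   # normal load: every request fits
    print(simulate(50_000, 1_000, 200))  # spike: most requests are throttled
```

With an assumed 50,000 TPM quota and 1,000-token requests, ten requests all succeed, while a spike of 200 requests in the same minute leaves 150 of them throttled: once the shared quota is exhausted, every caller in that region sees errors or retries until the window resets.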

Detection

Our monitoring detected the issue and alerted our engineers within minutes.

Response

Once the spike in requests subsided, our systems returned to their normal state.

Follow-up

To prevent similar issues in the future and enhance our service reliability, we are implementing several key improvements:

  • Enabling a dynamic quota system: We will deploy a dynamic allocation system that adjusts in real time based on current load, vendor capacity, customer requirements, and other relevant factors, so that available resources are distributed fairly and efficiently across all customers.
  • Increasing capacity: We have increased Azure OpenAI capacity in the affected regions.
  • Improving capacity planning: We will review our capacity management plan to ensure that we proactively scale Azure OpenAI capacity ahead of demand.
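The report does not say which algorithm the dynamic quota system will use. One common way to divide limited capacity "fairly and efficiently" is max-min fairness (water-filling), sketched below as an assumption: customers whose demand is below an equal share receive their full demand, and the leftover capacity is split evenly among the rest. The function `fair_share` and the example demand figures are hypothetical.

```python
def fair_share(capacity: float, demands: dict[str, float]) -> dict[str, float]:
    """Max-min fair (water-filling) allocation of `capacity` across customer demands.

    Hypothetical sketch of one possible dynamic quota policy, not UiPath's design.
    """
    allocation = {name: 0.0 for name in demands}
    remaining = capacity
    # Customers still waiting for an allocation, with their outstanding demand.
    unsatisfied = {name: d for name, d in demands.items() if d > 0}
    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)
        # Customers whose demand fits within the equal share are served in full.
        small = {n: d for n, d in unsatisfied.items() if d <= share}
        if not small:
            # Everyone left wants more than the share: split what remains evenly.
            for n in unsatisfied:
                allocation[n] += share
            remaining = 0.0
            break
        for n, d in small.items():
            allocation[n] += d
            remaining -= d
            del unsatisfied[n]
    return allocation


if __name__ == "__main__":
    print(fair_share(100, {"A": 10, "B": 30, "C": 80}))
```

With 100 units of capacity and demands of 10, 30, and 80, customers A and B are fully served and C receives the remaining 60. Rerunning the allocation periodically with the current capacity and demand is what would make such a scheme dynamic; a production system would likely also weight customers by entitlement, which this sketch ignores.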
Posted Dec 18, 2024 - 20:16 UTC

Resolved
This incident has been resolved.
Posted Dec 18, 2024 - 14:59 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Dec 18, 2024 - 14:30 UTC
Investigating
Customers in the Europe region might experience degraded performance for Document Understanding Recommendations, and the team is actively investigating the issue.
Posted Dec 18, 2024 - 14:00 UTC
This incident affected: Document Understanding.