[US Region] Treasure Workflow - Partial Outage in Workflow Service

Incident Report for Treasure Data

Resolved

Our Workflow service experienced a partial outage starting at 10:50 am PDT on Sep. 25, 2023. From that time, workflow requests were placed in pending status. We deployed a fix at 2:40 pm PDT on Sep. 25, 2023. During this window, customer workflows may have experienced delays. Since the fix was deployed, the Workflow service has been operating normally, and it resumed the pending workflows while also handling new requests.

The incident has been resolved.
Posted Sep 25, 2023 - 18:33 PDT

Update

20% of the pending workflows still remain to be processed. These remaining workflows will be processed within 30 minutes. We are continuing to monitor for any further issues.
Posted Sep 25, 2023 - 17:22 PDT

Update

Half of the pending workflows have been processed without any issues. The remaining pending workflows will be processed within an hour. We are continuing to monitor for any further issues.
Posted Sep 25, 2023 - 15:57 PDT

Update

The pending workflows are resuming now, but it will take 1-2 hours to backfill all of them. We are continuing to monitor for any further issues.
Posted Sep 25, 2023 - 15:16 PDT

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Sep 25, 2023 - 14:50 PDT

Identified

The issue has been identified and a fix is being implemented.
Posted Sep 25, 2023 - 13:52 PDT

Update

We have observed that workflows are stuck in pending status, with a partial outage in the Workflow service. We are currently investigating the issue.
Posted Sep 25, 2023 - 13:17 PDT

Investigating

We have observed custom script execution failures. We are currently investigating the issue.
Posted Sep 25, 2023 - 12:01 PDT
This incident affected: US (Workflow).