[US Region] Workflow Database maintenance
Scheduled Maintenance Report for Treasure Data
Postmortem

Workflow database maintenance was performed a day ahead of the communicated scheduled maintenance period.

We know that customers and partners trust our scheduled maintenance announcements to be correct and then coordinate their own teams and members for awareness and remediation during maintenance operations. We let you down, and we apologize for doing so.

We maintain an internal calendar for planned operations, and we communicate scheduled maintenance through our status page in advance. On November 8th, when we published the initial maintenance notice, we did not verify that the calendar and the announcement were both correct. On November 20th, prior to commencing the maintenance, we did not verify that the dates in the announcement were correct. We then sent a maintenance reminder 25 hours early, and commenced the maintenance operation 24 hours early.

Our team is improving our maintenance operation runbooks to include additional checklist items to confirm announcement dates are correct prior to commencing maintenance operations.
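One possible form for such a checklist item is a small script that compares the announced start time with the current time before the operation begins. The sketch below is illustrative only: the function name `safe_to_start` and the ten-minute tolerance are assumptions and are not taken from the actual runbooks. The timestamps come from the announcement and the "In progress" update in this report.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets are enough for a November date (no daylight saving in effect).
PST = timezone(timedelta(hours=-8), "PST")
JST = timezone(timedelta(hours=9), "JST")


def safe_to_start(announced_start, now, tolerance=timedelta(minutes=10)):
    """Return True only if `now` is within `tolerance` of the announced start.

    Hypothetical helper for illustration; the tolerance is an assumption.
    """
    return abs(now - announced_start) <= tolerance


# Start of the window as announced: Wednesday, November 20th, 18:00 PST.
announced = datetime(2019, 11, 20, 18, 0, tzinfo=PST)

# The same instant in JST, as a cross-check of the published text.
announced_jst = announced.astimezone(JST)

# The time the operation actually started, per the "In progress" update.
actual = datetime(2019, 11, 19, 18, 1, tzinfo=PST)

print(announced_jst.strftime("%A %Y-%m-%d %H:%M"))
print(safe_to_start(announced, actual))
```

With these inputs the check fails, because the actual start is roughly 24 hours before the announced one.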

Posted Nov 20, 2019 - 01:41 PST

Completed
*Maintenance* and system *recovery* have been fully completed.
Scheduled session execution delay was 2 minutes at 6:20pm PST.

The Workflow database maintenance is now complete.
Posted Nov 19, 2019 - 18:39 PST
Verifying
The scheduled Workflow database maintenance is complete.

We are monitoring the system closely to ensure all systems successfully complete their *recovery* and return as quickly as possible to full functionality and throughput.
Posted Nov 19, 2019 - 18:35 PST
In progress
The scheduled Workflow database maintenance is starting now.


We expect the operation to cause approximately 3 minutes of downtime, followed by a *recovery* period of around 5 minutes during which the system will gradually return to full throughput. During the maintenance and recovery, customers may experience Workflow REST API unavailability; delays in scheduled user-defined Workflow session executions; failures when creating or updating CDP Master Segments, Batch Segments, API Tokens, and Predictive Scoring; and delays in the refresh of Master Segments, Batch Segments, and Predictive Scoring.
Posted Nov 19, 2019 - 18:01 PST
Update
In about an hour, from 18:00 PST (11:00 JST), we will start the scheduled maintenance on the Workflow database.

The maintenance is necessary to upgrade the PostgreSQL database instance to a higher grade.

We expect the operation to cause approximately 3 minutes of downtime (the *maintenance* period), followed by a *recovery* period of around 5 minutes, during which the system will gradually return to full throughput.

During the maintenance and recovery periods, customers may experience the following:

* Workflow REST API unavailable
All Workflow REST API endpoints will be unavailable during the maintenance and will respond with a 500 error code.

* Delay in the execution of Workflow scheduled sessions
The execution of all scheduled Workflow sessions will be delayed during the *maintenance* period and remain queued. During *recovery*, we expect the sessions to begin processing slowly: within 5 minutes the processing of queued sessions should reach regular capacity and the backlog should be depleted shortly after.

* CDP Master Segments, Batch Segments, API Tokens, and Predictive Scoring creation and update unavailable
Throughout the *maintenance* period, it won’t be possible to create or update Master Segments, Batch Segments, API Tokens, or Predictive Scoring, because these features rely on Workflow to execute ETL tasks. The ability to create and update will remain impaired during the *recovery* period and will be restored shortly after.

* Delayed refresh of CDP Master Segments, Batch Segments, and Predictive Scoring
Refreshes of CDP Master Segments, Batch Segments, and Predictive Scoring all rely on Workflow to execute. As with Workflow session executions, refreshes will be delayed during the *maintenance* and *recovery* periods.

Beyond this notice, we will provide updates at the start and completion of the operation and once the verification has completed: at that time, all systems will have returned to full functionality and this Scheduled Maintenance will be closed.
Posted Nov 19, 2019 - 16:58 PST
Scheduled
On Wednesday, November 20th from 18:00 to 18:30 PST (Thursday, November 21st from 11:00 to 11:30 JST) we will be performing maintenance on the Workflow database.

The maintenance is necessary to upgrade the instance the PostgreSQL database is running on.

We expect the operation to cause approximately *3 minutes of downtime*.

# Impact

The database will be unreachable for the duration of the *maintenance* procedure, which is expected to last approximately 3 minutes. This will be followed by a *recovery* period of around 5 minutes during which the system will gradually return to full throughput.

All internal components that read from or write to the Workflow database have built-in fault tolerance that allows them to retry a request when it fails: when the connection encounters an error, the request is retried several times, and for long enough to outlast the maintenance and recovery periods.

For customer-facing components, connection and request failures may be returned directly to the caller. These failures should be interpreted as a suggestion to retry the connection or request later, most practically after the maintenance window has closed.
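As an illustration of the retry advice above, a client could wrap its calls in a loop with exponential backoff. This is a minimal sketch and not an official client: the function name `fetch_with_retry`, the attempt count, and the delay values are all assumptions, chosen so that the default waits add up to a little over ten minutes, which is longer than the expected maintenance and recovery periods combined. It retries only on 5xx responses and connection errors and uses only the Python standard library.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, attempts=10, base_delay=5.0, max_delay=120.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """GET `url`, retrying 5xx responses and connection errors with backoff.

    Hypothetical helper; `opener` and `sleep` are injectable for testing.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # 4xx means the request itself is wrong; only 5xx is retried.
            if err.code < 500 or attempt == attempts:
                raise
        except urllib.error.URLError:
            # Connection-level failure (refused, DNS, timeout).
            if attempt == attempts:
                raise
        sleep(delay)
        # Double the wait each time, up to the cap.
        delay = min(delay * 2, max_delay)
```

A caller would pass its own endpoint URL, for example `fetch_with_retry("https://example.invalid/some/endpoint")`, where the URL is a placeholder. Capping the delay keeps the client from waiting much longer than necessary once the service is back.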

During the maintenance and recovery, customers may experience the following:

* Workflow REST API unavailable
All Workflow REST API endpoints will be unavailable during the maintenance and will respond with a 500 error code.

* Delay in the execution of Workflow scheduled sessions
The execution of all scheduled Workflow sessions will be delayed during the *maintenance* period and remain queued. During *recovery*, we expect the sessions to begin processing slowly: within 5 minutes the processing of queued sessions should reach regular capacity and the backlog should be depleted shortly after.

* CDP Master Segments, Batch Segments, API Tokens, and Predictive Scoring creation and update unavailable
Throughout the *maintenance* period, it won’t be possible to create or update Master Segments, Batch Segments, API Tokens, or Predictive Scoring, because these features rely on Workflow to execute ETL tasks. The ability to create and update will remain impaired during the *recovery* period and will be restored shortly after.

* Delayed refresh of CDP Master Segments, Batch Segments, and Predictive Scoring
Refreshes of CDP Master Segments, Batch Segments, and Predictive Scoring all rely on Workflow to execute. As with Workflow session executions, refreshes will be delayed during the *maintenance* and *recovery* periods.

# Communication

Beyond this notice, we will provide updates approximately 1 hour before the beginning of the maintenance window, at the start and completion of the operation, and once the verification is completed. At that time, all systems will have returned to full functionality and the Scheduled Maintenance will be closed.

If you have any questions or concerns about this upgrade, please feel free to reach out to our Support team at support@treasuredata.com.
Posted Nov 08, 2019 - 10:33 PST
This scheduled maintenance affected: US (Web Interface, REST API, Workflow, CDP API).