On Wednesday, October 23rd, starting at 18:15 PDT (Thursday, October 24th, starting at 10:15 JST), Treasure Data will perform an upgrade of the main database for the US region.
The maintenance is necessary to upgrade the database storage to withstand future usage growth, and it is part of the remediation activities discussed in the postmortem of this incident: https://status.treasuredata.com/incidents/zs55rsqkg189
NOTE: This maintenance notice concerns only customers using the US region. Customers using the Tokyo or EU regions are not affected.
The database operation will take at most 15 minutes; we refer to this as the maintenance phase. The systems that connect directly to the database, our REST APIs (https://api.treasuredata.com) and Web Interface, will be severely affected and unreachable during this phase.
The maintenance will be followed by a recovery phase of at most 15 minutes. During this phase, the REST APIs and Web Interface will gradually return to their typical performance levels.
Below is a summary of the impact customers will observe:
* The REST API will become unreachable and respond with error codes 500 or similar. This will prevent all primary actions from occurring: for example, read/write/update/delete of databases, tables, scheduled and saved queries, data connector sources, and users, as well as creation/submission of Presto and Hive queries and Data connector jobs.
* The Web Interface will not be fully functional. The impact will be similar to that described in the point above.
* The td command-line (CLI) commands will either fail (read requests) returning errors to the user or be delayed (write requests) until the maintenance is complete.
* Streaming import requests will fail: where fluentd / td-agent is being used (as recommended), event collection will continue locally on each device/server and will recover automatically once the maintenance is complete thanks to the built-in buffering and retry mechanisms.
* The execution of scheduled queries and connector jobs will be delayed. Scheduled jobs that are already executing will either complete or be retried internally until they do. The retry mechanism may cause these jobs to run longer than expected, by up to 15 minutes in the worst case.
* Workflows using Treasure Data operators (e.g. td>) will retry and regain full functionality after the upgrade. In case of workflow session failures, the customer can elect to resume them manually.
* The Presto JDBC / ODBC Gateway will report authentication failures to the clients (ODBC and JDBC clients and tools/services using them).
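For customers whose own scripts call the REST API directly, the errors described above can usually be absorbed by retrying with a capped exponential backoff. The following is only a minimal sketch in Python: `TransientError`, `backoff_delays`, and `call_with_retry` are hypothetical names introduced for illustration and are not part of any Treasure Data client library. The default delays are an assumption, chosen so that the total wait slightly exceeds the 30 minutes of the maintenance and recovery phases combined.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for a retryable failure (e.g. an HTTP 5xx response)."""


def backoff_delays(base=5.0, factor=2.0, cap=300.0, attempts=11):
    """Return the wait in seconds before each retry, doubling up to `cap`."""
    return [min(cap, base * factor ** i) for i in range(attempts)]


def call_with_retry(request, delays, sleep=time.sleep):
    """Call `request()` until it succeeds or the delays are exhausted.

    `request` is any zero-argument callable that raises TransientError
    while the service is unavailable. `sleep` is injectable so the
    loop can be exercised without actually waiting.
    """
    for delay in delays:
        try:
            return request()
        except TransientError:
            sleep(delay)
    # Final attempt: let a remaining failure propagate to the caller.
    return request()
```

A caller would wrap its own request function so that a 5xx response raises `TransientError`, then invoke `call_with_retry(my_request, backoff_delays())`. Write operations should only be retried this way if repeating them is safe.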
Beyond this notice, we will provide updates approximately 1 hour before the beginning of the upgrade window, at the start and completion of the operation, and once the verification is completed. At that time, all systems will have returned to full functionality and the Scheduled Maintenance will be closed.
If you have any questions or concerns about this upgrade, please feel free to reach out to our Support team at firstname.lastname@example.org