All Systems Operational

About This Site

This is Arm Treasure Data's status page.
We believe that trust starts with full transparency.

All uptime figures below cover the past 90 days; a short sketch converting these percentages into approximate downtime follows the list.

US: Operational, 99.95% uptime
  Web Interface: Operational, 100.0% uptime
  REST API: Operational, 99.99% uptime
  Streaming Import REST API: Operational, 100.0% uptime
  Mobile/Javascript REST API: Operational, 100.0% uptime
  Data Connector Integrations: Operational, 100.0% uptime
  Hadoop / Hive Query Engine: Operational, 100.0% uptime
  Presto Query Engine: Operational, 99.99% uptime
  Presto JDBC/ODBC Gateway: Operational, 99.99% uptime
  Workflow: Operational, 100.0% uptime
  CDP API: Operational, 99.86% uptime
  CDP Personalization - Lookup API: Operational, 99.79% uptime
  CDP Personalization - Ingest API: Operational, 99.79% uptime

Tokyo: Operational, 99.99% uptime
  Web Interface: Operational, 100.0% uptime
  REST API: Operational, 99.99% uptime
  Streaming Import REST API: Operational, 100.0% uptime
  Mobile/Javascript REST API: Operational, 100.0% uptime
  Data Connector Integrations: Operational, 100.0% uptime
  Hadoop / Hive Query Engine: Operational, 100.0% uptime
  Presto Query Engine: Operational, 100.0% uptime
  Presto JDBC/ODBC Gateway: Operational, 100.0% uptime
  Workflow: Operational, 99.9% uptime
  CDP API: Operational, 100.0% uptime
  CDP Personalization - Lookup API: Operational, 100.0% uptime
  CDP Personalization - Ingest API: Operational, 100.0% uptime

EU: Operational, 100.0% uptime
  Web Interface: Operational, 100.0% uptime
  REST API: Operational, 100.0% uptime
  Streaming Import REST API: Operational, 100.0% uptime
  Mobile/Javascript REST API: Operational, 100.0% uptime
  Data Connector Integrations: Operational, 100.0% uptime
  Hadoop / Hive Query Engine: Operational, 100.0% uptime
  Presto Query Engine: Operational, 100.0% uptime
  Presto JDBC/ODBC Gateway: Operational, 100.0% uptime
  Workflow: Operational, 100.0% uptime
  CDP API: Operational, 100.0% uptime
  CDP Personalization - Lookup API: Operational, 100.0% uptime
  CDP Personalization - Ingest API: Operational, 100.0% uptime

Global: Operational, 100.0% uptime
  Reporting: Operational, 100.0% uptime
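For reference, the 90-day uptime percentages above can be translated into approximate downtime with simple arithmetic. The sketch below is illustrative only and is not a figure computed or reported by the status page itself.

```python
# Convert a 90-day uptime percentage into an approximate downtime figure.
# Illustrative arithmetic: 99.95% over 90 days is roughly 65 minutes of downtime.
WINDOW_MINUTES = 90 * 24 * 60  # 129,600 minutes in the 90-day window

def downtime_minutes(uptime_percent: float) -> float:
    """Minutes of downtime implied by an uptime percentage over 90 days."""
    return WINDOW_MINUTES * (100.0 - uptime_percent) / 100.0

for pct in (99.95, 99.99, 99.86, 99.79):
    print(f"{pct}% uptime ~ {downtime_minutes(pct):.0f} minutes of downtime")
```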
System metrics (response-time and error-rate charts; chart data is loaded dynamically and is not captured here):
  US - REST API - Response Time
  US - REST API - Error Rates
  US - Streaming Import REST API - Response Time
  US - Queued Streaming Import requests
  US - Mobile/Javascript REST API - Response Time
  US - Web Interface - Response Time
  Tokyo - REST API - Response Time
  Tokyo - Streaming Import REST API - Response Time
  Tokyo - Queued Streaming Import requests
  Tokyo - Web Interface - Response Time
  EU - REST API - Response Time
  EU - Streaming Import REST API - Response Time
  EU - Queued Streaming Import requests
  EU - Web Interface - Response Time
  US - CDP API - Response Time
  Tokyo - CDP API - Response Time
  EU - CDP API - Response Time
  US - CDP Personalization Lookup API - Response Time
  Tokyo - CDP Personalization Lookup API - Response Time
  EU - CDP Personalization Lookup API - Response Time
  US - CDP Personalization Ingest API - Response Time
  Tokyo - CDP Personalization Ingest API - Response Time
  EU - CDP Personalization Ingest API - Response Time
Past Incidents
Sep 18, 2019

No incidents reported today.

Sep 17, 2019
Resolved - This incident has been resolved.
Sep 17, 09:22 PDT
Monitoring - We found a problematic instance in the Presto cluster and removed it. We continue to monitor the situation.
Sep 17, 08:45 PDT
Investigating - We are experiencing Presto performance degradation and are currently investigating.
Sep 17, 07:36 PDT
Resolved - The Presto cluster is now working normally.
This incident has been resolved.
Sep 17, 05:08 PDT
Monitoring - A fix has been implemented and we are monitoring the results.
Sep 17, 04:48 PDT
Identified - We are observing cluster instability in the US region that is degrading performance in one of our Presto clusters. We are working to resolve the issue.
Sep 17, 04:27 PDT
Sep 16, 2019
Postmortem - Read details
Sep 18, 14:23 PDT
Completed - *Maintenance* and system *recovery* have been fully completed.
The scheduled database maintenance is now complete.
Sep 16, 19:49 PDT
Update - The Plazma Metadata database downtime for the upgrade began at 1:32 AM UTC and completed at 1:56 AM UTC. During system recovery, however, one of the Presto clusters took longer to start and became fully functional at 2:29 AM UTC.
We are monitoring the Presto clusters to confirm that queued queries are processed as expected.
Sep 16, 19:44 PDT
Verifying - The scheduled database maintenance is complete.

We are monitoring the system closely to ensure all systems successfully complete their *recovery* and return as quickly as possible to full functionality.
Sep 16, 19:31 PDT
Update - Our database upgrade operations finished successfully, but one of the Presto query engine clusters is not yet operational, so queued jobs will remain queued for a few more minutes.
Sep 16, 19:26 PDT
Update - The database upgrade operations are still in progress. We are monitoring the progress and will require an additional maintenance window.
Our maintenance window will be extended until 7:30 PM PDT (11:30 AM JST, 4:30 AM CEST).
Sep 16, 18:54 PDT
In progress - The scheduled database maintenance window is starting now.

During the maintenance and recovery, customers may experience the following:

- Streaming, Mobile, and JavaScript/Browser imports delay

Streaming import (through td-agent or fluentd) requests will continue to be accepted as usual but the requests will remain queued until after the database maintenance is complete. We expect stream import processing to be further delayed during recovery.
The same will apply to import requests from Browsers (Javascript SDK) and Mobiles (Android, iOS, and Unity SDKs).

- Jobs execution delay

All jobs (Presto, Hive, Result Export, Data Connector Integrations, Bulk Import, Export, and Partial Delete jobs submitted from Console, API, Workflow or triggered by our system according to the configured schedule) will fail and continue to retry during maintenance. During recovery, we expect jobs to begin processing slowly: within 30 minutes job processing should reach back to full throughput.

- Presto JDBC / ODBC Gateway errors

The Presto JDBC / ODBC Gateway will report errors during maintenance due to the unreachability of the Metadata database: errors will be propagated to the clients. During recovery, we expect processing of Presto JDBC / ODBC jobs to follow the same recovery pattern as all other jobs (see above).

- Console

Data Workbench and Audience Studio will encounter errors caused by failures of the underlying Master Segments, Segments, and Workflows jobs.
Sep 16, 18:01 PDT
Update - In about an hour, from 6:00 PM PDT (10:00 AM JST, 3:00 AM CEST), the maintenance window for the PlazmaDB Metadata database will commence.

During the maintenance and recovery, customers may experience the following:

- Streaming, Mobile, and JavaScript/Browser imports delay

Streaming import (through td-agent or fluentd) requests will continue to be accepted as usual but the requests will remain queued until after the database maintenance is complete. We expect stream import processing to be further delayed during recovery.
The same will apply to import requests from Browsers (Javascript SDK) and Mobiles (Android, iOS, and Unity SDKs).

- Jobs execution delay

All jobs (Presto, Hive, Result Export, Data Connector Integrations, Bulk Import, Export, and Partial Delete jobs submitted from Console, API, Workflow or triggered by our system according to the configured schedule) will fail and continue to retry during maintenance. During recovery, we expect jobs to begin processing slowly: within 30 minutes job processing should reach back to full throughput.

- Presto JDBC / ODBC Gateway errors

The Presto JDBC / ODBC Gateway will report errors during maintenance due to the unreachability of the Metadata database: errors will be propagated to the clients. During recovery, we expect processing of Presto JDBC / ODBC jobs to follow the same recovery pattern as all other jobs (see above).

- Console

Data Workbench and Audience Studio will encounter errors caused by failures of the underlying Master Segments, Segments, and Workflows jobs.

Beyond this notice, we will provide updates at the start and completion of the operation, and once the verification of the new system is completed. At that time, all systems will have returned to full functionality and this Scheduled Maintenance will be closed.
Sep 16, 17:00 PDT
Scheduled - On Monday, September 16th from 6 to 7 PM PDT (Tuesday, September 17th from 10 to 11 AM JST, September 17th from 3 to 4 AM CEST) we will be performing maintenance on the PlazmaDB Metadata database. The maintenance is necessary to upgrade the PostgreSQL database to address the performance limitations that have recently affected the TD system and have surfaced as Streaming Import visibility delays and occasional slowdowns of Queries.

This maintenance was originally scheduled for Tuesday, September 3rd but was cancelled due to the issues our testing and benchmarking had uncovered during and after the test upgrade. These issues have now been addressed.

The database will become unreachable for the duration of the maintenance procedure, which should last no longer than 20 minutes. We expect this to be followed by a recovery period of around 30 minutes during which the system will gradually reach back to full throughput.


# Impact

During the maintenance and recovery, customers may experience the following:

- Streaming, Mobile, and JavaScript/Browser imports delay

Streaming import (through td-agent or fluentd) requests will continue to be accepted as usual but the requests will remain queued until after the database maintenance is complete. We expect stream import processing to be further delayed during recovery.
The same will apply to import requests from Browsers (Javascript SDK) and Mobiles (Android, iOS, and Unity SDKs).

- Jobs execution delay

All jobs (Presto, Hive, Result Export, Data Connector Integrations, Bulk Import, Export, and Partial Delete jobs submitted from Console, API, Workflow or triggered by our system according to the configured schedule) will fail and continue to retry during maintenance. During recovery, we expect jobs to begin processing slowly: within 30 minutes job processing should reach back to full throughput. A sketch of client-side retry handling follows this Impact list.

- Presto JDBC / ODBC Gateway errors

The Presto JDBC / ODBC Gateway will report errors during maintenance due to the unreachability of the Metadata database: errors will be propagated to the clients. During recovery, we expect processing of Presto JDBC / ODBC jobs to follow the same recovery pattern as all other jobs (see above).

- Console

Data Workbench and Audience Studio will encounter errors caused by failures of the underlying Master Segments, Segments, and Workflows jobs.
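For integrations that submit jobs programmatically, the sketch below illustrates the kind of client-side retry with exponential backoff that matches the behavior described under "Jobs execution delay" above. It is a minimal, hypothetical example: submit_job stands in for whatever submission call your integration already uses (Console, REST API, Workflow, or an SDK) and is not a Treasure Data API.

```python
import random
import time

def submit_with_backoff(submit_job, max_attempts=8, base_delay=5.0, cap=300.0):
    """Call submit_job() until it succeeds, backing off between failed attempts.

    submit_job is a hypothetical placeholder for your own submission call;
    during the maintenance window it is expected to raise on failure.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return submit_job()
        except Exception as exc:
            if attempt == max_attempts:
                raise
            delay = min(cap, base_delay * 2 ** (attempt - 1))
            delay *= 0.5 + random.random()  # jitter to avoid synchronized retries
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.0f}s")
            time.sleep(delay)
```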


# Communication

Beyond this notice, we will provide updates approximately 1 hour before the beginning of the maintenance window, at the start and completion of the operation, and once the verification is completed. At that time, all systems will have returned to full functionality and the Scheduled Maintenance will be closed.

If you have any questions or concerns about this upgrade, please feel free to reach out to our Support team at support@treasuredata.com.
Sep 10, 18:38 PDT
Sep 15, 2019

No incidents reported.

Sep 14, 2019

No incidents reported.

Sep 13, 2019

No incidents reported.

Sep 12, 2019

No incidents reported.

Sep 11, 2019

No incidents reported.

Sep 10, 2019

No incidents reported.

Sep 9, 2019

No incidents reported.

Sep 8, 2019

No incidents reported.

Sep 7, 2019

No incidents reported.

Sep 6, 2019

No incidents reported.

Sep 5, 2019
Postmortem - Read details
Sep 9, 18:21 PDT
Resolved - Now operating normally.

The cause of this incident was an increase in the number of records in our core database tables. It is unrelated to other recent incidents. We apologize for any inconvenience caused.
Sep 5, 02:37 PDT
Monitoring - We have applied remediation to our core database, and job submission is now operational. We continue to monitor the core database for issues.
Sep 4, 22:16 PDT
Identified - We have identified the issue with the job submission system. Initial remediation was successful, and jobs should now be accepted and progress to completion. We are continuing to remediate the system.
Sep 4, 21:33 PDT
Investigating - We have observed an issue with the job submission system where new jobs cannot be submitted. The team is investigating, and we will update shortly.
Sep 4, 21:10 PDT
Postmortem - Read details
Sep 9, 18:08 PDT
Resolved - This streaming import incident began at 11:30 PM Sep 3 PDT, triggered by an API release. It caused an elevated streaming import API error rate and slow API responses; the slow responses also delayed query engine execution. At 08:30 Sep 4 PDT the streaming API returned to normal. However, the flood of streaming import requests caused by the API incident kept imports delayed until 5:00 PM Sep 4 PDT. During the delay period, Streaming import and the Mobile/Javascript REST API saw visibility delays of up to 2 hours.

After the streaming import delay was resolved, we kept monitoring our system. We experienced a separate job submission error, described at https://status.treasuredata.com/incidents/zs55rsqkg189; streaming import and the other components are operating normally. This incident is now resolved.
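For context on the visibility-delay figures reported in the updates below, one rough way to measure visibility delay is to import a probe record carrying the current time and poll until it appears in query results. The sketch below is illustrative only; import_record, run_query, and the visibility_probe table are hypothetical placeholders, not part of the incident tooling.

```python
import time

def estimate_visibility_delay(import_record, run_query, poll_interval=60):
    """Rough visibility-delay estimate in seconds.

    import_record and run_query are hypothetical callables standing in for
    whatever import client and query client a pipeline already uses.
    """
    sent_at = int(time.time())
    import_record({"probe": "visibility-check", "time": sent_at})
    while True:
        newest = run_query("SELECT MAX(time) FROM visibility_probe")
        if newest is not None and newest >= sent_at:
            return int(time.time()) - sent_at
        time.sleep(poll_interval)
```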
Sep 5, 00:50 PDT
Update - Visibility delay has returned to normal (under 1 minute). We are continuing to investigate some remaining storage errors, but expect these will not greatly affect our customers' experience at present.
Sep 4, 17:38 PDT
Update - Visibility delay has now decreased to an average of approximately 45 minutes and is continuing to drop as expected.
Sep 4, 15:59 PDT
Update - Visibility delay has now decreased to an average of just below 1.5 hours (approximately 80 minutes) and is maintaining a downward trend.
Sep 4, 14:58 PDT
Update - Visibility delay is approaching an average of 1.75 hours, though a brief interruption from a database failover caused a small pause in processing.
Sep 4, 14:00 PDT
Update - Our visibility delay continues to average approximately 2 hours, though we have begun to see a downward trend in the extremes.
Sep 4, 13:01 PDT
Update - Our visibility delay remains at approximately 2 hours, while recent capacity adjustments have begun to positively impact our backlog.
Sep 4, 12:00 PDT
Update - Our visibility delay is currently holding steady at approximately 2 hours while client-side Fluentd installations continue to flush their pending buffers. We have made some adjustments to the distribution of our internal capacity to attempt to accelerate our internal backlog processing further.
Sep 4, 10:58 PDT
Update - We are experiencing an increase in visibility delay to approximately 2 hours as client-side Fluentd installations flush their pending buffers.
Sep 4, 09:59 PDT
Update - We have identified the cause of the import API issues and have deployed a fix. Our previous internal processing capacity increase is being used to accelerate backlog processing. Current visibility delay is approximately 1.5 hours. We will update this status page with visibility delay on the hour until the incident is resolved.
Sep 4, 09:35 PDT
Update - We are working to expand import capacity, and throughput has recovered.
However, due to the backlog of streaming imports, it currently takes about 1 hour for imported data to become visible to the query engines. We are working to resolve the import delay completely.
Sep 4, 07:36 PDT
Monitoring - Our streaming import API endpoint has been restored after recovery operations and is now receiving customer traffic as expected. We are now adjusting the capacity of our backend workers to process the increased amount of traffic.
Sep 4, 05:35 PDT
Update - Connectivity between the application and the database is unstable. We are investigating using the previous stable revision.
Sep 4, 04:58 PDT
Update - We are still investigating and implementing a fix for the problem.

So far we have confirmed that:
- Ingestion throughput has decreased by 80%.
- Streaming Import (including the JS SDK, Mobile SDK, Postback SDK, and audit logging) is affected and delayed.

We are still working to identify the root cause of the issues and will update the status again as we learn more.
Sep 4, 03:56 PDT
Update - A fix is still being implemented.
Sep 4, 03:26 PDT
Update - A fix is still being implemented.
Sep 4, 02:36 PDT
Identified - The issue has been identified and a fix is being implemented.
Sep 4, 00:16 PDT
Investigating - We are observing an elevated error rate on the Streaming Import API and are investigating the issue.
Sep 3, 23:56 PDT