Identified - We have found the root cause. We had enabled an optimization option to further optimize column processing; however, it has a race condition that can cause wrong results or errors. Not all jobs were affected, and we are still investigating which jobs were affected during 1:30 AM - 4:10 AM PDT (17:30-20:10 JST).

We're sorry for the inconvenience.
May 26, 07:04 PDT
Investigating - Between 1:30 AM and 4:10 AM PDT (17:30-20:10 JST), some Presto queries returned wrong results or errors due to a new Presto release. We're investigating the cause and the affected jobs.
May 26, 05:01 PDT

About This Site

This is Treasure Data's status page. We believe that trust starts with full transparency.

Web Interface: Operational
REST API: Operational
REST API (for td-agent): Operational
REST API (for JavaScript, Mobile SDK): Operational
Presto: Operational
System Metrics (Month / Week / Day): REST API Response Time, REST API (for td-agent) Response Time, REST API (for JavaScript, Mobile SDK) Response Time, REST API Error Rates, Web Interface Response Time, Number of Queued Imports
Past Incidents
May 25, 2016
Resolved - The Presto cluster has been operating normally since the restart at 3:50 PDT. This problem is resolved.
We will contact each affected user via e-mail with the Presto job IDs that failed due to this problem.
May 25, 05:21 PDT
Monitoring - At 3:50 AM PDT our Presto server process crashed and restarted due to insufficient memory. Some Presto queries failed while the coordinator was restarting. Please retry the failed queries (see the retry sketch after this incident entry). We're very sorry for the inconvenience this has caused.
May 25, 05:07 PDT
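For users who need to re-run queries that failed during the restart, the sketch below shows one way to resubmit a Presto query and wait for its result using the td-client Python library. It is a minimal example, not part of the incident report: the TD_API_KEY environment variable, database name, and SQL text are placeholder assumptions you would replace with your own.

# Minimal sketch: resubmit a failed Presto query with td-client-python.
# Assumes `pip install td-client` and a valid API key in TD_API_KEY;
# the database name and SQL below are hypothetical placeholders.
import os
import tdclient

API_KEY = os.environ["TD_API_KEY"]          # assumption: API key stored in the environment
DATABASE = "my_database"                    # hypothetical database name
QUERY = "SELECT COUNT(1) FROM my_table"     # hypothetical query to retry

with tdclient.Client(apikey=API_KEY) as client:
    # Issue the query again as a Presto job.
    job = client.query(DATABASE, QUERY, type="presto")
    job.wait()                              # block until the job finishes
    print("job", job.job_id, "finished with status:", job.status())
    if job.status() == "success":
        for row in job.result():            # iterate over the result rows
            print(row)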
May 24, 2016

No incidents reported.

May 23, 2016

No incidents reported.

May 22, 2016

No incidents reported.

May 21, 2016

No incidents reported.

May 20, 2016

No incidents reported.

May 19, 2016

No incidents reported.

May 18, 2016

No incidents reported.

May 17, 2016

No incidents reported.

May 16, 2016

No incidents reported.

May 15, 2016

No incidents reported.

May 14, 2016

No incidents reported.

May 13, 2016

No incidents reported.

May 12, 2016
Resolved - This incident is resolved. We increased the number of partitions in the import queues so that our infrastructure can handle a larger number of import requests.
May 12, 16:44 PDT
Monitoring - We've just deployed the fix to the production system. The average delay is currently 45 minutes.
May 12, 10:09 PDT
Update - We are continuing to evaluate the streaming import delay fix to make sure it works correctly on the production system. The average delay is currently 20 minutes.
May 12, 08:17 PDT
Identified - We're observing import delays caused by degraded backend DB performance. We're now evaluating a fix.
May 12, 07:28 PDT