All Systems Operational

About This Site

This is Treasure Data's status page. We believe that trust starts with full transparency.

Web Interface Operational
REST API Operational
REST API (for td-agent) Operational
REST API (for JavaScript, Mobile SDK) Operational
Presto Operational
System Metrics
REST API Response Time
REST API (for td-agent) Response Time
REST API (for JavaScript, Mobile SDK) Response Time
REST API Error Rates
Web Interface Response Time
Number of Queued Imports
Past Incidents
Sep 30, 2016

No incidents reported today.

Sep 29, 2016

No incidents reported.

Sep 28, 2016
Postmortem - Read details
Sep 28, 18:24 PDT
Resolved - This problem has been resolved.
Sep 28, 01:25 PDT
Update - We are still observing intermittent hiccups in the backend DB connection. We have added API server resources for load balancing and will continue monitoring.
Sep 27, 23:29 PDT
Monitoring - From 21:51 to 22:05 PDT, API servers could not respond in time due to a network issue at the backend DB server. The network issue has been resolved, and we are continuing to monitor all of our systems.
Sep 27, 22:19 PDT
Sep 26, 2016
Resolved - The incident has been resolved.
Sep 26, 12:15 PDT
Monitoring - Queries are now being processed gradually. We will continue monitoring.
Sep 26, 09:59 PDT
Identified - We have identified the cause: Presto queries that require a large amount of memory have been blocked due to an increase in memory-intensive queries. Such queries often include count(distinct x), ORDER BY, and UNION (duplicate elimination). These operations do not distribute well across the cluster and end up consuming memory on a single node. To keep the cluster stable, we are limiting the number of such memory-intensive jobs that can run at the same time.

If you have noticed any delay in query execution, please refer to the following guidelines for reducing memory usage:
https://docs.treasuredata.com/articles/presto-query-faq#q-exceeded-max-local-memory-xxgb-error
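As an illustrative sketch (not part of the original update), one common way to reduce single-node memory pressure in Presto is to replace an exact count(distinct x) with the approximate approx_distinct(x), which distributes across workers; the table and column names below are hypothetical:

    -- Exact distinct count: all distinct values must be gathered, so memory
    -- concentrates on a single node for large cardinalities
    SELECT count(DISTINCT user_id) FROM events;

    -- Approximate distinct count: each worker keeps a small fixed-size sketch,
    -- so memory stays bounded and the aggregation distributes well
    SELECT approx_distinct(user_id) FROM events;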
Sep 26, 09:25 PDT
Investigating - Some Presto queries have been queued for a long time. We are investigating the cause.
Sep 26, 09:03 PDT
Resolved - This issue was resolved.
Sep 26, 06:13 PDT
Monitoring - According to the AWS status page, the network issue has been resolved, and our services have recovered. We are continuing to monitor.
Sep 26, 06:00 PDT
Identified - Our API servers were affected by the "AWS Internet Connectivity (N. Virginia)" issue.
Sep 26, 05:59 PDT
Investigating - We have detected that our API servers are unstable. We are investigating now.
Sep 26, 05:09 PDT
Sep 25, 2016

No incidents reported.

Sep 24, 2016

No incidents reported.

Sep 23, 2016

No incidents reported.

Sep 22, 2016

No incidents reported.

Sep 21, 2016

No incidents reported.

Sep 20, 2016
Resolved - All systems have been operating normally since the previous update. At 20:34 PDT we began observing an elevated API error rate caused by a backend RDB connection problem. We rolled out a fix to mitigate the impact of the connection problem, and everything has been working fine since. We will investigate the RDB connection problem further to prevent similar problems from recurring.
This incident has been resolved.
Sep 20, 22:27 PDT
Update - The streaming import delay was resolved at 21:25 PDT. We will keep monitoring the API servers for a while.
Sep 20, 21:54 PDT
Monitoring - From 20:34 PDT the API error rate was elevated, and streaming imports and td command API calls were affected by this incident. We identified the cause and rolled out a fix, and the API returned to normal at 21:00 PDT. We are still observing up to 10 minutes of streaming import delay, but it is resolving quickly. We will keep monitoring for a while.
Sep 20, 21:22 PDT
Investigating - We are observing an elevated API error rate and are investigating.
Sep 20, 21:08 PDT
Sep 19, 2016

No incidents reported.

Sep 18, 2016

No incidents reported.

Sep 17, 2016

No incidents reported.

Sep 16, 2016

No incidents reported.