[US Region] Presto Cluster Stability Issue
Incident Report for Treasure Data
Resolved
The Presto clusters are back to a normal state.
Posted May 16, 2018 - 01:40 PDT
Update
Presto clusters are gradually recovering to a normal state. We will keep monitoring.
Posted May 16, 2018 - 01:27 PDT
Monitoring
We have increased the capacity of the Presto clusters. We will keep monitoring.
Posted May 16, 2018 - 01:20 PDT
Identified
We have distributed the query workload to mitigate the impact of memory-consuming queries.
Posted May 16, 2018 - 01:06 PDT
Update
We detected that one of our Presto clusters is experiencing frequent major GCs. We are now working on a fix.
Posted May 16, 2018 - 00:51 PDT
Investigating
We have detected an increase in the internal failure rate of Presto queries. We are investigating.
Posted May 16, 2018 - 00:25 PDT
This incident affected: US (Presto Query Engine).