[US Region] Presto Performance Degradation

Incident Report for Treasure Data

Resolved

The Presto and Hadoop clusters have been operating normally for the past 30 minutes. This incident has been resolved.

Posted Apr 29, 2019 - 10:52 PDT

Monitoring

The impaired storage layer access recovered at 10:21 PT. The Presto and Hadoop clusters recovered quickly. We are monitoring the whole system.

Posted Apr 29, 2019 - 10:36 PDT

Update

We have confirmed that the storage layer access errors are recovering. To speed up recovery, we have provisioned additional Presto computing resources.

Posted Apr 29, 2019 - 10:29 PDT

Update

Presto and Hadoop processing is still affected by the storage layer access issue. We are continuing to work on a fix.

Posted Apr 29, 2019 - 10:17 PDT

Identified

We have identified that our query engines are suffering from an elevated error rate in storage layer access. Some queries are being retried internally.

Posted Apr 29, 2019 - 09:58 PDT

Investigating

We are observing degraded Presto performance.

Posted Apr 29, 2019 - 09:42 PDT

This incident affected: US (Presto Query Engine).