Presto query performance degradation

Incident Report for Treasure Data

Resolved

The cluster has been stable for over 15 minutes. The incident has been resolved.

From 7:00 am to 9:30 am PST, some queries on one of our Presto clusters experienced degraded performance due to an internal Presto worker problem. The affected queries were retried multiple times. If you see a Presto query error indicating that the retry limit was exceeded, please rerun the query.

Posted Jan 11, 2018 - 09:48 PST

Monitoring

We confirmed that restarting the workers stabilized the Presto cluster. Queued queries are being processed smoothly. As a precaution, we have provisioned additional computing capacity.

Posted Jan 11, 2018 - 09:36 PST

Identified

We identified a Presto error that was causing the performance degradation and recovered by restarting the Presto worker nodes.

Posted Jan 11, 2018 - 09:28 PST

Investigating

We are investigating query delays in our Presto clusters.

Posted Jan 11, 2018 - 08:30 PST

This incident affected: US (Presto Query Engine).