[US Region] hive/hadoop job queueing delays

Incident Report for Treasure Data

Resolved

This incident was resolved earlier but remained in monitoring while we verified affected jobs. We will notify customers directly of any incomplete jobs.

Posted Apr 02, 2019 - 01:32 PDT

Monitoring

We have completed migration of jobs to a healthy cluster.

A small number of delayed hive, bulkimport, and result export jobs from the past few hours have been restarted by the job control system during the failover.

Posted Apr 01, 2019 - 21:47 PDT

Identified

We have identified an issue with a single hive job control cluster where jobs were consuming slowly, and are in the process of migrating and failing over in-progress jobs to another cluster. A small number of in-progress jobs from the past few hours will be restarted by the job control system during the failover.

Posted Apr 01, 2019 - 21:42 PDT

Investigating

We are currently investigating job queueing delays for hive/hadoop jobs.

Posted Apr 01, 2019 - 21:32 PDT

This incident affected: US (Hadoop / Hive Query Engine).