We experienced an issue with Presto that affected query processing, starting at around 1:30 AM PDT on May 26th (17:30 JST on May 26th), when a new version of Presto was released.
The issue was resolved and Presto was rolled back to a stable state at 4:10 AM PDT (20:10 JST) on May 26th.
During that time, all queries containing a WHERE clause on a string column were affected and may have produced incorrect results. Please read on for more details.
In last night's Presto release (May 26th, 1:30 AM PDT, 17:30 JST) we enabled a columnar processing optimization. This feature is intended to speed up the scan and filter operators for a column by storing and analyzing the column values in batches instead of one by one.
Unfortunately, this new mechanism introduced a bug in the management of the underlying memory buffers that store in-flight data.
This problem was triggered specifically by processing string columns, because of the particular way they are handled and stored in memory buffers/pages. When the size of the string values in a column exceeded certain thresholds, the memory buffers began to be reused. As a result, later stages of the query processing pipeline read data that had already been overwritten, and so produced incorrect query results. The issue was most evident in queries with GROUP BY, which initially prompted us to focus our investigation there.
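The following is a minimal sketch of this class of bug, not Presto's actual code. It assumes a hypothetical `StringView` type that points into a shared byte buffer rather than owning a copy, and a hypothetical `scanThenRead` method that stands in for a scan stage handing values to a later stage. All names are invented for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: NOT Presto's code. All names here are hypothetical.
public class BufferReuseSketch {

    // A view into a byte buffer: it references the bytes, it does not own a copy.
    static final class StringView {
        private final byte[] buffer;
        private final int length;

        StringView(byte[] buffer, int length) {
            this.buffer = buffer;
            this.length = length;
        }

        String read() {
            return new String(buffer, 0, length, StandardCharsets.UTF_8);
        }
    }

    // Writes each value into one shared buffer and hands a view downstream.
    // Downstream reads the views only after every value has been written.
    // Assumes every value fits in the 2-byte buffer (e.g. state codes).
    static List<String> scanThenRead(List<String> values, boolean copyOnEmit) {
        byte[] shared = new byte[2];
        List<StringView> emitted = new ArrayList<>();
        for (String value : values) {
            byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
            // Reusing the buffer overwrites the bytes of the previous value.
            System.arraycopy(bytes, 0, shared, 0, bytes.length);
            byte[] backing = copyOnEmit ? shared.clone() : shared;
            emitted.add(new StringView(backing, bytes.length));
        }
        List<String> seenDownstream = new ArrayList<>();
        for (StringView view : emitted) {
            seenDownstream.add(view.read());
        }
        return seenDownstream;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("CA", "NY");
        System.out.println(scanThenRead(input, false)); // [NY, NY]: first value was overwritten
        System.out.println(scanThenRead(input, true));  // [CA, NY]: each view owns its bytes
    }
}
```

With the shared buffer, the downstream stage sees the last value written for every row, which is the kind of silent corruption that would skew a filter or a GROUP BY. Copying the bytes before emitting them is only one possible remedy and is shown here for contrast.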
Because the columnar processing optimization was only active for queries that apply both scan and filter operators to the same column, any query containing a WHERE clause on a string column could be affected.
For example, these queries would not be affected:

SELECT
  month,
  COUNT(1) count
FROM (
  SELECT
    TD_DATE_TRUNC('month', time) month
  FROM
    laws
)
GROUP BY
  1
ORDER BY
  1

SELECT
  TD_DATE_TRUNC('month', time) month,
  COUNT(1) count
FROM
  laws
GROUP BY
  1
ORDER BY
  1

These sample queries would be affected:

SELECT
  month,
  COUNT(1) count
FROM (
  SELECT
    TD_DATE_TRUNC('month', time) month
  FROM
    laws
  WHERE
    state_code = 'CA'
)
GROUP BY
  1
ORDER BY
  1

SELECT
  month,
  COUNT(1) count
FROM (
  SELECT
    TD_DATE_TRUNC('month', time) month,
    state_code
  FROM
    laws
)
WHERE
  state_code = 'CA'
GROUP BY
  1
ORDER BY
  1

SELECT
  TD_DATE_TRUNC('month', time) month,
  COUNT(1) count
FROM
  laws
WHERE
  state_code = 'CA'
GROUP BY
  1
ORDER BY
  1

Because other column types are directly mapped to native Java primitives, processing of those columns was not affected.
We became aware of this problem at around 4:10 AM PDT (20:10 JST) on May 26th and immediately disabled the columnar processing optimization as a precaution. Unfortunately, queries that were run during this window (around 2 1/2 hours) and were affected by this issue produced incorrect results.
Due to the nature of the problem, it is unfortunately not possible to systematically identify which of the queries executed during this time were affected. We recommend that customers rerun any queries that are important or have business impact. We apologize for the inconvenience this may have caused. Please don't hesitate to contact our Support staff at support@treasuredata.com if you have any questions.