hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (Jira)" <j...@apache.org>
Subject [jira] [Work logged] (HIVE-14660) ArrayIndexOutOfBoundsException on delete
Date Wed, 24 Jun 2020 00:27:02 GMT

     [ https://issues.apache.org/jira/browse/HIVE-14660?focusedWorklogId=450129&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450129
]

ASF GitHub Bot logged work on HIVE-14660:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 24/Jun/20 00:27
            Start Date: 24/Jun/20 00:27
    Worklog Time Spent: 10m 
      Work Description: github-actions[bot] closed pull request #299:
URL: https://github.com/apache/hive/pull/299


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 450129)
    Time Spent: 20m  (was: 10m)

> ArrayIndexOutOfBoundsException on delete
> ----------------------------------------
>
>                 Key: HIVE-14660
>                 URL: https://issues.apache.org/jira/browse/HIVE-14660
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor, Transactions
>    Affects Versions: 1.2.1
>            Reporter: Benjamin BONNET
>            Assignee: Benjamin BONNET
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-14660.1-banch-1.2.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hi,
> DELETE on an ACID table may fail on an ArrayIndexOutOfBoundsException.
> That bug occurs at Reduce phase when there are less reducers than the number of the table
buckets.
> In order to reproduce, create a simple ACID table :
> {code:sql}
> CREATE TABLE test (`cle` bigint,`valeur` string)
>  PARTITIONED BY (`annee` string)
>  CLUSTERED BY (cle) INTO 5 BUCKETS
>  TBLPROPERTIES ('transactional'='true');
> {code}
> Populate it with lines distributed among all buckets, with random values and a few partitions.
> Force the Reducers to be less than the buckets :
> {code:sql}
> set mapred.reduce.tasks=1;
> {code}
> Then execute a delete that will remove many lines from all the buckets.
> {code:sql}
> DELETE FROM test WHERE valeur<'some_value';
> {code}
> Then you will get an ArrayIndexOutOfBoundsException :
> {code}
> 2016-08-22 21:21:02,500 [FATAL] [TezChild] |tez.ReduceRecordSource|: org.apache.hadoop.hive.ql.metadata.HiveException:
Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":119,"bucketid":0,"rowid":0}},"value":{"_col0":"4"}}
>         at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:352)
>         at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:274)
>         at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:252)
>         at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
>         at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
>         at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
>         at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
>         at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:769)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
>         at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
>         at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:343)
>         ... 17 more
> {code}
> Adding logs into FileSinkOperator, one sees the operator deals with buckets 0, 1, 2,
3, 4, then 0 again and it fails at line 769 : actually each time you switch bucket, you move
forwards in a 5 (number of buckets) elements array. So when you get bucket 0 for the second
time, you get out of the array...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Mime
View raw message