hive-issues mailing list archives

From "Jesus Camacho Rodriguez (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-21496) Automatic sizing of unordered buffer can overflow
Date Mon, 25 Mar 2019 23:25:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-21496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-21496:
-------------------------------------------
    Attachment:     (was: HIVE-21496.patch)

> Automatic sizing of unordered buffer can overflow
> -------------------------------------------------
>
>                 Key: HIVE-21496
>                 URL: https://issues.apache.org/jira/browse/HIVE-21496
>             Project: Hive
>          Issue Type: Bug
>          Components: Physical Optimizer
>    Affects Versions: 4.0.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>         Attachments: HIVE-21496.01.patch, hive.log
>
>
> HIVE-21329 added automatic sizing of the Tez unordered partitioned KV buffer based on group-by
statistics. However, in some corner cases the group-by statistics set the data size to Long.MAX_VALUE,
which ends up setting the unordered KV buffer size to Integer.MAX_VALUE. This buffer size is expected
to be in MB. Converting Integer.MAX_VALUE from MB to bytes overflows, and the following exception
is thrown.
> {code:java}
> 2019-03-23T01:35:17,760 INFO [Dispatcher thread {Central}] HistoryEventHandler.criticalEvents:
[HISTORY][DAG:dag_1553330105749_0001_1][Event:TASK_ATTEMPT_FINISHED]: vertexName=Map 1, taskAttemptId=attempt_1553330105749_0001_1_00_000000_0,
creationTime=1553330117468, allocationTime=1553330117524, startTime=1553330117562, finishTime=1553330117755,
timeTaken=193, status=FAILED, taskFailureType=NON_FATAL, errorEnum=FRAMEWORK_ERROR, diagnostics=Error:
Error while running task ( failure ) : attempt_1553330105749_0001_1_00_000000_0:java.lang.IllegalArgumentException
> at com.google.common.base.Preconditions.checkArgument(Preconditions.java:108)
> at org.apache.tez.runtime.common.resources.MemoryDistributor.registerRequest(MemoryDistributor.java:177)
> at org.apache.tez.runtime.common.resources.MemoryDistributor.requestMemory(MemoryDistributor.java:110)
> at org.apache.tez.runtime.api.impl.TezTaskContextImpl.requestInitialMemory(TezTaskContextImpl.java:214)
> at org.apache.tez.runtime.library.output.UnorderedPartitionedKVOutput.initialize(UnorderedPartitionedKVOutput.java:76)
> at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable._callInternal(LogicalIOProcessorRuntimeTask.java:537)
> at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:520)
> at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$InitializeOutputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:505)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745){code}
>  
> The stats for the GBY operator end up with a data size of Long.MAX_VALUE, as seen below:
> {code:java}
> 2019-03-23T01:35:16,466 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] annotation.StatsRulesProcFactory:
[0] STATS-TS[0] (logs): numRows: 1795 dataSize: 4443078 basicStatsState: PARTIAL colStatsState:
NONE colStats: {severity= colName: severity colType: string countDistincts: 359 numNulls:
89 avgColLen: 100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: true}
> 2019-03-23T01:35:16,466 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] annotation.StatsRulesProcFactory:
Estimating row count for GenericUDFOPEqual(Column[severity], Const string ERROR) Original
num rows: 1795 New num rows: 5
> 2019-03-23T01:35:16,467 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] annotation.StatsRulesProcFactory:
[1] STATS-FIL[8]: numRows: 5 dataSize: 12376 basicStatsState: PARTIAL colStatsState: NONE
colStats: {severity= colName: severity colType: string countDistincts: 359 numNulls: 89 avgColLen:
100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: true}
> 2019-03-23T01:35:16,467 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] exec.FilterOperator:
Setting stats (Num rows: 5 Data size: 12376 Basic stats: PARTIAL Column stats: NONE) on: FIL[8]
> 2019-03-23T01:35:16,468 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] exec.SelectOperator:
Setting stats (Num rows: 5 Data size: 12376 Basic stats: PARTIAL Column stats: NONE) on: SEL[2]
> 2019-03-23T01:35:16,468 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] annotation.StatsRulesProcFactory:
[1] STATS-SEL[2]: numRows: 5 dataSize: 12376 basicStatsState: PARTIAL colStatsState: NONE
colStats: {severity= colName: severity colType: string countDistincts: 359 numNulls: 89 avgColLen:
100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: true}
> 2019-03-23T01:35:16,471 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] annotation.StatsRulesProcFactory:
STATS-GBY[3]: inputSize: 4443078 maxSplitSize: 256000000 parallelism: 1 containsGroupingSet:
false sizeOfGroupingSet: 1
> 2019-03-23T01:35:16,471 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] annotation.StatsRulesProcFactory:
[Case 1] STATS-GBY[3]: cardinality: 5
> 2019-03-23T01:35:16,472 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] exec.GroupByOperator:
Setting stats (Num rows: 1 Data size: 9223372036854775807 Basic stats: PARTIAL Column stats:
NONE) on: GBY[3]
> 2019-03-23T01:35:16,472 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] annotation.StatsRulesProcFactory:
[0] STATS-GBY[3]: numRows: 1 dataSize: 9223372036854775807 basicStatsState: PARTIAL colStatsState:
NONE colStats: {severity= colName: severity colType: string countDistincts: 1 numNulls: 18
avgColLen: 100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: true, _col0= colName:
_col0 colType: bigint countDistincts: 1 numNulls: 0 avgColLen: 8.0 numTrues: 0 numFalses:
0 isPrimaryKey: false isEstimated: false}
> 2019-03-23T01:35:16,473 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] exec.ReduceSinkOperator:
Setting stats (Num rows: 1 Data size: 9223372036854775807 Basic stats: PARTIAL Column stats:
NONE) on: RS[4]
> 2019-03-23T01:35:16,474 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] annotation.StatsRulesProcFactory:
[0] STATS-RS[4]: numRows: 1 dataSize: 9223372036854775807 basicStatsState: PARTIAL colStatsState:
NONE colStats: {severity= colName: severity colType: string countDistincts: 1 numNulls: 18
avgColLen: 100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: true, _col0= colName:
_col0 colType: bigint countDistincts: 1 numNulls: 0 avgColLen: 8.0 numTrues: 0 numFalses:
0 isPrimaryKey: false isEstimated: false}
> 2019-03-23T01:35:16,474 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] annotation.StatsRulesProcFactory:
STATS-GBY[5]: inputSize: 1 maxSplitSize: 256000000 parallelism: 1 containsGroupingSet: false
sizeOfGroupingSet: 1
> 2019-03-23T01:35:16,474 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] annotation.StatsRulesProcFactory:
[Case 7] STATS-GBY[5]: cardinality: 0
> 2019-03-23T01:35:16,474 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] stats.StatsUtils:
STATS-GBY[5]: Equals 0 in number of rows. 0 rows will be set to 1
> 2019-03-23T01:35:16,474 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] exec.GroupByOperator:
Setting stats (Num rows: 1 Data size: 9223372036854775807 Basic stats: PARTIAL Column stats:
NONE) on: GBY[5]
> 2019-03-23T01:35:16,474 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] annotation.StatsRulesProcFactory:
[0] STATS-GBY[5]: numRows: 1 dataSize: 9223372036854775807 basicStatsState: PARTIAL colStatsState:
NONE colStats: {severity= colName: severity colType: string countDistincts: 1 numNulls: 18
avgColLen: 100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: true, _col0= colName:
_col0 colType: bigint countDistincts: 1 numNulls: 0 avgColLen: 8.0 numTrues: 0 numFalses:
0 isPrimaryKey: false isEstimated: false}
> 2019-03-23T01:35:16,474 DEBUG [c779e956-b3b9-451a-8248-6ae7c669854f main] annotation.StatsRulesProcFactory:
[0] STATS-FS[7]: numRows: 1 dataSize: 9223372036854775807 basicStatsState: PARTIAL colStatsState:
NONE colStats: {severity= colName: severity colType: string countDistincts: 1 numNulls: 36
avgColLen: 100.0 numTrues: 0 numFalses: 0 isPrimaryKey: false isEstimated: true, _col0= colName:
_col0 colType: bigint countDistincts: 1 numNulls: 0 avgColLen: 8.0 numTrues: 0 numFalses:
0 isPrimaryKey: false isEstimated: false}{code}
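
The overflow described in the report can be reproduced in isolation. The following is a minimal sketch, not the Hive or Tez code: the class name, the method names, and the 1024 MB cap are all hypothetical and chosen only for illustration. It assumes the buffer size is derived from the estimated data size in bytes, saturated to an int number of MB, and later converted back to bytes with 32-bit arithmetic.

```java
final class UnorderedBufferSizing {
    private static final long BYTES_PER_MB = 1024L * 1024L;

    // Hypothetical planner step: derive a buffer size in MB from an estimated
    // data size in bytes, saturating at Integer.MAX_VALUE as the report describes.
    static int estimateBufferMb(long dataSizeBytes) {
        long mb = dataSizeBytes / BYTES_PER_MB;
        return (int) Math.min(mb, (long) Integer.MAX_VALUE);
    }

    // Unsafe MB-to-bytes conversion: the shift is evaluated in 32-bit int
    // arithmetic and only then widened to long, so large inputs wrap around.
    static long mbToBytesUnsafe(int sizeMb) {
        return sizeMb << 20;
    }

    // Safe conversion: widen to long before shifting.
    static long mbToBytesSafe(int sizeMb) {
        return ((long) sizeMb) << 20;
    }

    // One possible guard (an assumption, not necessarily the committed fix):
    // clamp the estimate to a configured ceiling, with a floor of 1 MB.
    static int clampBufferMb(long dataSizeBytes, int maxBufferMb) {
        long mb = Math.max(1L, dataSizeBytes / BYTES_PER_MB);
        return (int) Math.min(mb, (long) maxBufferMb);
    }

    public static void main(String[] args) {
        int mb = estimateBufferMb(Long.MAX_VALUE);
        System.out.println("estimated MB:  " + mb);
        System.out.println("unsafe bytes:  " + mbToBytesUnsafe(mb));
        System.out.println("safe bytes:    " + mbToBytesSafe(mb));
        System.out.println("clamped MB:    " + clampBufferMb(Long.MAX_VALUE, 1024));
    }
}
```

With a data size of Long.MAX_VALUE the estimate saturates at Integer.MAX_VALUE MB, and the unsafe conversion yields a negative byte count. A negative memory request would be consistent with the IllegalArgumentException raised from Preconditions.checkArgument in the stack trace above, although the trace itself does not show which condition failed.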



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
