hive-issues mailing list archives

From "Rajesh Balamohan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-12552) Wrong number of reducer estimation causing job to fail
Date Tue, 01 Dec 2015 23:26:11 GMT

     [ https://issues.apache.org/jira/browse/HIVE-12552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated HIVE-12552:
------------------------------------
    Attachment: HIVE-12552.2.patch

Addressed review comments from [~hagleitn]

With 2.0f, it was generating 1009 tasks, and most of them were not getting enough data; the
work could have been handled with fewer tasks. Got around an 11-13% improvement with fewer
tasks in LLAP mode (the attached images show container mode, for debugging purposes). I haven't
changed bytes per reducer in my run; changing it could bring down the number of reduce tasks.
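
As a rough illustration of why bytes per reducer affects the task count, here is a minimal sketch of a size-based estimate: input size divided by bytes per reducer, rounded up, then clamped between 1 and a maximum. The class and method names are hypothetical and the input sizes are made-up values; this is not taken from the attached patch or from the run described above.

```java
// Hypothetical sketch; class, method, and parameter names are illustrative only.
final class ReducerEstimateSketch {

    // Estimate reducers as ceil(totalInputBytes / bytesPerReducer),
    // clamped to the range [1, maxReducers].
    static int estimateReducers(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        // Ceiling division using integer arithmetic.
        long needed = (totalInputBytes + bytesPerReducer - 1) / bytesPerReducer;
        return (int) Math.max(1L, Math.min((long) maxReducers, needed));
    }

    public static void main(String[] args) {
        long tenGiB = 10L * 1024 * 1024 * 1024;   // made-up input size
        long mib256 = 256L * 1024 * 1024;         // made-up bytes-per-reducer value
        // Doubling bytes per reducer halves the estimate.
        System.out.println(estimateReducers(tenGiB, mib256, 1009));     // prints 40
        System.out.println(estimateReducers(tenGiB, 2 * mib256, 1009)); // prints 20
    }
}
```

Under these assumptions, a larger bytes-per-reducer value lowers the estimate directly, independent of any partition factor applied afterwards.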



> Wrong number of reducer estimation causing job to fail
> ------------------------------------------------------
>
>                 Key: HIVE-12552
>                 URL: https://issues.apache.org/jira/browse/HIVE-12552
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>         Attachments: 6_plan.txt, HIVE-12552.1.patch, HIVE-12552.2.patch, With_max_partition_0.5_setting.png, with_default_setting.png
>
>
> {noformat}
> ], TaskAttempt 3 failed, info=[Error: Failure while running task: attempt_1448429572030_1812_1_03_000029_3:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators: java.io.IOException: Illegal partition for 01 6c 6f 61 6e 20 61 63 63 6f 75 6e 74 00 01 80 1f e1 d7 ff (-1)
> 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:195)
> 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:160)
> 	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:348)
> 	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:71)
> 	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:60)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> 	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:60)
> 	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35)
> 	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators: java.io.IOException: Illegal partition for 01 6c 6f 61 6e 20 61 63 63 6f 75 6e 74 00 01 80 1f e1 d7 ff (-1)
> 	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:341)
> 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:186)
> 	... 14 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Illegal partition for 01 6c 6f 61 6e 20 61 63 63 6f 75 6e 74 00 01 80 1f e1 d7 ff (-1)
> 	at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:402)
> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:852)
> 	at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.writeSingleRow(VectorGroupByOperator.java:904)
> 	at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.access$400(VectorGroupByOperator.java:59)
> 	at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeHashAggregate.flush(VectorGroupByOperator.java:469)
> 	at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeHashAggregate.close(VectorGroupByOperator.java:375)
> 	at org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.closeOp(VectorGroupByOperator.java:950)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:656)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:670)
> 	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:670)
> 	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:318)
> 	... 15 more
> Caused by: java.io.IOException: Illegal partition for 01 6c 6f 61 6e 20 61 63 63 6f 75 6e 74 00 01 80 1f e1 d7 ff (-1)
> 	at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.collect(PipelinedSorter.java:379)
> 	at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.write(PipelinedSorter.java:357)
> 	at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput$1.write(OrderedPartitionedKVOutput.java:163)
> 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor$TezKVOutputCollector.collect(TezProcessor.java:232)
> 	at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:538)
> 	at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:385)
> 	... 25 more
> ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:277, Vertex vertex_1448429572030_1812_1_03 [Reducer 2] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
> {noformat}
> Env: master branch.
> Map 1 <- Map 4 (BROADCAST_EDGE)
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 5 (CUSTOM_SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> set hive.tez.max.partition.factor=0.5f;
> This causes "Reducer 3" to have 0 tasks, so the job fails after Reducer 2.
> Will attach the plan and screenshot shortly.
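
To make the failure mode above concrete, here is a minimal sketch of how scaling a small reducer estimate by a factor below 1 can truncate to 0, and how a lower bound of 1 avoids that. The base estimate of 1, the names, and the range check are all assumptions for illustration; this is not the code from the attached patches, and the report does not state what the unscaled estimate was.

```java
// Hypothetical sketch; names and values are illustrative, not taken from Hive or Tez.
final class PartitionFactorSketch {

    // Scale an estimate by a factor; the int cast truncates toward zero.
    static int scaledUnclamped(int estimate, float factor) {
        return (int) (estimate * factor);
    }

    // Same scaling, but never fewer than one task (one possible guard).
    static int scaledClamped(int estimate, float factor) {
        return Math.max(1, (int) (estimate * factor));
    }

    // Assumed form of a partition range check: valid indices are 0..numPartitions-1.
    static boolean isLegalPartition(int partition, int numPartitions) {
        return partition >= 0 && partition < numPartitions;
    }

    public static void main(String[] args) {
        int unclamped = scaledUnclamped(1, 0.5f);
        int clamped = scaledClamped(1, 0.5f);
        System.out.println(unclamped); // prints 0
        System.out.println(clamped);   // prints 1
        // With 0 downstream tasks, no partition index can pass the range check.
        System.out.println(isLegalPartition(0, unclamped)); // prints false
        System.out.println(isLegalPartition(0, clamped));   // prints true
    }
}
```

If the downstream vertex really ends up with 0 tasks, a check of this shape would reject every record the upstream reducer tries to emit, which is consistent with the "Illegal partition" error in the trace.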



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
