hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Xuefu Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
Date Thu, 05 Nov 2015 00:33:27 GMT

    [ https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14990788#comment-14990788
] 

Xuefu Zhang commented on HIVE-12229:
------------------------------------

+1 for the patch. I have a followup question though. Based on your previous comment, what
happens if spark.files.overwrite is false? Will user see an error? User should be able to
set this property to true at Hive (say, beeline), right?

> Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-12229
>                 URL: https://issues.apache.org/jira/browse/HIVE-12229
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 1.1.0
>            Reporter: Lifeng Wang
>            Assignee: Rui Li
>         Attachments: HIVE-12229.1-spark.patch, HIVE-12229.2-spark.patch, HIVE-12229.3-spark.patch,
HIVE-12229.3-spark.patch
>
>
> Added one python script in the query and the python script cannot be found during execution
in yarn-cluster mode.
> {noformat}
> 15/10/21 21:10:55 INFO exec.ScriptOperator: Executing [/usr/bin/python, q2-sessionize.py,
3600]
> 15/10/21 21:10:55 INFO exec.ScriptOperator: tablename=null
> 15/10/21 21:10:55 INFO exec.ScriptOperator: partname=null
> 15/10/21 21:10:55 INFO exec.ScriptOperator: alias=null
> 15/10/21 21:10:55 INFO spark.SparkRecordHandler: processing 10 rows: used memory = 324896224
> 15/10/21 21:10:55 INFO exec.ScriptOperator: ErrorStreamProcessor calling reporter.progress()
> /usr/bin/python: can't open file 'q2-sessionize.py': [Errno 2] No such file or directory
> 15/10/21 21:10:55 INFO exec.ScriptOperator: StreamThread OutputProcessor done
> 15/10/21 21:10:55 INFO exec.ScriptOperator: StreamThread ErrorProcessor done
> 15/10/21 21:10:55 INFO spark.SparkRecordHandler: processing 100 rows: used memory = 325619920
> 15/10/21 21:10:55 ERROR exec.ScriptOperator: Error in writing to script: Stream closed
> 15/10/21 21:10:55 INFO exec.ScriptOperator: The script did not consume all input data.
This is considered as an error.
> 15/10/21 21:10:55 INFO exec.ScriptOperator: set hive.exec.script.allow.partial.consumption=true;
to ignore it.
> 15/10/21 21:10:55 ERROR spark.SparkReduceRecordHandler: Fatal error: org.apache.hadoop.hive.ql.metadata.HiveException:
Error while processing row (tag=0) {"key":{"reducesinkkey0":2,"reducesinkkey1":3316240655},"value":{"_col0":5529}}
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row (tag=0)
{"key":{"reducesinkkey0":2,"reducesinkkey1":3316240655},"value":{"_col0":5529}}
>         at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:340)
>         at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:289)
>         at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49)
>         at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
>         at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
>         at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:99)
>         at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20001]: An error
occurred while reading or writing to your custom script. It may have crashed with an error.
>         at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:453)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
>         at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
>         at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:331)
>         ... 14 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message