spark-issues mailing list archives

From "Nick Orka (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-19752) OrcGetSplits fails with 0 size files
Date Wed, 01 Mar 2017 12:28:45 GMT

    [ https://issues.apache.org/jira/browse/SPARK-19752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890069#comment-15890069 ]

Nick Orka commented on SPARK-19752:
-----------------------------------

I agree that Spark uses many Hadoop/Hive libraries, but it uses them in a different way. When
I run the query in Hive there is no error at all; it just returns an empty result set. It looks
like Hive checks the file size before getting splits and Spark does not.
I understand that I can report this in the Hive JIRA, wait until they fix the bug, and then
wait until Spark starts using the fixed version. But the query is not failing in Hive now. Should
I give up on Spark and start using Hive?
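
To illustrate the kind of size check described above, here is a minimal sketch that separates 0 size files from readable ones before a read is attempted. It is not what Hive or Spark actually do internally. The function name `split_by_size`, the partition directory name, and the rule of skipping names that start with `_` or `.` are assumptions made for this example, and it walks a local directory with the Python standard library as a stand-in for an HDFS listing.

```python
import tempfile
from pathlib import Path


def split_by_size(partition_dir):
    """Return (non_empty, empty) lists of data files under partition_dir.

    Hypothetical helper: it only inspects file sizes and never parses ORC.
    """
    non_empty, empty = [], []
    for path in sorted(Path(partition_dir).rglob("*")):
        if not path.is_file():
            continue
        # Assumption: treat names starting with "_" or "." as marker or
        # hidden files (for example _SUCCESS) and leave them out entirely.
        if path.name.startswith(("_", ".")):
            continue
        if path.stat().st_size == 0:
            empty.append(path)
        else:
            non_empty.append(path)
    return non_empty, empty


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        part = Path(tmp) / "dt=2017-02-27"  # hypothetical partition directory
        part.mkdir()
        (part / "part-00000").write_bytes(b"placeholder bytes, not real ORC")
        (part / "part-00001").write_bytes(b"")  # the problematic 0 size file
        (part / "_SUCCESS").write_bytes(b"")    # marker file, ignored
        readable, skipped = split_by_size(tmp)
        print("readable:", [p.name for p in readable])
        print("skipped: ", [p.name for p in skipped])
```

With a list like `readable`, a job could pass only the non-empty paths to the reader, or the files in `skipped` could be cleaned up before the next query. Whether that is acceptable depends on why the empty files were written in the first place.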

> OrcGetSplits fails with 0 size files
> ------------------------------------
>
>                 Key: SPARK-19752
>                 URL: https://issues.apache.org/jira/browse/SPARK-19752
>             Project: Spark
>          Issue Type: Bug
>          Components: Input/Output
>    Affects Versions: 2.1.0
>            Reporter: Nick Orka
>
> It is possible that, during some SQL queries, a partition ends up with a 0 size file (empty file).
> The next time I try to read from that file with a SQL query, I get this error:
> 17/02/27 10:33:11 INFO PerfLogger: </PERFLOG method=OrcGetSplits start=1488191591570 end=1488191591599 duration=29 from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl>
> 17/02/27 10:33:11 ERROR ApplicationMaster: User class threw exception: java.lang.reflect.InvocationTargetException
> java.lang.reflect.InvocationTargetException
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at scala.reflect.runtime.JavaMirrors$JavaMirror$JavaVanillaMethodMirror1.jinvokeraw(JavaMirrors.scala:373)
> 	at scala.reflect.runtime.JavaMirrors$JavaMirror$JavaMethodMirror.jinvoke(JavaMirrors.scala:339)
> 	at scala.reflect.runtime.JavaMirrors$JavaMirror$JavaVanillaMethodMirror.apply(JavaMirrors.scala:355)
> 	at com.sessionm.Datapipeline$.main(Datapipeline.scala:200)
> 	at com.sessionm.Datapipeline.main(Datapipeline.scala)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:627)
> Caused by: java.lang.RuntimeException: serious problem
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048)
> 	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
> 	at scala.Option.getOrElse(Option.scala:121)
> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
> 	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
> 	at scala.Option.getOrElse(Option.scala:121)
> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
> 	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
> 	at scala.Option.getOrElse(Option.scala:121)
> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
> 	at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:84)
> 	at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:84)
> 	at scala.collection.parallel.AugmentedIterableIterator$class.map2combiner(RemainsIterator.scala:115)
> 	at scala.collection.parallel.immutable.ParVector$ParVectorIterator.map2combiner(ParVector.scala:62)
> 	at scala.collection.parallel.ParIterableLike$Map.leaf(ParIterableLike.scala:1054)
> 	at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:49)
> 	at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
> 	at scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
> 	at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:51)
> 	at scala.collection.parallel.ParIterableLike$Map.tryLeaf(ParIterableLike.scala:1051)
> 	at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.internal(Tasks.scala:169)
> 	at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.internal(Tasks.scala:443)
> 	at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:149)
> 	at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443)
> 	at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
> 	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> 	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> 	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> 	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.NullPointerException
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$BISplitStrategy.getSplits(OrcInputFormat.java:560)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1010)
> 	... 35 more
> 17/02/27 10:33:11 INFO ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.reflect.InvocationTargetException)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

