spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "huangyu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-15044) spark-sql will throw "input path does not exist" exception if it handles a partition which exists in hive table, but the path is removed manually
Date Tue, 03 May 2016 02:49:12 GMT

    [ https://issues.apache.org/jira/browse/SPARK-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267978#comment-15267978
] 

huangyu commented on SPARK-15044:
---------------------------------

Maybe you can try in Spark 1.6.1, because when I tried in Spark 1.6.1 it throw the exception,
but in Spark 1.4.0 it did not.

> spark-sql will throw "input path does not exist" exception if it handles a partition
which exists in hive table, but the path is removed manually
> -------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-15044
>                 URL: https://issues.apache.org/jira/browse/SPARK-15044
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: huangyu
>
> spark-sql will throw "input path not exist" exception if it handles a partition which
exists in hive table, but the path is removed manually.The situation is as follows:
> 1) Create a table "test". "create table test (n string) partitioned by (p string)"
> 2) Load some data into partition(p='1')
> 3)Remove the path related to partition(p='1') of table test manually. "hadoop fs -rmr
..../warehouse/..../test/p=1"
> 4)Run spark sql, spark-sql -e "select n from test where p='1';"
> Then it throws exception:
> {code}
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: ...../test/p=1
>         at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
>         at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
>         at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
>         at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>         at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
>         at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>         at scala.collection.immutable.List.foreach(List.scala:318)
>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>         at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
> {code}
> The bug is in spark 1.6.1, if I use spark 1.4.0, It is OK
> I think spark-sql should ignore the path, just like hive or it dose in early versions,
rather than throw an exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message