spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From yinxusen <yinxu...@gmail.com>
Subject Re: wholeTextFiles not working with HDFS
Date Fri, 13 Jun 2014 02:58:19 GMT
Hi Sguj,

Could you give me the exception stack?

I test it on my laptop and find that it gets the wrong FileSystem. It should
be DistributedFileSystem, but it finds the RawLocalFileSystem.

If we get the same exception stack, I'll try to fix it.

Here is my exception stack:

java.io.FileNotFoundException: File /sen/reuters-out/reut2-000.sgm-0.txt
does not exist.
        at
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
        at
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
        at
org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:489)
        at
org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:280)
        at
org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:240)
        at
org.apache.spark.rdd.WholeTextFileRDD.getPartitions(NewHadoopRDD.scala:173)
        at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
        at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:201)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:201)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1097)
        at org.apache.spark.rdd.RDD.collect(RDD.scala:728)

Besides, what's your hadoop version?




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/wholeTextFiles-not-working-with-HDFS-tp7490p7548.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Mime
View raw message