spark-user mailing list archives

From Andrew Ash <and...@andrewash.com>
Subject Re: rdd.saveAsTextFile problem
Date Thu, 02 Jan 2014 17:28:35 GMT
You want to write it to a local file on the machine?  Try using
"file:///path/to/target/mydir/" instead.

I'm not sure what the behavior would be if you did this on a multi-machine
cluster, though -- you may get a bit of data on each machine in that local
directory.
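
A minimal sketch along those lines (untested; the app name, sample data, and
paths below are placeholders, not taken from your code):

import org.apache.spark.SparkContext

// Placeholder app name and data, just to make the snippet self-contained.
val sc = new SparkContext("local", "SaveLocalExample")
val myRdd = sc.parallelize(Seq(Array("a", "b"), Array("c", "d")))

// An explicit file:// URI targets the local filesystem rather than
// whatever default filesystem (e.g. HDFS) the Hadoop config points to.
myRdd.map(_.mkString(", "))
     .saveAsTextFile("file:///path/to/target/mydir/")

// On a real cluster each worker writes its own partitions to that local
// path; coalescing to a single partition first is one way to keep the
// output on a single machine.
myRdd.coalesce(1)
     .map(_.mkString(", "))
     .saveAsTextFile("file:///path/to/target/mydir-single/")

sc.stop()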


On Thu, Jan 2, 2014 at 12:22 PM, Philip Ogren <philip.ogren@oracle.com> wrote:

> I have a very simple Spark application that looks like the following:
>
>
> var myRdd: RDD[Array[String]] = initMyRdd()
> println(myRdd.first.mkString(", "))
> println(myRdd.count)
>
> myRdd.saveAsTextFile("hdfs://myserver:8020/mydir")
> myRdd.saveAsTextFile("target/mydir/")
>
>
> The println statements work as expected.  The first saveAsTextFile
> statement also works as expected.  The second saveAsTextFile statement does
> not (even if the first is commented out).  I get the exception pasted
> below.  If I inspect "target/mydir" I see that there is a directory called
> _temporary/0/_temporary/attempt_201401020953_0000_m_000000_1 which
> contains an empty part-00000 file.  It's curious because this code worked
> with Spark 0.8.0, and now I am running Spark 0.8.1.  I happen to be
> running this on Windows in "local" mode at the moment.  Perhaps I should
> try running it on my Linux box.
>
> Thanks,
> Philip
>
>
> Exception in thread "main" org.apache.spark.SparkException: Job aborted:
> Task 2.0:0 failed more than 0 times; aborting job
> java.lang.NullPointerException
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
>     at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
>     at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)
>
>
