spark-user mailing list archives

From Philip Ogren <philip.og...@oracle.com>
Subject Re: rdd.saveAsTextFile problem
Date Thu, 02 Jan 2014 18:31:37 GMT
Yep - that works great and is what I normally do.

Perhaps I should have framed my email as a bug report.  The 
documentation for saveAsTextFile says you can write results out to a 
local file, but it doesn't behave as described for me.  It also worked 
in Spark 0.8.0 and stopped working in 0.8.1, so it looks like a 
regression.  Should I file a Jira issue?  I haven't done that yet for 
this project but would be happy to.

Thanks,
Philip
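
(For the archives: the collect-based comparison Andrew suggests below might look roughly like this. This is an untested sketch: it assumes an existing `myRdd: RDD[Array[String]]` small enough to fit in driver memory, and since Scala compares Arrays by reference, the contents are converted to Seqs before comparing.)

```scala
// Sketch only: assumes myRdd: RDD[Array[String]] already exists.
// collect() materializes the whole RDD on the driver, so this is
// only suitable for small test datasets.
val expected: Seq[Seq[String]] = Seq(Seq("a", "b"), Seq("c", "d"))
val actual: Seq[Seq[String]] = myRdd.collect().map(_.toSeq).toSeq

// Arrays use reference equality in Scala; comparing Seqs compares contents.
assert(actual == expected, s"expected $expected but got $actual")
```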

On 1/2/2014 11:23 AM, Andrew Ash wrote:
> For testing, maybe try using .collect and doing the comparison between 
> expected and actual in memory rather than on disk?
>
>
> On Thu, Jan 2, 2014 at 12:54 PM, Philip Ogren 
> <philip.ogren@oracle.com> wrote:
>
>     I just tried your suggestion and get the same results with the
>     _temporary directory.  Thanks though.
>
>
>     On 1/2/2014 10:28 AM, Andrew Ash wrote:
>>     You want to write it to a local file on the machine?  Try using
>>     "file:///path/to/target/mydir/" instead
>>
>>     I'm not sure what behavior would be if you did this on a
>>     multi-machine cluster though -- you may get a bit of data on each
>>     machine in that local directory.
>>
>>
>>     On Thu, Jan 2, 2014 at 12:22 PM, Philip Ogren
>>     <philip.ogren@oracle.com> wrote:
>>
>>         I have a very simple Spark application that looks like the
>>         following:
>>
>>
>>         var myRdd: RDD[Array[String]] = initMyRdd()
>>         println(myRdd.first.mkString(", "))
>>         println(myRdd.count)
>>
>>         myRdd.saveAsTextFile("hdfs://myserver:8020/mydir")
>>         myRdd.saveAsTextFile("target/mydir/")
>>
>>
>>         The println statements work as expected.  The first
>>         saveAsTextFile statement also works as expected.  The second
>>         saveAsTextFile statement does not (even if the first is
>>         commented out.)  I get the exception pasted below.  If I
>>         inspect "target/mydir" I see that there is a directory called
>>         _temporary/0/_temporary/attempt_201401020953_0000_m_000000_1
>>         which contains an empty part-00000 file.  It's curious
>>         because this code worked before with Spark 0.8.0 and now I am
>>         running on Spark 0.8.1. I happen to be running this on
>>         Windows in "local" mode at the moment.  Perhaps I should try
>>         running it on my linux box.
>>
>>         Thanks,
>>         Philip
>>
>>
>>         Exception in thread "main" org.apache.spark.SparkException:
>>         Job aborted: Task 2.0:0 failed more than 0 times; aborting
>>         job java.lang.NullPointerException
>>             at
>>         org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
>>             at
>>         org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
>>             at
>>         scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
>>             at
>>         scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>             at
>>         org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
>>             at
>>         org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
>>             at
>>         org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
>>             at
>>         org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)
>>
>>
>>
>
>
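
(Also for the archives: the "file:///" form Andrew mentions above would look like the line below. On a multi-machine cluster each worker writes its own partitions to its own local disk under that path, so an explicit local URI is mainly useful in local mode. The path here is just an example.)

```scala
// Example path only; an absolute file:// URI tells Spark to write to
// the local filesystem rather than the configured default (e.g. HDFS).
myRdd.saveAsTextFile("file:///tmp/mydir")
```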

