spark-user mailing list archives

From Mohit Jaggi <mohit.ja...@ayasdi.com>
Subject Re: trouble with closures
Date Mon, 04 Nov 2013 21:45:54 GMT
Thanks Jason. Yes, that was it.


On Fri, Nov 1, 2013 at 10:11 PM, Jason Lenderman <jslenderman@gmail.com> wrote:

> I suspect the problem might have to do with the
> serialization/deserialization of DataGen. I'd try getting rid of the
> "extends App" and just writing a main method and putting your code in there.
>
>
> On Fri, Nov 1, 2013 at 3:24 PM, Mohit Jaggi <mohit.jaggi@ayasdi.com> wrote:
>
>> Hi,
>> I wrote a small Spark application to generate some random data. It works
>> fine if I use "local[n]", but when I use "mesos://..." the vals of the outer
>> object that I use in the function passed to RDD.foreach are being set to
>> zero.
>>
>> import java.io._
>> import math.rint
>> import org.apache.spark.SparkContext
>> import org.apache.spark.SparkContext._
>>
>> object DataGen extends App {
>>   val nClusters = 10
>>   val nCols = 10000
>>   val nRows = 10000
>>   val rgen = new util.Random
>>
>>   System.setProperty("spark.executor.uri", "hdfs://1b/spark/spark-0.8.0-incubating.tar.gz")
>>   System.setProperty("spark.mesos.coarse", "true")
>>
>>   val sc = new SparkContext("mesos://10.0.1.128:5050", "Data Generator",
>>     "/home/yuzr/spark/spark-0.8.0-incubating",
>>     List("/home/yuzr/datagen/DataGen-assembly-0.1.jar"))
>>
>>   val clusters = sc.parallelize(1 to nClusters)
>>   val nRowsInCluster = nRows/nClusters
>>
>>   println("nRowsInCluster=" + nRowsInCluster)  // ---> prints 1000 in spark driver
>>   clusters foreach { x => writePart(x, nRowsInCluster) }
>>   //clusters foreach writePart --> had this originally
>>
>>   def writePart(nCluster: Int, nRowsInCluster: Int): Unit = {
>>     val partFile = "/tmp/y" + nCluster + ".txt"
>>     val partWriter = new java.io.PrintWriter(partFile)
>>   ...
>>     println("Cluster #" + nCluster)  // --> prints 1 to 10
>>     println("nRowsInCluster=" + nRowsInCluster)  // --> prints 0 ??
>>   ...
>>     }
>>
>>
>>     partWriter.close
>>   }
>> }
>>
>>
>> What am I doing wrong?
>>
>> Mohit.
>>
>
>
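
For anyone finding this thread later, here is a minimal sketch of what I believe was going on. It uses no Spark at all, and the object names (WithApp, WithoutApp, Demo) are made up for illustration. In Scala 2.x the App trait is built on DelayedInit, so the vals in the body of an "extends App" object are only assigned once that object's main method runs. An executor JVM that loads the object just to run a closure never calls main, which would explain the zeros.

```scala
// Hypothetical names for illustration; none of these appear in the original program.

// Body is deferred via DelayedInit (Scala 2.x): the vals are assigned only
// when WithApp.main(...) is invoked.
object WithApp extends App {
  val nRows = 10000
  val nClusters = 10
  val nRowsInCluster = nRows / nClusters
}

// A plain object: the vals are assigned in the object's initializer, which
// runs on first access.
object WithoutApp {
  val nRows = 10000
  val nClusters = 10
  val nRowsInCluster = nRows / nClusters
}

object Demo {
  def main(args: Array[String]): Unit = {
    // WithApp.main is never called here, which mirrors an executor that
    // touches the object without running the driver program.
    println("WithApp.nRowsInCluster    = " + WithApp.nRowsInCluster)
    println("WithoutApp.nRowsInCluster = " + WithoutApp.nRowsInCluster)
  }
}
```

On Scala 2.x I would expect the first line to show the Int default (0) and the second to show 1000. Scala 3 no longer implements App with DelayedInit, so the first line should behave differently there. Moving the code into an explicit main, as Jason suggested, also turns the vals into locals, which get captured by value in the closure passed to foreach.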
