spark-user mailing list archives

From "Ian O'Connell" <...@ianoconnell.com>
Subject Re: Spark variable init problem
Date Wed, 07 Aug 2013 15:54:01 GMT
The closure cleaner doesn't try to serialize fields that are part of an
object. It's not the App trait so much as the object itself; I just don't
think they expected this to be a common use case. There might be an
argument that, being part of an object, the fields should be expected to be
available on the worker node, but then you would need the full path rather
than relying on the closure.
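
A minimal sketch of the workaround this implies (not from the original thread; the object, method, and path names are illustrative): copy the object-level vals into local vals before building the RDD pipeline, so the serialized closure carries the values themselves rather than a reference back to the enclosing object.

import spark.SparkContext

object TabFilter {
  // Object-level vals: the closure cleaner will not pull these in with the
  // task, so referencing them directly from a closure can leave the workers
  // seeing default values (null / 0).
  val TAB = "\t"
  val support = 2

  def run(sc: SparkContext, path: String) = {
    // Copy into locals: the closure captures these plain values, which are
    // serialized with the task and arrive intact on the worker nodes.
    val tab = TAB
    val minSupport = support

    sc.textFile(path)
      .map { line =>
        val fields = line.split(tab)
        (fields(0), fields.length) // illustrative (key, count) pair
      }
      .filter { p => p._2 >= minSupport }
  }
}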


On Wed, Aug 7, 2013 at 8:45 AM, Han JU <ju.han.felix@gmail.com> wrote:

> Thanks!
> I changed it to an object with an explicit main function and that works ...
> But what's the problem behind this? Does extending the App trait prevent
> Spark from copying the values defined outside of the closure?
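
For context, a minimal sketch of the change described above, together with a likely explanation that isn't spelled out in the thread: the App trait is built on DelayedInit, so vals declared in an object extending App are only initialized when the program starts on the driver; a worker JVM loads the object but never runs that initialization, so fields referenced from closures show up there as null/0. All names below are illustrative.

// Problematic form: field initializers are deferred via DelayedInit and
// only run when the program starts on the driver, so the workers see the
// fields at their defaults (null, 0).
object BrokenJob extends App {
  val TAB = "\t"
  val support = 2

  val sc = new spark.SparkContext(args(0), "BrokenJob")
  sc.textFile(args(1))
    .map(line => line.split(TAB))  // TAB is null on the workers -> NPE
    .filter(_.length >= support)   // support is 0 on the workers
    .count()
}

// Working form: plain object fields are initialized when the object is
// loaded, which also happens on the workers, so the closure sees "\t" and 2.
object WorkingJob {
  val TAB = "\t"
  val support = 2

  def main(args: Array[String]) {
    val sc = new spark.SparkContext(args(0), "WorkingJob")
    sc.textFile(args(1))
      .map(line => line.split(TAB))
      .filter(_.length >= support)
      .count()
  }
}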
>
>
> 2013/8/7 Ian O'Connell <ian@ianoconnell.com>
>
>> Do you have any functions inside the object?
>>
>> following a code layout like...
>> https://github.com/mesos/spark/blob/master/examples/src/main/scala/spark/examples/SparkLR.scala?
>>
>>
>> On Wed, Aug 7, 2013 at 8:17 AM, Han JU <ju.han.felix@gmail.com> wrote:
>>
>>> Thanks first.
>>>
>>> It's a Scala object extending App.
>>>
>>>
>>> 2013/8/7 Ian O'Connell <ian@ianoconnell.com>
>>>
>>>> Is your code perhaps part of an object? The closure cleaner doesn't
>>>> attempt to pull in parts of objects.
>>>>
>>>> What does the code around your sketched section look like?
>>>>
>>>>
>>>> On Wed, Aug 7, 2013 at 6:47 AM, Han JU <ju.han.felix@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm just getting started with Spark and I've written a simple job in Scala.
>>>>> It looks roughly like this:
>>>>>
>>>>> val TAB = "\t"
>>>>>
>>>>> val support = 2
>>>>>
>>>>> val sc = new SparkContext(...)
>>>>>
>>>>> val raw = sc.textFile(...)
>>>>>
>>>>> val filtered = raw.map(
>>>>>   line => {
>>>>>     val lineSplit = line.split(TAB) // TAB is null and an exception is thrown during the run
>>>>>     ...
>>>>>   }).filter(p => p._2 >= support) // support here is 0 during the run
>>>>>
>>>>> ...
>>>>>
>>>>> I run the sbt-assembly jar with "java -cp ..." on a standalone
>>>>> cluster, and I found that when they are referenced in the RDD transformation,
>>>>> the two values, TAB and support, are set to their default values: TAB is
>>>>> null and support is 0, no longer "\t" and 2 as they are initialized above.
>>>>>
>>>>> If the same jar is run locally (MASTER is local or local[k] instead of
>>>>> spark://...) on the same input, it runs perfectly. The code also runs
>>>>> well in spark-shell on the cluster.
>>>>>
>>>>> For the jar to run correctly on the cluster, I have to hard-code the
>>>>> string literal and the number in the RDD transformation.
>>>>>
>>>>> It really seems like a weird bug to me; maybe it has something to do with
>>>>> how the sbt-assembly jar is compiled? Any suggestions?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> I'm using Spark 0.7.3 and Scala 2.9.3.
>>>>>
>>>>> --
>>>>> JU Han
>>>>>
>>>>> Software Engineer Intern @ KXEN Inc.
>>>>> UTC - Université de Technologie de Compiègne
>>>>> GI06 - Fouille de Données et Décisionnel
>>>>>
>>>>> +33 0619608888
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
>
