spark-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Spark shell and StackOverFlowError
Date Sun, 30 Aug 2015 16:56:13 GMT
I got StackOverFlowError as well :-(

On Sun, Aug 30, 2015 at 9:47 AM, Ashish Shrowty <ashish.shrowty@gmail.com>
wrote:

> Yep .. I tried that too earlier. Doesn't make a difference. Are you able
> to replicate on your side?
>
>
> On Sun, Aug 30, 2015 at 12:08 PM Ted Yu <yuzhihong@gmail.com> wrote:
>
>> I see.
>>
>> What about using the following in place of the variable a?
>>
>> http://spark.apache.org/docs/latest/programming-guide.html#broadcast-variables
>>
>> Cheers
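
A minimal sketch of that broadcast approach, assuming a live SparkContext
sc in the shell and the lst built further down in this thread (illustrative
only; nothing here was run as part of the exchange):

scala> val a = 10
scala> val aBc = sc.broadcast(a)   // ship the value to executors once
scala> sc.makeRDD(lst).map(i => if (aBc.value == 10) 1 else 0).count()

Reading the value through aBc.value means the closure captures the small
Broadcast handle rather than the shell variable a itself.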
>>
>> On Sun, Aug 30, 2015 at 8:54 AM, Ashish Shrowty <ashish.shrowty@gmail.com
>> > wrote:
>>
>>> @Sean - Agree that there is no action, but I still get the
>>> StackOverflowError; it's very weird.
>>>
>>> @Ted - Variable a is just an int - val a = 10 ... The error happens
>>> when I try to pass a variable into the closure. The example you have above
>>> works fine since there is no variable being passed into the closure from
>>> the shell.
>>>
>>> -Ashish
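
A sketch of one commonly suggested workaround for this kind of symptom
(not confirmed anywhere in this thread; the function name tag is
illustrative): pass the shell variable in as a function parameter, so the
closure captures only the local Int and not the REPL line object that
holds it:

scala> import org.apache.spark.rdd.RDD
scala> def tag(rdd: RDD[(String, String, Double)], a: Int): RDD[Int] =
     |   rdd.map(r => if (a == 10) 1 else 0)
scala> tag(sc.makeRDD(lst), 10).count()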
>>>
>>> On Sun, Aug 30, 2015 at 9:55 AM Ted Yu <yuzhihong@gmail.com> wrote:
>>>
>>>> Using the Spark shell:
>>>>
>>>> scala> import scala.collection.mutable.MutableList
>>>> import scala.collection.mutable.MutableList
>>>>
>>>> scala> val lst = MutableList[(String,String,Double)]()
>>>> lst: scala.collection.mutable.MutableList[(String, String, Double)] =
>>>> MutableList()
>>>>
>>>> scala> Range(0,10000).foreach(i=>lst+=(("10","10",i:Double)))
>>>>
>>>> scala> val rdd=sc.makeRDD(lst).map(i=> if(a==10) 1 else 0)
>>>> <console>:27: error: not found: value a
>>>>        val rdd=sc.makeRDD(lst).map(i=> if(a==10) 1 else 0)
>>>>                                           ^
>>>>
>>>> scala> val rdd=sc.makeRDD(lst).map(i=> if(i._1==10) 1 else 0)
>>>> rdd: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[1] at map at
>>>> <console>:27
>>>>
>>>> scala> rdd.count()
>>>> ...
>>>> 15/08/30 06:53:40 INFO DAGScheduler: Job 0 finished: count at
>>>> <console>:30, took 0.478350 s
>>>> res1: Long = 10000
>>>>
>>>> Ashish:
>>>> Please refine your example to mimic more closely what your code
>>>> actually did.
>>>>
>>>> Thanks
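
For reference, a closer repro would define a before the map, so the
closure actually captures a shell variable (a sketch continuing the
session above; per the reports in this thread, this is the step that
blows up on 1.2.1):

scala> val a = 10
a: Int = 10

scala> val rdd = sc.makeRDD(lst).map(i => if (a == 10) 1 else 0)
// reportedly throws java.lang.StackOverflowError here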
>>>>
>>>> On Sun, Aug 30, 2015 at 12:24 AM, Sean Owen <sowen@cloudera.com> wrote:
>>>>
>>>>> That can't cause any error, since there is no action in your first
>>>>> snippet. Even calling count on the result doesn't cause an error. You
>>>>> must be executing something different.
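
Context for Sean's point: map is a lazy transformation, so by itself it
schedules no job; only an action triggers one. A tiny illustration,
separate from the bug being discussed:

scala> val mapped = sc.makeRDD(1 to 3).map(_ + 1)   // builds the RDD, no job
scala> mapped.count()                               // the job runs here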
>>>>>
>>>>> On Sun, Aug 30, 2015 at 4:21 AM, ashrowty <ashish.shrowty@gmail.com>
>>>>> wrote:
>>>>> > I am running the Spark shell (1.2.1) in local mode and I have a simple
>>>>> > RDD[(String,String,Double)] with about 10,000 objects in it. I get a
>>>>> > StackOverFlowError each time I try to run the following code (the code
>>>>> > itself is just representative of other logic where I need to pass in a
>>>>> > variable). I tried broadcasting the variable too, but no luck ..
>>>>> > missing something basic here -
>>>>> >
>>>>> > val rdd = sc.makeRDD(List(<Data read from file>))
>>>>> > val a=10
>>>>> > rdd.map(r => if (a==10) 1 else 0)
>>>>> > This throws -
>>>>> >
>>>>> > java.lang.StackOverflowError
>>>>> >     at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:318)
>>>>> >     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1133)
>>>>> >     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>>>>> >     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>>>>> >     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>>>>> >     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>>>>> >     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>>>>> >     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>>>>> >     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>>>>> > ...
>>>>> > ...
>>>>> >
>>>>> > More experiments .. this works -
>>>>> >
>>>>> > val lst = Range(0,10000).map(i=>("10","10",i:Double)).toList
>>>>> > sc.makeRDD(lst).map(i=> if(a==10) 1 else 0)
>>>>> >
>>>>> > But below doesn't, and throws the StackOverflowError -
>>>>> >
>>>>> > val lst = MutableList[(String,String,Double)]()
>>>>> > Range(0,10000).foreach(i=>lst+=(("10","10",i:Double)))
>>>>> > sc.makeRDD(lst).map(i=> if(a==10) 1 else 0)
>>>>> >
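
One plausible reading of the contrast above, though nothing in this
thread confirms it: MutableList is a linked structure, and default Java
serialization recurses once per node, so pulling a 10,000-node list into
closure serialization can overflow the stack, while scala's immutable
List serializes without that deep recursion. If that is the cause,
materializing the data as an immutable collection first is a cheap thing
to try (a sketch):

scala> val data = lst.toList   // copy out of the linked structure
scala> sc.makeRDD(data).map(i => if (a == 10) 1 else 0).count()

A larger driver stack (e.g. launching the shell with
--driver-java-options "-Xss4m") is another generic stopgap for
serialization stack overflows.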
>>>>> > Any help appreciated!
>>>>> >
>>>>> > Thanks,
>>>>> > Ashish
>>>>> >
