spark-user mailing list archives

From Ashish Shrowty <ashish.shro...@gmail.com>
Subject Re: Spark shell and StackOverFlowError
Date Sun, 30 Aug 2015 18:26:51 GMT
Do you think I should create a JIRA?
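For anyone who lands on this thread later: the capture behaviour discussed below can be reproduced without Spark. A closure that reads a field captures the whole enclosing object (in the shell, the REPL line object holding everything else you defined), while a closure over a local copy captures just the value. A minimal sketch; `ReplLine` is an illustrative stand-in, not the actual REPL wrapper class:

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object CaptureDemo {
  // Measures serialized size the same way a closure serializer would see it.
  def serializedSize(obj: AnyRef): Int = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(obj)
    oos.close()
    bos.size()
  }

  // Stand-in for a REPL line object: large state (like lst) sits next to
  // the small val the closure actually needs.
  class ReplLine extends Serializable {
    val big: Array[Byte] = new Array[Byte](1 << 20) // 1 MB of unrelated state
    val a: Int = 10
    // Reading the field `a` captures `this`, dragging `big` into the closure.
    def fieldClosure: Int => Int = i => if (a == 10) 1 else 0
    // Copying to a local first means only the Int is captured.
    def localClosure: Int => Int = { val localA = a; i => if (localA == 10) 1 else 0 }
  }

  def main(args: Array[String]): Unit = {
    val line = new ReplLine
    println(s"field capture: ${serializedSize(line.fieldClosure)} bytes")
    println(s"local capture: ${serializedSize(line.localClosure)} bytes")
  }
}
```

The field-capturing closure serializes to over a megabyte because the whole `ReplLine` rides along; the local-capturing one stays tiny. The same mechanism is why a bare `val a = 10` in the shell pulls in everything else defined on earlier REPL lines.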


On Sun, Aug 30, 2015 at 12:56 PM Ted Yu <yuzhihong@gmail.com> wrote:

> I got StackOverflowError as well :-(
>
> On Sun, Aug 30, 2015 at 9:47 AM, Ashish Shrowty <ashish.shrowty@gmail.com>
> wrote:
>
>> Yep .. I tried that too earlier. Doesn't make a difference. Are you able
>> to replicate on your side?
>>
>>
>> On Sun, Aug 30, 2015 at 12:08 PM Ted Yu <yuzhihong@gmail.com> wrote:
>>
>>> I see.
>>>
>>> What about using the following in place of variable a ?
>>>
>>> http://spark.apache.org/docs/latest/programming-guide.html#broadcast-variables
>>>
>>> Cheers
>>>
>>> On Sun, Aug 30, 2015 at 8:54 AM, Ashish Shrowty <
>>> ashish.shrowty@gmail.com> wrote:
>>>
>>>> @Sean - Agree that there is no action, but I still get the
>>>> StackOverflowError; it's very weird.
>>>>
>>>> @Ted - Variable a is just an int - val a = 10 ... The error happens
>>>> when I try to pass a variable into the closure. The example you have above
>>>> works fine since there is no variable being passed into the closure from
>>>> the shell.
>>>>
>>>> -Ashish
>>>>
>>>> On Sun, Aug 30, 2015 at 9:55 AM Ted Yu <yuzhihong@gmail.com> wrote:
>>>>
>>>>> Using Spark shell :
>>>>>
>>>>> scala> import scala.collection.mutable.MutableList
>>>>> import scala.collection.mutable.MutableList
>>>>>
>>>>> scala> val lst = MutableList[(String,String,Double)]()
>>>>> lst: scala.collection.mutable.MutableList[(String, String, Double)] =
>>>>> MutableList()
>>>>>
>>>>> scala> Range(0,10000).foreach(i=>lst+=(("10","10",i:Double)))
>>>>>
>>>>> scala> val rdd=sc.makeRDD(lst).map(i=> if(a==10) 1 else 0)
>>>>> <console>:27: error: not found: value a
>>>>>        val rdd=sc.makeRDD(lst).map(i=> if(a==10) 1 else 0)
>>>>>                                           ^
>>>>>
>>>>> scala> val rdd=sc.makeRDD(lst).map(i=> if(i._1==10) 1 else 0)
>>>>> rdd: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[1] at map at
>>>>> <console>:27
>>>>>
>>>>> scala> rdd.count()
>>>>> ...
>>>>> 15/08/30 06:53:40 INFO DAGScheduler: Job 0 finished: count at
>>>>> <console>:30, took 0.478350 s
>>>>> res1: Long = 10000
>>>>>
>>>>> Ashish:
>>>>> Please refine your example to mimic more closely what your code
>>>>> actually did.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Sun, Aug 30, 2015 at 12:24 AM, Sean Owen <sowen@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> That can't cause any error, since there is no action in your first
>>>>>> snippet. Even calling count on the result doesn't cause an error.
>>>>>> You must be executing something different.
>>>>>>
>>>>>> On Sun, Aug 30, 2015 at 4:21 AM, ashrowty <ashish.shrowty@gmail.com>
>>>>>> wrote:
>>>>>> > I am running the Spark shell (1.2.1) in local mode and I have a
>>>>>> > simple RDD[(String,String,Double)] with about 10,000 objects in it.
>>>>>> > I get a StackOverflowError each time I try to run the following code
>>>>>> > (the code itself is just representative of other logic where I need
>>>>>> > to pass in a variable). I tried broadcasting the variable too, but
>>>>>> > no luck .. missing something basic here -
>>>>>> >
>>>>>> > val rdd = sc.makeRDD(List(<Data read from file>))
>>>>>> > val a=10
>>>>>> > rdd.map(r => if (a==10) 1 else 0)
>>>>>> > This throws -
>>>>>> >
>>>>>> > java.lang.StackOverflowError
>>>>>> >     at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:318)
>>>>>> >     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1133)
>>>>>> >     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>>>>>> >     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>>>>>> >     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>>>>>> >     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>>>>>> >     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>>>>>> >     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>>>>>> >     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>>>>>> > ...
>>>>>> >
>>>>>> > More experiments  .. this works -
>>>>>> >
>>>>>> > val lst = Range(0,10000).map(i=>("10","10",i:Double)).toList
>>>>>> > sc.makeRDD(lst).map(i=> if(a==10) 1 else 0)
>>>>>> >
>>>>>> > But below doesn't and throws the StackoverflowError -
>>>>>> >
>>>>>> > val lst = MutableList[(String,String,Double)]()
>>>>>> > Range(0,10000).foreach(i=>lst+=(("10","10",i:Double)))
>>>>>> > sc.makeRDD(lst).map(i=> if(a==10) 1 else 0)
>>>>>> >
>>>>>> > Any help appreciated!
>>>>>> >
>>>>>> > Thanks,
>>>>>> > Ashish
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > View this message in context:
>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-shell-and-StackOverFlowError-tp24508.html
>>>>>> > Sent from the Apache Spark User List mailing list archive at
>>>>>> Nabble.com.
>>>>>> >
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>>>>> For additional commands, e-mail: user-help@spark.apache.org
>>>>>>
>>>>>>
>>>>>
>>>
>
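A likely explanation for the List-vs-MutableList difference reported above: once `a` is captured, the closure drags the REPL state (and therefore `lst`) into serialization. `scala.List` writes its elements iteratively through custom serialization, while `MutableList`'s cons cells go through default Java serialization, which recurses one `writeObject0`/`writeSerialData` round per cell, matching the repeating frames in the trace. A Spark-free sketch (`Node` and `trySerialize` are illustrative names; the small thread stack just makes the depth difference show up quickly):

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object SerDepthDemo {
  // A naive cons list: default Java serialization recurses once per cell,
  // the same shape as MutableList's internal linked cells.
  final case class Node(head: Double, tail: Node) extends Serializable

  // Serialize on a thread with a deliberately small stack, returning the
  // failure (if any) so callers can inspect it.
  def trySerialize(obj: AnyRef, stackBytes: Long): Option[Throwable] = {
    var failure: Option[Throwable] = None
    val t = new Thread(null, () => {
      try {
        val oos = new ObjectOutputStream(new ByteArrayOutputStream())
        oos.writeObject(obj)
        oos.close()
      } catch { case e: Throwable => failure = Some(e) }
    }, "ser-probe", stackBytes)
    t.start(); t.join()
    failure
  }

  def main(args: Array[String]): Unit = {
    var nodes: Node = null
    for (i <- 0 until 10000) nodes = Node(i.toDouble, nodes)
    // Recursive per-cell serialization blows the stack...
    println(s"naive list: ${trySerialize(nodes, 256 * 1024)}")
    // ...while scala.List of the same length serializes fine.
    println(s"scala.List: ${trySerialize(List.range(0, 10000), 256 * 1024)}")
  }
}
```

In the shell itself, the usual workarounds are to copy the captured value to a fresh local val in the same expression, or to broadcast it with `sc.broadcast(a)` and reference `.value` in the closure, so the REPL line objects never enter the serialized graph.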
