spark-user mailing list archives

From Ashish Shrowty <ashish.shro...@gmail.com>
Subject Re: Spark shell and StackOverFlowError
Date Sun, 30 Aug 2015 19:08:52 GMT
Sean, does the code below work for you in the Spark shell? Ted got the
same error -

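import scala.collection.mutable.MutableList

val a = 10
val lst = MutableList[(String, String, Double)]()
Range(0, 10000).foreach(i => lst += (("10", "10", i: Double)))
// Note: no action is called here, yet the StackOverflowError is still reported.
sc.makeRDD(lst).map(i => if (a == 10) 1 else 0)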
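
The stack trace quoted further down consists entirely of java.io.ObjectOutputStream frames, which suggests the overflow happens while something is being Java-serialized. To narrow that down outside Spark, here is a minimal sketch, not a confirmed diagnosis. `SerializationCheck`, `Cell`, `chain` and `javaSerializedSize` are hypothetical names made up for illustration. The assumption being tested is that a linked structure written node by node with default Java serialization can recurse deeply enough to overflow the stack, while an immutable List (like the `.toList` variant reported to work) does not.

```scala
import java.io.{ByteArrayOutputStream, IOException, ObjectOutputStream}

object SerializationCheck {
  // Hypothetical stand-in for a linked structure that relies on default
  // Java serialization: writing a cell recurses into its `next` field.
  final class Cell(val value: Int, val next: Cell) extends Serializable

  // Build a chain of n cells iteratively, so construction cannot overflow.
  def chain(n: Int): Cell =
    (0 until n).foldLeft(null: Cell)((tail, i) => new Cell(i, tail))

  // Java-serialize a value in memory. Returns Right(bytes written) on
  // success, or Left(error) if serialization fails or overflows the stack.
  def javaSerializedSize(value: AnyRef): Either[Throwable, Int] = {
    val bytes = new ByteArrayOutputStream()
    try {
      val out = new ObjectOutputStream(bytes)
      out.writeObject(value)
      out.flush()
      Right(bytes.size())
    } catch {
      case e: StackOverflowError => Left(e)
      case e: IOException        => Left(e)
    }
  }
}
```

In the shell, `SerializationCheck.javaSerializedSize(lst)` on the MutableList from the snippet above would show whether the list by itself can be serialized with the current stack size, and `SerializationCheck.javaSerializedSize(SerializationCheck.chain(1000000))` is expected to come back as a Left. Whether serializing `lst` is what actually fails inside the closure here is still an assumption.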

-Ashish


On Sun, Aug 30, 2015 at 2:52 PM Sean Owen <sowen@cloudera.com> wrote:

> I'm not sure how to reproduce it? this code does not produce an error in
> master.
>
> On Sun, Aug 30, 2015 at 7:26 PM, Ashish Shrowty
> <ashish.shrowty@gmail.com> wrote:
> > Do you think I should create a JIRA?
> >
> >
> > On Sun, Aug 30, 2015 at 12:56 PM Ted Yu <yuzhihong@gmail.com> wrote:
> >>
> >> I got StackOverFlowError as well :-(
> >>
> >> On Sun, Aug 30, 2015 at 9:47 AM, Ashish Shrowty <ashish.shrowty@gmail.com>
> >> wrote:
> >>>
> >>> Yep .. I tried that too earlier. Doesn't make a difference. Are you able
> >>> to replicate on your side?
> >>>
> >>>
> >>> On Sun, Aug 30, 2015 at 12:08 PM Ted Yu <yuzhihong@gmail.com> wrote:
> >>>>
> >>>> I see.
> >>>>
> >>>> What about using the following in place of variable a ?
> >>>>
> >>>>
> >>>> http://spark.apache.org/docs/latest/programming-guide.html#broadcast-variables
> >>>>
> >>>> Cheers
> >>>>
> >>>> On Sun, Aug 30, 2015 at 8:54 AM, Ashish Shrowty
> >>>> <ashish.shrowty@gmail.com> wrote:
> >>>>>
> >>>>> @Sean - Agreed that there is no action, but I still get the
> >>>>> StackOverflowError; it's very weird.
> >>>>>
> >>>>> @Ted - Variable a is just an int - val a = 10 ... The error happens
> >>>>> when I try to pass a variable into the closure. The example you have
> >>>>> above works fine since there is no variable being passed into the
> >>>>> closure from the shell.
> >>>>>
> >>>>> -Ashish
> >>>>>
> >>>>> On Sun, Aug 30, 2015 at 9:55 AM Ted Yu <yuzhihong@gmail.com> wrote:
> >>>>>>
> >>>>>> Using Spark shell :
> >>>>>>
> >>>>>> scala> import scala.collection.mutable.MutableList
> >>>>>> import scala.collection.mutable.MutableList
> >>>>>>
> >>>>>> scala> val lst = MutableList[(String,String,Double)]()
> >>>>>> lst: scala.collection.mutable.MutableList[(String, String, Double)] = MutableList()
> >>>>>>
> >>>>>> scala> Range(0,10000).foreach(i=>lst+=(("10","10",i:Double)))
> >>>>>>
> >>>>>> scala> val rdd=sc.makeRDD(lst).map(i=> if(a==10) 1 else 0)
> >>>>>> <console>:27: error: not found: value a
> >>>>>>        val rdd=sc.makeRDD(lst).map(i=> if(a==10) 1 else 0)
> >>>>>>                                           ^
> >>>>>>
> >>>>>> scala> val rdd=sc.makeRDD(lst).map(i=> if(i._1==10) 1 else 0)
> >>>>>> rdd: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[1] at map at <console>:27
> >>>>>>
> >>>>>> scala> rdd.count()
> >>>>>> ...
> >>>>>> 15/08/30 06:53:40 INFO DAGScheduler: Job 0 finished: count at <console>:30, took 0.478350 s
> >>>>>> res1: Long = 10000
> >>>>>>
> >>>>>> Ashish:
> >>>>>> Please refine your example to mimic more closely what your code
> >>>>>> actually did.
> >>>>>>
> >>>>>> Thanks
> >>>>>>
> >>>>>> On Sun, Aug 30, 2015 at 12:24 AM, Sean Owen <sowen@cloudera.com>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> That can't cause any error, since there is no action in your first
> >>>>>>> snippet. Even calling count on the result doesn't cause an error. You
> >>>>>>> must be executing something different.
> >>>>>>>
> >>>>>>> On Sun, Aug 30, 2015 at 4:21 AM, ashrowty <ashish.shrowty@gmail.com>
> >>>>>>> wrote:
> >>>>>>> > I am running the Spark shell (1.2.1) in local mode and I have a
> >>>>>>> > simple RDD[(String,String,Double)] with about 10,000 objects in it.
> >>>>>>> > I get a StackOverflowError each time I try to run the following
> >>>>>>> > code (the code itself is just representative of other logic where
> >>>>>>> > I need to pass in a variable). I tried broadcasting the variable
> >>>>>>> > too, but no luck .. missing something basic here -
> >>>>>>> >
> >>>>>>> > val rdd = sc.makeRDD(List(<Data read from file>))
> >>>>>>> > val a=10
> >>>>>>> > rdd.map(r => if (a==10) 1 else 0)
> >>>>>>> > This throws -
> >>>>>>> >
> >>>>>>> > java.lang.StackOverflowError
> >>>>>>> >     at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:318)
> >>>>>>> >     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1133)
> >>>>>>> >     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
> >>>>>>> >     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
> >>>>>>> >     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
> >>>>>>> >     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
> >>>>>>> >     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
> >>>>>>> >     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
> >>>>>>> >     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
> >>>>>>> > ...
> >>>>>>> > ...
> >>>>>>> >
> >>>>>>> > More experiments  .. this works -
> >>>>>>> >
> >>>>>>> > val lst = Range(0,10000).map(i=>("10","10",i:Double)).toList
> >>>>>>> > sc.makeRDD(lst).map(i=> if(a==10) 1 else 0)
> >>>>>>> >
> >>>>>>> > But the code below doesn't, and throws the StackOverflowError -
> >>>>>>> >
> >>>>>>> > val lst = MutableList[(String,String,Double)]()
> >>>>>>> > Range(0,10000).foreach(i=>lst+=(("10","10",i:Double)))
> >>>>>>> > sc.makeRDD(lst).map(i=> if(a==10) 1 else 0)
> >>>>>>> >
> >>>>>>> > Any help appreciated!
> >>>>>>> >
> >>>>>>> > Thanks,
> >>>>>>> > Ashish
> >>>>>>> >
> >>>>>>> >
> >>>>>>> >
> >>>>>>> > --
> >>>>>>> > View this message in context:
> >>>>>>> > http://apache-spark-user-list.1001560.n3.nabble.com/Spark-shell-and-StackOverFlowError-tp24508.html
> >>>>>>> > Sent from the Apache Spark User List mailing list archive at
> >>>>>>> > Nabble.com.
> >>>>>>> >
> >>>>>>> >
> >>>>>>> >
> >>>>>>> > ---------------------------------------------------------------------
> >>>>>>> > To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> >>>>>>> > For additional commands, e-mail: user-help@spark.apache.org
> >>>>>>> >
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>
> >
>
