spark-user mailing list archives

From Aaron Davidson <ilike...@gmail.com>
Subject Re: Bug in Accumulators...
Date Fri, 07 Nov 2014 08:12:01 GMT
This may be due in part to Scala allocating an anonymous inner class in
order to execute the for loop. I would expect that if you change it to a
while loop like

var i = 0
while (i < 10) {
  sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum += x)
  i += 1
}

then the problem may go away. I am not very familiar with the closure
cleaner, but I believe that it cannot prune beyond one layer of references,
so the extra layer of nesting may be tripping it up. If this is the
case, then I would also expect that replacing the accumulator with any other
reference to the enclosing scope (such as a broadcast variable) would have
the same result.
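
To illustrate the nesting I mean, here is a minimal sketch in plain Scala
(no Spark involved; the object and method names are made up for this
example). A for loop over a Range is rewritten by the compiler into a call
to foreach with the loop body as a function, so the inner closure ends up
nested inside an outer one, whereas the while loop adds no extra function:

```scala
object ForDesugarSketch {
  // What you write: a for loop whose body contains a closure.
  def withForLoop(): Int = {
    var total = 0
    for (i <- 1 to 10) {
      Array(1, 2, 3, 4).foreach(x => total += x)
    }
    total
  }

  // Roughly what the compiler turns it into: the loop body becomes a
  // function passed to Range.foreach, so `x => total += x` is now a
  // closure created inside another closure.
  def desugared(): Int = {
    var total = 0
    (1 to 10).foreach { i =>
      Array(1, 2, 3, 4).foreach(x => total += x)
    }
    total
  }

  // The while loop is a plain loop: only one level of closure remains.
  def withWhileLoop(): Int = {
    var total = 0
    var i = 0
    while (i < 10) {
      Array(1, 2, 3, 4).foreach(x => total += x)
      i += 1
    }
    total
  }

  def main(args: Array[String]): Unit = {
    println(withForLoop())
    println(desugared())
    println(withWhileLoop())
  }
}
```

All three variants compute the same sum locally. The difference only
matters once the inner closure has to be cleaned and serialized, which this
sketch does not attempt to reproduce.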

On Fri, Nov 7, 2014 at 12:03 AM, Shixiong Zhu <zsxwing@gmail.com> wrote:

> Could you provide all of the code needed to reproduce the bug? Here is
> my test code:
>
> import org.apache.spark._
> import org.apache.spark.SparkContext._
>
> object SimpleApp {
>
>   def main(args: Array[String]): Unit = {
>     val conf = new SparkConf().setAppName("SimpleApp")
>     val sc = new SparkContext(conf)
>
>     val accum = sc.accumulator(0)
>     for (i <- 1 to 10) {
>       sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum += x)
>     }
>     sc.stop()
>   }
> }
>
> It works fine in both client and cluster mode. Since this is a
> serialization bug, the outer class matters. Could you provide it? Is there
> a SparkContext field in the outer class?
>
> Best Regards,
> Shixiong Zhu
>
> 2014-10-28 0:28 GMT+08:00 octavian.ganea <octavian.ganea@inf.ethz.ch>:
>
>> I am also using Spark 1.1.0 and I ran it on a cluster of nodes (it works
>> if I run it in local mode!)
>>
>> If I put the accumulator inside the for loop, everything will work fine. I
>> guess the bug is that an accumulator can be applied to JUST one RDD.
>>
>> Yet another undocumented 'feature' of Spark that none of the people
>> who maintain Spark is willing to fix or at least tell us about ...
