spark-dev mailing list archives

From "Xia, Junluan" <>
Subject RE: SPARK-942
Date Wed, 13 Nov 2013 01:59:27 GMT
Hi Kyle,

I have also built a patch for this issue, and it passes the tests. Could you review it
if you are free?

-----Original Message-----
From: Kyle Ellrott [] 
Sent: Wednesday, November 13, 2013 8:44 AM
Subject: Re: SPARK-942

I've posted a patch that I think produces the correct behavior at

It works fine in my programs, but when I run the unit tests, I get errors:

[info] - large number of iterations *** FAILED ***
[info]   org.apache.spark.SparkException: Job aborted: Task 4.0:0 failed
more than 0 times; aborting job java.lang.ClassCastException:
scala.collection.immutable.StreamIterator cannot be cast to scala.collection.mutable.ArrayBuffer
[info]   at ... (stack-trace frames stripped by the archive)

I can't figure out the line where the original error occurred, or why I can't replicate
it in my various test programs.
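For context, the ClassCastException above is the kind of failure you get when code assumes a cached block is a materialized ArrayBuffer but receives a lazy, Stream-backed iterator instead. This is a minimal standalone sketch of that mismatch, not Spark code; `CastSketch` and `castFails` are hypothetical names for illustration:

```scala
import scala.collection.mutable.ArrayBuffer

object CastSketch {
  // Returns true when casting the value to ArrayBuffer throws
  // ClassCastException, as it does for a Stream-backed iterator.
  def castFails(v: Any): Boolean =
    try {
      v.asInstanceOf[ArrayBuffer[Int]]
      false
    } catch {
      case _: ClassCastException => true
    }

  def main(args: Array[String]): Unit = {
    // A block may be held as a materialized buffer...
    val materialized: Any = ArrayBuffer(1, 2, 3)
    // ...or exposed only as an iterator over a lazy Stream.
    val lazyValues: Any = Stream.from(1).take(3).iterator

    println(castFails(materialized)) // false: the cast succeeds
    println(castFails(lazyValues))   // true: an iterator is not an ArrayBuffer
  }
}
```

The cast only checks the erased runtime class, so the failure surfaces wherever the cast happens rather than where the iterator was produced, which is consistent with the trace above being hard to pin to a line.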
Any help would be appreciated.


On Tue, Nov 12, 2013 at 11:35 AM, Alex Boisvert <> wrote:

> On Tue, Nov 12, 2013 at 11:07 AM, Stephen Haberman <> wrote:
> > Huge disclaimer that this is probably a big pita to implement, and 
> > could likely not be as worthwhile as I naively think it would be.
> >
> My perspective on this is that it's already a big pita for Spark users today.
> In the absence of explicit directions/hints, Spark should be able to 
> make ballpark estimates and conservatively pick # of partitions, 
> storage strategies (e.g., memory vs disk) and other runtime parameters that fit the
> deployment architecture/capacities. If this requires code and extra
> runtime resources for sampling/measuring data, guesstimating job size,
> and so on, so be it.
> Users want working jobs first.  Optimal performance / resource 
> utilization follow from that.
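The estimation Alex describes can be sketched simply: sample a few records, extrapolate the total data size, and divide by a target partition size. This is only an illustration of the idea, not a Spark API; `PartitionEstimate` and its parameters are hypothetical:

```scala
object PartitionEstimate {
  // Estimate a conservative partition count from a small sample of
  // record sizes: extrapolate total bytes, then divide by the target
  // partition size, rounding up and never going below one partition.
  def estimatePartitions(totalRecords: Long,
                         sampledRecordBytes: Seq[Long],
                         targetPartitionBytes: Long): Int = {
    val avgBytes = sampledRecordBytes.sum.toDouble / sampledRecordBytes.size
    val estTotalBytes = avgBytes * totalRecords
    math.max(1, math.ceil(estTotalBytes / targetPartitionBytes).toInt)
  }

  def main(args: Array[String]): Unit = {
    // 10M records averaging ~200 bytes is ~2 GB; with 128 MB target
    // partitions this picks 15 partitions.
    println(estimatePartitions(10000000L, Seq(180L, 200L, 220L),
                               128L * 1024 * 1024))
  }
}
```

A real implementation would also have to weigh sampling cost against job size and fall back to explicit user hints when provided, which is exactly the trade-off the thread is debating.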
