spark-user mailing list archives

From Patrick Wendell <pwend...@gmail.com>
Subject Re: ExternalAppendOnlyMap throw no such element
Date Sun, 26 Jan 2014 22:16:43 GMT
Hey There,

So one thing you can do is disable the external sorting; that should
preserve the behavior exactly as it was in previous releases.
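
For example, something like this should turn spilling off (just a sketch;
it assumes the spark.shuffle.spill setting and the SparkConf API on current
master, with placeholder master and app names):

  import org.apache.spark.{SparkConf, SparkContext}

  // Sketch: disable spilling so the aggregation stays fully in memory,
  // as it did before the external map was introduced.
  val conf = new SparkConf()
    .setMaster("local[4]")               // placeholder master for the sketch
    .setAppName("no-spill-repro")        // placeholder app name
    .set("spark.shuffle.spill", "false")
  val sc = new SparkContext(conf)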

It's quite possible that the problem you are having relates to the
fact that you have individual records that are 1GB in size. This is a
pretty extreme case that may violate assumptions in the implementation
of the external aggregation code.

Would you mind opening a Jira for this? Also, if you are able to find
an isolated way to recreate the behavior it will make it easier to
debug and fix.

IIRC, even with external aggregation Spark still materializes the
final combined output *for a given key* in memory. If you are
outputting GB of data for a single key, then you might also look into
a different parallelization strategy for your algorithm. Not sure if
this is also an issue though...
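
For instance, here's a rough sketch of what I mean (pairs, mergeValues, and
the salt count of 16 are all made up for illustration):

  // Spread one hot key across 16 sub-keys so no single combined value has
  // to be materialized in memory; downstream work then operates on the
  // (key, salt) partials instead of one giant per-key object.
  // (Assumes import org.apache.spark.SparkContext._ for the pair RDD ops.)
  val salted  = pairs.map { case (k, v) => ((k, scala.util.Random.nextInt(16)), v) }
  val partial = salted.reduceByKey(mergeValues)   // one modest value per (key, salt)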

- Patrick

On Sun, Jan 26, 2014 at 2:27 AM, guojc <guojc03@gmail.com> wrote:
> Hi Patrick,
>     I still get the exception on the latest master
> (05be7047744c88e64e7e6bd973f9bcfacd00da5f). A bit more info on the subject:
> I'm using Kryo serialization with a custom serialization function, and the
> exception comes from the RDD operation
> combineByKey(createDict, combineKey, mergeDict, partitioner, true, "org.apache.spark.serializer.KryoSerializer").
> All previous operations seem OK. The only difference is that this operation
> can generate a large dict object around 1 GB in size. I hope this gives you
> some clue about what might go wrong. I'm still having trouble figuring out
> the cause.
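>
> Roughly, the call looks like this (just a sketch; rdd stands for the actual
> input RDD, and createDict/combineKey/mergeDict are my combiner functions):
>
>   val result = rdd.combineByKey(
>     createDict,    // build a dict from the first value seen for a key
>     combineKey,    // fold another value into an existing dict
>     mergeDict,     // merge two dicts (these can grow to around 1 GB)
>     partitioner,
>     true,          // mapSideCombine
>     "org.apache.spark.serializer.KryoSerializer")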
>
> Thanks,
> Jiacheng Guo
>
>
> On Wed, Jan 22, 2014 at 1:36 PM, Patrick Wendell <pwendell@gmail.com> wrote:
>>
>> This code has been modified since you reported this so you may want to
>> try the current master.
>>
>> - Patrick
>>
>> On Mon, Jan 20, 2014 at 4:22 AM, guojc <guojc03@gmail.com> wrote:
>> > Hi,
>> >   I'm trying out the latest master branch of Spark for the exciting
>> > external hashmap feature. I have code that runs correctly on Spark 0.8.1,
>> > and I only made a change so that it can more easily spill to disk.
>> > However, I encounter a few task failures with:
>> >
>> > java.util.NoSuchElementException
>> >   at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:277)
>> >   at org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:212)
>> >   at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:29)
>> > And the job seems to fail to recover.
>> > Can anyone give some suggestions on how to investigate the issue?
>> >
>> > Thanks,
>> > Jiacheng Guo
>
>
