spark-user mailing list archives

From Mohit Jaggi <mohitja...@gmail.com>
Subject Re: ExternalAppendOnlyMap: Spilling in-memory map
Date Thu, 22 May 2014 21:02:02 GMT
Andrew,
I did not register anything explicitly, based on the belief that the class
name is written out in full only once. I also wondered why that problem
would be specific to Joda-Time and not show up with java.util.Date... I guess
it is possible, given the internals of Joda-Time.
If I remove DateTime from my RDD, the problem goes away.
I will try explicit registration (and add DateTime back to my RDD) and see
if that makes things better.
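
To make the size argument in the quoted reply below concrete, here is a rough
sketch in plain Java. It is not Kryo's actual wire format: the class
ClassNameOverhead, both methods, and the one-byte ID are hypothetical and only
model "full class name in front of every value" versus "small ID in front of
every value" for an 8-byte payload.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative only: a simplified model, NOT Kryo's real encoding.
final class ClassNameOverhead {

    // Encode a long as 8 big-endian bytes, standing in for a DateTime payload.
    static byte[] payload(long millis) {
        return ByteBuffer.allocate(8).putLong(millis).array();
    }

    // Total bytes when the full class name precedes every record.
    static int sizeWithClassName(String className, int records) {
        byte[] name = className.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        for (int i = 0; i < records; i++) {
            buf.write(name, 0, name.length);   // class name repeated per record
            buf.write(payload(i), 0, 8);       // 8-byte value
        }
        return buf.size();
    }

    // Total bytes when a one-byte ID (assumed) precedes every record.
    static int sizeWithId(int records) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        for (int i = 0; i < records; i++) {
            buf.write(1);                      // hypothetical registered-class ID
            buf.write(payload(i), 0, 8);       // 8-byte value
        }
        return buf.size();
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("with class name: "
                + sizeWithClassName("org.joda.time.DateTime", n) + " bytes");
        System.out.println("with 1-byte id:  " + sizeWithId(n) + " bytes");
    }
}
```

Under these assumptions the 22-character name org.joda.time.DateTime makes each
record 30 bytes instead of 9, which is the kind of blow-up that could push a
shuffle into spilling. Whether Kryo repeats the name for every record in this
job is exactly what the registration experiment should show.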

Mohit.




On Wed, May 21, 2014 at 8:36 PM, Andrew Ash <andrew@andrewash.com> wrote:

> Hi Mohit,
>
> The log line about the ExternalAppendOnlyMap is more a symptom of
> slowness than a cause of it.  The ExternalAppendOnlyMap is used
> when a shuffle is causing too much data to be held in memory.  Rather than
> OOM'ing, Spark writes the data out to disk in a sorted order and reads it
> back from disk later on when it's needed.  That's the job of the
> ExternalAppendOnlyMap.
>
> I wouldn't normally expect a conversion from Date to a Joda DateTime to
> take significantly more memory.  But since you're using Kryo, and classes
> should be registered with it, you may have forgotten to register DateTime
> with Kryo.  If you don't register a class, Kryo writes the class name at the
> beginning of every serialized instance.  For DateTime objects, whose payload
> is roughly one long, that's a ton of extra space and very inefficient.
>
> Can you confirm that DateTime is registered with Kryo?
>
> http://spark.apache.org/docs/latest/tuning.html#data-serialization
>
>
> On Wed, May 21, 2014 at 2:35 PM, Mohit Jaggi <mohitjaggi@gmail.com> wrote:
>
>> Hi,
>>
>> I changed my application to use Joda time instead of java.util.Date and I
>> started getting this:
>>
>> WARN ExternalAppendOnlyMap: Spilling in-memory map of 484 MB to disk (1
>> time so far)
>>
>> What does this mean? How can I fix this? Due to this a small job takes
>> forever.
>>
>> Mohit.
>>
>>
>> P.S.: I am using Kryo serialization and have played around with several
>> values of sparkRddMemFraction.
>>
>
>
