spark-user mailing list archives

From Steve Lewis <>
Subject Re: How can I force operations to complete and spool to disk
Date Thu, 07 May 2015 10:09:28 GMT
I give the executor 14 GB and would like to reduce that.
I expect the critical operations to run hundreds of millions of times, which
is why we run on a cluster. I will try persisting with DISK_ONLY.
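
A minimal Java sketch of the approach discussed in this thread, assuming Spark's Java API is on the classpath. It repartitions the input into more (smaller) partitions, runs a memory-heavy map, persists the result with the disk-only level `StorageLevel.DISK_ONLY()` (which keeps blocks serialized on disk and never in memory), and calls `count()` so the map actually executes before any later stage uses it. `DiskPersistSketch`, `expensiveMap`, `mapAndSpill`, the partition counts and the input data are hypothetical placeholders, not code from the original job.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.storage.StorageLevel;

public class DiskPersistSketch {

    // Hypothetical stand-in for the memory-heavy map step: builds an array per record.
    static double[] expensiveMap(Integer seed) {
        double[] values = new double[100];
        for (int i = 0; i < values.length; i++) {
            values[i] = seed * 0.5 + i;
        }
        return values;
    }

    // Splits the input into more partitions, applies the map, stores the
    // result on disk only, and runs count() so the map executes right away.
    static JavaRDD<double[]> mapAndSpill(JavaRDD<Integer> input, int numPartitions) {
        JavaRDD<double[]> mapped = input
                .repartition(numPartitions)              // more, smaller partitions
                .map(DiskPersistSketch::expensiveMap)    // the memory-heavy step
                .persist(StorageLevel.DISK_ONLY());      // keep computed blocks on disk, not in memory
        mapped.count();                                  // action: forces evaluation and storage
        return mapped;
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("disk-persist-sketch")
                .setMaster("local[2]");                  // local mode, for illustration only
        JavaSparkContext jsc = new JavaSparkContext(conf);
        try {
            List<Integer> seeds = new ArrayList<>();
            for (int i = 0; i < 10_000; i++) {
                seeds.add(i);
            }
            JavaRDD<double[]> spilled = mapAndSpill(jsc.parallelize(seeds, 4), 64);
            System.out.println(spilled.getNumPartitions() + " partitions, "
                    + spilled.count() + " records");
            spilled.unpersist();                         // release the disk blocks when done
        } finally {
            jsc.stop();
        }
    }
}
```

Because `persist()` is lazy, the `count()` action is what triggers the map; later operations on the persisted RDD read the stored blocks instead of recomputing the map, as long as those blocks have not been lost.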

Steven Lewis sent from my phone
On May 7, 2015 10:59 AM, "ayan guha" <> wrote:

> 2*2 cents
> 1. You can try repartition() with a large number of partitions to get
> smaller partitions.
> 2. OOM errors can often be avoided by increasing executor memory or by
> using off-heap storage.
> 3. How are you persisting? You can try persist() with the DISK_ONLY
> storage level.
> 4. You may want to look at the algorithm once more. "Tasks typically
> perform both operations several hundred thousand times." Why can't that
> be done in a distributed way?
> On Thu, May 7, 2015 at 3:16 PM, Steve Lewis <> wrote:
>> I am running a job that performs a number of steps in succession.
>> One step is a map on a JavaRDD that generates objects taking up
>> significant memory.
>> This is followed by a join and an aggregateByKey.
>> The problem is that the system is getting OutOfMemoryErrors:
>> most tasks work, but a few fail. Tasks typically perform both operations
>> several hundred thousand times.
>> I am convinced things would work if the map ran to completion and
>> shuffled results to disk before starting the aggregateByKey.
>> I tried calling persist and then count on the results of the map to force
>> execution, but this does not seem to help. Smaller partitions might also
>> help, if they could be forced.
>> Any ideas?
> --
> Best Regards,
> Ayan Guha
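
A second sketch, again assuming Spark's Java API and using hypothetical names (`JoinAggregateSketch`, `joinThenAggregate`) and toy data. Both `join` and `aggregateByKey` accept an explicit partition count, which is one way to force smaller partitions in the two shuffles described in the original question. Each shuffle already writes its map-side output to local disk before the next stage reads it.

```java
import java.util.Arrays;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class JoinAggregateSketch {

    // Joins two keyed RDDs, multiplies the matched values, and sums the
    // products per key. The same explicit partition count is passed to both
    // shuffles (join and aggregateByKey) to control partition size.
    static JavaPairRDD<String, Long> joinThenAggregate(
            JavaPairRDD<String, Long> left,
            JavaPairRDD<String, Long> right,
            int numPartitions) {
        JavaPairRDD<String, Long> products = left
                .join(right, numPartitions)                // shuffle 1: inner join by key
                .mapValues(pair -> pair._1() * pair._2()); // combine the two joined values
        return products.aggregateByKey(
                0L,                                        // starting sum for each key
                numPartitions,                             // shuffle 2: output partition count
                (sum, value) -> sum + value,               // fold a value into a partition-local sum
                (sumA, sumB) -> sumA + sumB);              // merge sums from different partitions
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("join-aggregate-sketch")
                .setMaster("local[2]");                    // local mode, for illustration only
        JavaSparkContext jsc = new JavaSparkContext(conf);
        try {
            JavaPairRDD<String, Long> left = jsc.parallelizePairs(Arrays.asList(
                    new Tuple2<String, Long>("a", 1L),
                    new Tuple2<String, Long>("a", 2L),
                    new Tuple2<String, Long>("b", 3L)));
            JavaPairRDD<String, Long> right = jsc.parallelizePairs(Arrays.asList(
                    new Tuple2<String, Long>("a", 10L),
                    new Tuple2<String, Long>("b", 5L),
                    new Tuple2<String, Long>("c", 7L)));
            Map<String, Long> sums = joinThenAggregate(left, right, 3).collectAsMap();
            System.out.println(sums);
        } finally {
            jsc.stop();
        }
    }
}
```

With the toy inputs in `main`, key `a` joins (1, 10) and (2, 10) and key `b` joins (3, 5), so the per-key sums are 30 and 15; key `c` has no match and is dropped by the inner join.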
