spark-user mailing list archives

From Sean McNamara <>
Subject Re: Large dataset, reduceByKey - java heap space error
Date Thu, 22 Jan 2015 22:38:59 GMT
Hi Kane - the Spark tuning guide has excellent information that may be helpful.
In particular, increasing the number of tasks may help, as well as confirming that you don't
have more data than you're expecting landing on a single key.
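Both suggestions can be sketched in a few lines of Scala. This is illustrative only and assumes a `SparkContext` named `sc` and an RDD of `(String, Long)` pairs; the input path, partition count, and key format are hypothetical, not from the original thread:

```scala
// Sketch only: `sc`, the input path, and the key layout are assumptions.
val pairs: org.apache.spark.rdd.RDD[(String, Long)] =
  sc.textFile("hdfs://...")                        // hypothetical input
    .map(line => (line.split("\t")(0), 1L))

// 1) Raise shuffle parallelism: reduceByKey accepts an explicit partition
//    count, so each reduce task holds a smaller slice of the shuffle data.
val counts = pairs.reduceByKey(_ + _, 2000)        // 2000 is an example; tune for your cluster

// 2) Check for key skew: count records per key on a small sample to see
//    whether one key is accumulating far more data than the rest.
pairs.sample(withReplacement = false, 0.001)
  .map { case (k, _) => (k, 1L) }
  .reduceByKey(_ + _)
  .top(10)(Ordering.by(_._2))
  .foreach(println)
```

If the sample shows one key dominating, repartitioning alone won't help; the skewed key itself needs to be handled (e.g. salted or filtered).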

Also, if you are using Spark < 1.2.0, setting spark.shuffle.manager=sort was a huge help
for many of our shuffle-heavy workloads (this is the default as of 1.2.0).
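The sort-based shuffle can be enabled either cluster-wide or per job; a sketch of both (the class name and jar are placeholders):

```
# spark-defaults.conf  (Spark < 1.2.0; sort is already the default from 1.2.0)
spark.shuffle.manager    sort

# or per job on the command line:
spark-submit --conf spark.shuffle.manager=sort --class YourApp your-app.jar
```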



On Jan 22, 2015, at 3:15 PM, Kane Kim <<>> wrote:

I'm trying to process a large dataset; mapping/filtering works OK, but
as soon as I try to reduceByKey, I get out-of-memory errors:

Any ideas how I can fix that?

