spark-user mailing list archives

From Stephen Haberman <>
Subject Re: oome from blockmanager
Date Sat, 26 Oct 2013 20:13:47 GMT
Hi Patrick,

> Just wondering, how many reducers are you using in this shuffle? By
> 7,000 partitions, I'm assuming you mean the map side of the shuffle.
> What about the reduce side?

7,000 on that side as well.

We're loading about a month's worth of data in one RDD, with ~7,000
partitions, and cogrouping it with another RDD with 50 partitions, and
the resulting RDD also has 7,000 partitions.

(Since we don't have spark.default.parallelism set, the
defaultPartitioner logic chooses the max of [50, 7,000] as the
resulting partition count.)
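As a minimal sketch of that selection behavior (names here are simplified stand-ins, not Spark's actual API; the real logic lives in Spark's Partitioner.defaultPartitioner):

```scala
// Hypothetical mimic of defaultPartitioner's partition-count choice:
// if spark.default.parallelism is set, use it; otherwise take the max
// partition count among the RDDs being cogrouped.
object DefaultPartitionerSketch {
  def resultPartitions(defaultParallelism: Option[Int],
                       inputPartitions: Seq[Int]): Int =
    defaultParallelism.getOrElse(inputPartitions.max)
}

// The case in this thread: a 7,000-partition RDD cogrouped with a
// 50-partition RDD, and no spark.default.parallelism configured.
val n = DefaultPartitionerSketch.resultPartitions(None, Seq(7000, 50))
// n is 7000, so the post-cogroup RDD also gets 7,000 partitions.
```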

I believe that's what you mean by number of reducers: the number of
partitions in the post-cogroup ShuffledRDD?

Also, AFAICT, I don't believe we ever reach the reduce/ShuffledRDD side
of this cogroup; things bog down after 2,000-3,000 ShuffleMapTasks on
the map side.

