spark-user mailing list archives

From Matthew Cheah <matthew.c.ch...@gmail.com>
Subject Limiting number of reducers performance implications
Date Sat, 29 Mar 2014 23:58:38 GMT
Hi everyone,

I'm using Spark on machines where I can't change the maximum number of open
files. As a result, I'm limiting the number of reducers to 500. I'm also
working on a single 32-core machine, emulating a cluster by running 4 worker
daemons with a maximum of 8 cores each.
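
In case it helps, here's a minimal sketch of how the job is pointed at the
emulated cluster and how the reducer cap is applied (the master URL, app name,
and property value below are illustrative, not copied from my actual config;
the workers themselves are started via SPARK_WORKER_INSTANCES /
SPARK_WORKER_CORES in spark-env.sh):

import org.apache.spark.{SparkConf, SparkContext}

// Connect to the standalone master running on the same box and cap
// the default number of reduce tasks used for shuffles at 500.
val conf = new SparkConf()
  .setMaster("spark://localhost:7077")
  .setAppName("ReducerLimitTest")
  .set("spark.default.parallelism", "500")
val sc = new SparkContext(conf)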

What I'm noticing is that as I allocate more cores to my Spark job, the job
doesn't complete much more quickly, and the CPUs aren't heavily utilized.
When I attempt to use all 32 cores, monitoring via mpstat shows a large
amount of CPU idleness (70%-80% idle).

My job consists primarily of reduceByKey and groupByKey operations, with
flatMaps, maps, and filters in between.
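
Here's a rough sketch of the shape of the pipeline, with the reducer count
passed explicitly (the data source and operations are made up for
illustration, but the structure matches what I'm running):

val numReducers = 500

// Each shuffle stage is given an explicit number of reduce tasks.
val counts = sc.textFile("hdfs://...")
  .flatMap(line => line.split(" "))
  .filter(word => word.nonEmpty)
  .map(word => (word, 1))
  .reduceByKey(_ + _, numReducers)        // 500 reduce tasks for this shuffle

val grouped = counts
  .map { case (word, count) => (count, word) }
  .groupByKey(numReducers)                // 500 reduce tasks here as well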

I was wondering: *what exactly does the number of reducers control?* Clearly,
if there are 32 cores, setting the number of reducers to 500 should produce
more than enough tasks to keep all 32 cores busy. Or am I misunderstanding
what a reducer is?

Thanks,

-Matt Cheah
