spark-user mailing list archives

From Matt K <>
Subject spark single PROCESS_LOCAL task
Date Fri, 15 Jul 2016 17:57:02 GMT
Hi all,

I'm seeing some curious behavior that I have a hard time interpreting. I
have a job that does a "groupByKey" and results in a stage of 300 tasks. 299
run at NODE_LOCAL locality; 1 task runs at PROCESS_LOCAL.

The one PROCESS_LOCAL task gets about 10x as much input as the others. It
dies with an OOM, and the job fails.

The only working theory I have is that there's a single key with a ton of
data tied to it. Even so, I can't explain why that one task runs at
PROCESS_LOCAL locality and the others don't.
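For what it's worth, the single-hot-key theory is easy to sanity-check before the shuffle. A minimal sketch in plain Python (not Spark; the key names and counts here are made up for illustration) of how one skewed key would concentrate roughly 10x the data of any other key into a single group, which is exactly what a groupByKey would funnel into one task:

```python
from collections import Counter
import random

# Hypothetical skewed dataset: one "hot" key carries ~10x the records
# of every other key, mimicking the imbalance described above.
random.seed(0)
keys = ["hot"] * 1000 + [f"key_{i}" for i in range(100) for _ in range(10)]
random.shuffle(keys)

# Counting key frequencies (e.g. on a sample) exposes the skew
# before the expensive shuffle ever runs.
counts = Counter(keys)
hot_key, hot_count = counts.most_common(1)[0]
others = [c for k, c in counts.items() if k != hot_key]
print(hot_key, hot_count, max(others))  # hot key dwarfs the rest
```

In Spark the analogous check would be sampling the RDD and counting keys (e.g. a countByKey on a small sample) to confirm whether one key dominates before deciding how to repartition or salt it.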

Does anyone have ideas?

