spark-user mailing list archives

From "igor.berman" <>
Subject spilling in-memory map of 5.1 MB to disk (272 times so far)
Date Fri, 26 Jun 2015 17:07:04 GMT
I wanted to get some advice regarding tuning a Spark application.
For some of the tasks I see many log entries like this (especially when the inputs are large):

Executor task launch worker-38 ExternalAppendOnlyMap: Thread 239 spilling
in-memory map of 5.1 MB to disk (272 times so far)
I understand that this is connected to shuffle and joins: data is spilled to disk to prevent OOM errors.
What is the approach to handling this situation, i.e. how can I "fix" it? Increase parallelism? Add memory to the cluster? What else?
Any ideas would be welcome.
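For reference, an illustrative spark-submit invocation showing the kind of knobs I mean (Spark 1.x configuration names; the values and the jar name are placeholders, not recommendations):

```shell
# Illustrative only: Spark 1.x configuration names; the values and the
# jar name ("my-app.jar") are placeholders.
#   --executor-memory            more memory per executor
#   spark.default.parallelism    more (hence smaller) shuffle partitions
#   spark.shuffle.memoryFraction fraction of the heap the shuffle can use
#                                before spilling (default 0.2 in 1.x)
spark-submit \
  --executor-memory 8g \
  --conf spark.default.parallelism=400 \
  --conf spark.shuffle.memoryFraction=0.4 \
  my-app.jar
```

Another lever in the same spirit is passing an explicit partition count to the join itself, e.g. a.fullOuterJoin(b, 400).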

In general my app reads N key-value files and iteratively fullOuterJoin-s them (like folding by full outer join). Each key is a user id and the value is aggregated statistics for that user, represented by a simple object. The N files are N days back, so to compute the aggregation for today I can "combine" daily aggregates.
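To make the folding concrete, here is a minimal plain-Python sketch of the pattern, with dicts standing in for RDDs (in Spark the merge step would be a fullOuterJoin followed by a mapValues; the names and the toy "stats are counts" assumption are mine, for illustration only):

```python
def merge(a, b):
    # Combine two per-user aggregates; either side may be missing,
    # which is exactly what the full outer join contributes.
    if a is None:
        return b
    if b is None:
        return a
    return a + b

def full_outer_merge(left, right):
    # Dict analogue of left.fullOuterJoin(right).mapValues(merge):
    # the result keeps every user id that appears on either side.
    return {k: merge(left.get(k), right.get(k))
            for k in left.keys() | right.keys()}

# Toy daily files: user id -> aggregated stat (a count, for simplicity).
daily = [
    {"u1": 3, "u2": 1},           # day 1
    {"u2": 2, "u3": 5},           # day 2
    {"u1": 1, "u3": 1, "u4": 7},  # day 3
]

# Fold the N days into one aggregate, as described above.
total = daily[0]
for day in daily[1:]:
    total = full_outer_merge(total, day)
```

After the fold, total holds one entry per user seen on any day, with the per-day stats combined.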
Thanks in advance,
