spark-user mailing list archives

From "igor.berman" <igor.ber...@gmail.com>
Subject spilling in-memory map of 5.1 MB to disk (272 times so far)
Date Fri, 26 Jun 2015 17:07:04 GMT
Hi,
I wanted to get some advice on tuning a Spark application.
For some of the tasks I see many log entries like this:
Executor task launch worker-38 ExternalAppendOnlyMap: Thread 239 spilling
in-memory map of 5.1 MB to disk (272 times so far)
(especially when the inputs are large).
I understand this is connected to shuffles and joins: data is spilled to disk
to prevent OOM errors.
What is the recommended way to handle this situation, i.e. how can I "fix" it?
Increase parallelism? Add memory to the cluster? What else?
Any ideas would be welcome.
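
For example, are these the right knobs to turn? Just a sketch of the settings
I'm aware of; the app name and the values are made up:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("daily-aggregation")            // hypothetical app name
  // more shuffle partitions => smaller per-task hash maps => less spilling?
  .set("spark.default.parallelism", "400")
  // give the shuffle a larger share of the executor heap (Spark 1.x setting)
  .set("spark.shuffle.memoryFraction", "0.4")
  // more memory per executor should also raise the spill threshold
  .set("spark.executor.memory", "6g")

val sc = new SparkContext(conf)

(I guess I could also pass an explicit number of partitions to the join
itself, e.g. fullOuterJoin(other, 400).)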

In general, my app reads N key-value files and iteratively fullOuterJoins them
(like a fold by full outer join). Each key is a user id and the value is the
aggregated statistics for that user, represented by a simple object. The N
files are the last N days, so to compute today's aggregation I can "combine"
the daily aggregations.
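
A simplified sketch of what I do (DailyStats and merge are made-up
placeholders for my real aggregation object):

import org.apache.spark.rdd.RDD

// stand-in for the per-user aggregated statistics object
case class DailyStats(count: Long, sum: Double)

def merge(a: DailyStats, b: DailyStats): DailyStats =
  DailyStats(a.count + b.count, a.sum + b.sum)

// fold N daily RDDs of (userId, DailyStats) together with fullOuterJoin
def combine(days: Seq[RDD[(Long, DailyStats)]],
            numPartitions: Int): RDD[(Long, DailyStats)] =
  days.reduce { (acc, day) =>
    acc.fullOuterJoin(day, numPartitions).mapValues {
      case (Some(x), Some(y)) => merge(x, y)
      case (Some(x), None)    => x
      case (None, Some(y))    => y
      case (None, None)       => sys.error("unreachable") // fullOuterJoin never emits this
    }
  }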
thanks in advance,
Igor



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spilling-in-memory-map-of-5-1-MB-to-disk-272-times-so-far-tp23509.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

