spark-user mailing list archives

From Brad Willard <bradwill...@gmail.com>
Subject Repartition Memory Leak
Date Sun, 04 Jan 2015 18:23:52 GMT
I have a 10-node cluster with 600GB of RAM. I'm loading a fairly large
dataset from JSON files. When I load the dataset it is about 200GB, but Spark
only creates 60 partitions. I'm trying to repartition to 256 to increase
CPU utilization; however, when I do that, memory usage balloons to well over
2x the initial size, killing nodes with out-of-memory failures.

https://s3.amazonaws.com/f.cl.ly/items/3k2n2n3j35273i2v1Y3t/Screen%20Shot%202015-01-04%20at%201.20.29%20PM.png
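For reference, a minimal sketch of the scenario described above, assuming the Spark 1.x Scala API (sqlContext.jsonFile followed by repartition); the S3 path and partition counts are placeholders, not the poster's actual job:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object RepartitionSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("RepartitionSketch"))
        val sqlContext = new SQLContext(sc)

        // The initial partition count (60 in the post) comes from the input
        // splits of the JSON files, not from the cluster size.
        val events = sqlContext.jsonFile("s3n://my-bucket/events/")  // hypothetical path

        // repartition(256) triggers a full shuffle: every row is serialized,
        // written out as shuffle data, and read back on the new partitions,
        // which is where the extra memory pressure shows up.
        val spread = events.repartition(256)
        println(spread.count())
      }
    }

Note that repartitioning upward always implies a full shuffle; one possible workaround (an assumption, not something confirmed here) is to arrange for more input splits at read time, for example by splitting the source files into smaller pieces, so the data starts out in more partitions instead of being shuffled afterwards.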

Is this a bug? How can I work around this?

Thanks!



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Repartition-Memory-Leak-tp20965.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
