spark-user mailing list archives

From Brad Willard <>
Subject Repartition Memory Leak
Date Sun, 04 Jan 2015 18:23:52 GMT
I have a 10 node cluster with 600 GB of RAM. I'm loading a fairly large
dataset from JSON files. When I load the dataset it is about 200 GB, but it
only creates 60 partitions. I'm trying to repartition to 256 to increase
CPU utilization, but when I do that, memory usage balloons to well over 2x
the initial size, killing nodes with out-of-memory failures.
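Roughly, the job looks like this (a minimal sketch of what I'm doing; the paths and names are placeholders, and the partition counts in the comments are just the figures above):

import org.apache.spark.{SparkConf, SparkContext}

object RepartitionSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RepartitionSketch"))

    // ~200 GB of JSON text; the initial partition count (~60 here) comes from
    // the input splits of the underlying files.
    val raw = sc.textFile("hdfs:///data/events/*.json")
    println(s"Initial partitions: ${raw.partitions.length}")

    // repartition() does a full shuffle: every record is serialized, written to
    // shuffle files, and buffered on the receiving side, which is where the
    // extra memory pressure seems to show up.
    val repartitioned = raw.repartition(256)
    repartitioned.saveAsTextFile("hdfs:///data/events-repartitioned")

    sc.stop()
  }
}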

Is this a bug? How can I work around this?

