spark-user mailing list archives

From: Allen Chang <>
Subject: Re: Using Spark on Data size larger than Memory size
Date: Wed, 11 Jun 2014 01:42:32 GMT
Thanks for the clarification.

What is the proper way to configure RDDs when your aggregate data size
exceeds your available working memory? In particular, in addition to
typical operations, I'm performing cogroups, joins, and coalesces/shuffles.

I see that the default storage level for RDDs is MEMORY_ONLY. Do I just need
to set the storage level on all of my RDDs to something like
MEMORY_AND_DISK? Do I need to do anything else to get graceful behavior in
the presence of coalesces/shuffles, cogroups, and joins?
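For what it's worth, here is a minimal sketch of the storage-level override I have in mind. The RDD names and data are purely illustrative, and it assumes an already-constructed SparkContext `sc`:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

// Sketch only: `users` and `orders` are hypothetical RDDs, not from my job.
def persistWithSpill(sc: SparkContext): Long = {
  val users  = sc.parallelize(Seq((1, "alice"), (2, "bob")))
  val orders = sc.parallelize(Seq((1, "book"), (1, "pen"), (2, "mug")))

  // persist() with an explicit level overrides the MEMORY_ONLY default:
  // partitions that do not fit in memory spill to local disk rather than
  // being evicted and recomputed from lineage on the next access.
  val joined = users.join(orders).persist(StorageLevel.MEMORY_AND_DISK)

  joined.count() // first action materializes (and caches) the partitions
}
```

Is this the right pattern, or is there a global setting I'm missing?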

