spark-dev mailing list archives

From kaklakariada <>
Subject groupByKey() and keys with many values
Date Mon, 07 Sep 2015 08:02:18 GMT

I already posted this question on the users mailing list
but did not get a reply. Maybe this is the correct forum to ask.

My problem is that groupByKey().mapToPair() loads all values for a key into
memory, which fails when the values for a single key don't fit into memory.
This was not a problem with Hadoop map/reduce, because the Iterable passed to
the reducer reads its values from disk.

In Spark, the Iterable passed to mapToPair() is backed by a CompactBuffer
containing all values.
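
To make the difference concrete, here is a minimal sketch in plain Java. It is
not Spark or Hadoop code, and the class and method names (GroupingSketch,
groupMaterialized, sumStreaming) are hypothetical, chosen only for
illustration. The first method buffers every value of a key before handing it
on, which is roughly what I observe with the CompactBuffer. The second assumes
the records arrive sorted by key and keeps only one running aggregate at a
time, which is closer to how a reducer could consume values lazily.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Spark or Hadoop.
class GroupingSketch {

    // Materializing style: all values of a key are collected into a list
    // before the caller can process them, so memory grows with the group size.
    static Map<String, List<Integer>> groupMaterialized(
            Iterator<Map.Entry<String, Integer>> records) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        while (records.hasNext()) {
            Map.Entry<String, Integer> r = records.next();
            groups.computeIfAbsent(r.getKey(), k -> new ArrayList<>()).add(r.getValue());
        }
        return groups;
    }

    // Streaming style: assumes the input is already sorted by key.
    // Only the current key and its running sum are held in memory.
    static Map<String, Long> sumStreaming(
            Iterator<Map.Entry<String, Integer>> sortedRecords) {
        Map<String, Long> sums = new LinkedHashMap<>();
        String currentKey = null;
        long running = 0;
        while (sortedRecords.hasNext()) {
            Map.Entry<String, Integer> r = sortedRecords.next();
            if (currentKey != null && !currentKey.equals(r.getKey())) {
                sums.put(currentKey, running); // key changed: emit the finished group
                running = 0;
            }
            currentKey = r.getKey();
            running += r.getValue();
        }
        if (currentKey != null) {
            sums.put(currentKey, running); // emit the last group
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> records = Arrays.asList(
                new SimpleImmutableEntry<>("a", 1),
                new SimpleImmutableEntry<>("a", 2),
                new SimpleImmutableEntry<>("a", 3),
                new SimpleImmutableEntry<>("b", 10));

        System.out.println(groupMaterialized(records.iterator())); // {a=[1, 2, 3], b=[10]}
        System.out.println(sumStreaming(records.iterator()));      // {a=6, b=10}
    }
}
```

The streaming variant only works when each group can be processed in a single
pass over key-sorted input; whether Spark can be made to behave this way
without modification is exactly what I am asking.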

Is it possible to change this behavior without modifying Spark, or is there
a plan to change this?

Thank you very much for your help!
