spark-dev mailing list archives

From kaklakariada
Subject Re: groupByKey() and keys with many values
Date Tue, 08 Sep 2015 06:20:25 GMT
Hi Antonio!

Thank you very much for your answer!
You are right in that in my case the computation could be replaced by a
reduceByKey. The thing is that my computation also involves database access:

1. Fetch key-specific data from database into memory. This is expensive and
I only want to do this once for a key.
2. Process each value using this data and update the common data.
3. Store modified data to database. Here it is important to write all data
for a key in one go.
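
To make the three steps concrete, here is a minimal sketch in plain Python,
without Spark. It assumes the (key, value) records of a partition arrive
sorted by key, so that all values for one key are adjacent. The names
process_partition, fetch_state, update and store_state are hypothetical
placeholders for the database read, the per-value processing and the
database write; they are not part of any Spark API.

```python
from itertools import groupby
from operator import itemgetter


def process_partition(records, fetch_state, update, store_state):
    """Process (key, value) pairs that arrive sorted by key.

    fetch_state, update and store_state are hypothetical callbacks standing
    in for the database read, the per-value processing and the database write.
    """
    for key, group in groupby(records, key=itemgetter(0)):
        state = fetch_state(key)        # step 1: one expensive read per key
        for _, value in group:          # step 2: stream the values one by one
            state = update(state, value)
        store_state(key, state)         # step 3: one write per key
        yield key, state


# In-memory stand-in for the database (an assumption for this example).
db = {"a": 10, "b": 100}
fetches = []


def fetch_state(key):
    fetches.append(key)                 # record how often each key is fetched
    return db.get(key, 0)


def store_state(key, state):
    db[key] = state


records = [("a", 1), ("a", 2), ("b", 5)]
result = list(
    process_partition(iter(records), fetch_state, lambda s, v: s + v, store_state)
)
```

In Spark, a function like this could presumably be applied per partition via
mapPartitions after repartitionAndSortWithinPartitions, so that all values
for a key end up in the same partition and in key order. That wiring is an
assumption and is not exercised by the sketch above.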

Is there a pattern for implementing something like this with reduceByKey?

Out of curiosity: I understand why you want to discourage people from using
groupByKey. But is there a technical reason why the Iterable is implemented
the way it is?

Kind regards,
