spark-user mailing list archives

From Sean Owen <>
Subject Re: Help in merging an RDD against itself using the V of a (K,V).
Date Thu, 24 Jul 2014 10:29:25 GMT
Yeah, reduce() will leave you with one big collection of sets on the
driver. Maybe the set of all identifiers isn't so big: even a hundred
million Longs isn't that much. I'm glad to hear cartesian works, but
can that scale? You're making an RDD of N^2 elements initially, which
is just vast.
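A minimal plain-Python sketch of the trade-off above. It assumes, hypothetically, that each V is a set of Long identifiers and that records sharing any identifier should be merged. It does not use the Spark API: a list stands in for the RDD, and functools.reduce stands in for a fold whose result is a single local object, which on Spark would land on the driver. The records, the merge_into helper, and the 8-bytes-per-identifier estimate are illustrative assumptions.

```python
from functools import reduce

# Hypothetical (K, V) records; V is a set of identifiers.
# This is a plain Python list standing in for an RDD, not Spark code.
records = [
    ("a", {1, 2}),
    ("b", {2, 3}),
    ("c", {4}),
    ("d", {5, 4}),
    ("e", {6}),
]


def merge_into(groups, ids):
    """Fold one identifier set into a list of pairwise-disjoint groups.

    Every existing group that shares an identifier with `ids` is unioned
    into it; the merged set replaces those groups, so the groups stay
    disjoint.
    """
    merged = set(ids)
    kept = []
    for g in groups:
        if g & merged:
            merged |= g
        else:
            kept.append(g)
    kept.append(merged)
    return kept


# Sequential fold: the whole result is ONE local object, which is the
# point about reduce()-style actions returning to the driver.
groups = reduce(merge_into, (v for _, v in records), [])

# Size of a cartesian self-join: every record paired with every record.
n = len(records)
cartesian_pairs = n * n

# Back-of-envelope size of 100 million identifiers as 8-byte longs
# (a primitive array; boxed Longs in a JVM hash set cost several times more).
raw_bytes = 100_000_000 * 8

print([sorted(g) for g in groups])   # [[1, 2, 3], [4, 5], [6]]
print(cartesian_pairs)               # 25
print(f"{raw_bytes / 10**6:.0f} MB") # 800 MB
```

The cartesian count grows quadratically: at a hypothetical N = 10^7 records it would already be 10^14 pairs, whereas the folded result holds each identifier only once.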

On Thu, Jul 24, 2014 at 2:09 AM, Roch Denis <> wrote:
> Ah yes, you're quite right: with partitions I could probably process a good
> chunk of the data, but I didn't think a reduce would work. Sorry, I'm still
> new to Spark and MapReduce in general, but I thought that the result of a
> reduce wasn't an RDD and had to fit into memory. If the result of a reduce
> can be any size, then yes, I can see how to make it work.
> Sorry for not being certain; the docs are not quite clear on that point, at
> least to me.
> --
> Sent from the Apache Spark User List mailing list archive at
