spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Darin McBeath <>
Subject What should be the number of partitions after a union and a subtractByKey
Date Tue, 11 Nov 2014 16:40:58 GMT
Assume the following where both updatePairRDD and deletePairRDD are both HashPartitioned.  Before
the union, each one of these has 512 partitions.   The new created updateDeletePairRDD has
1024 partitions.  Is this the general/expected behavior for a union (the number of partitions
to double)?
JavaPairRDD<String,String> updateDeletePairRDD = updatePairRDD.union(deletePairRDD);
Then a similar question for subtractByKey.  In the example below, baselinePairRDD is HashPartitioned
(with 512 partitions).  We know from above that updateDeletePairRDD has 1024 partitions.
 The newly created workSubtractBaselinePairRDD has 512 partitions.  This makes sense because
we are only 'subtracting' records from the baselinePairRDD and one wouldn't think the number
of partitions would increase.  Is this the general/expected behavior for a subractByKey?

JavaPairRDD<String,String> workSubtractBaselinePairRDD = baselinePairRDD.subtractByKey(updateDeletePairRDD);

View raw message