spark-user mailing list archives

From Praveen Chundi <>
Subject Re: merge 3 different types of RDDs in one
Date Tue, 01 Dec 2015 09:59:49 GMT
cogroup could be useful to you, since all three are PairRDDs.
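A minimal sketch of what cogroup hands back, under stated assumptions. In Spark's Java API the three-way form is called on one `JavaPairRDD` with the other two as arguments (for example `info.cogroup(raw, aggr)`, where `info`, `raw` and `aggr` are hypothetical names for your three RDDs). It yields one record per key holding a `Tuple3` of `Iterable`s, one per input. The code below does not call Spark. It models that result with plain Java collections so it runs without a cluster, and it uses Strings as stand-ins for the TransactionInfo/TransactionRaw/TransactionAggr beans. Note that cogrouping RDDs whose values are already `Iterable`s gives nested iterables, so it may be simpler to cogroup the ungrouped pair RDDs directly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory model of what cogroup produces for three keyed datasets.
// No Spark dependency: lists of (key, value) pairs stand in for pair RDDs,
// and Strings stand in for the TransactionInfo/TransactionRaw/TransactionAggr beans.
final class CogroupSketch {

    /** All values from each of the three inputs that share one key. */
    static final class Grouped<A, B, C> {
        final List<A> first = new ArrayList<>();
        final List<B> second = new ArrayList<>();
        final List<C> third = new ArrayList<>();

        @Override
        public String toString() {
            return "(" + first + ", " + second + ", " + third + ")";
        }
    }

    /** Returns the group for a key, creating an empty one the first time the key is seen. */
    private static <K, A, B, C> Grouped<A, B, C> slot(Map<K, Grouped<A, B, C>> out, K key) {
        Grouped<A, B, C> g = out.get(key);
        if (g == null) {
            g = new Grouped<>();
            out.put(key, g);
        }
        return g;
    }

    /**
     * Groups three keyed lists by key, like cogroup: every key present in ANY input
     * appears exactly once, and an input without that key contributes an empty list.
     */
    static <K, A, B, C> Map<K, Grouped<A, B, C>> cogroup(
            List<Map.Entry<K, A>> first,
            List<Map.Entry<K, B>> second,
            List<Map.Entry<K, C>> third) {
        Map<K, Grouped<A, B, C>> out = new HashMap<>();
        for (Map.Entry<K, A> e : first) slot(out, e.getKey()).first.add(e.getValue());
        for (Map.Entry<K, B> e : second) slot(out, e.getKey()).second.add(e.getValue());
        for (Map.Entry<K, C> e : third) slot(out, e.getKey()).third.add(e.getValue());
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample data keyed by CustomerId (Long).
        List<Map.Entry<Long, String>> info = List.of(
                Map.entry(1L, "info-a"), Map.entry(1L, "info-b"), Map.entry(2L, "info-c"));
        List<Map.Entry<Long, String>> raw = List.of(Map.entry(1L, "raw-a"));
        List<Map.Entry<Long, String>> aggr = List.of(Map.entry(1L, "aggr-1"), Map.entry(2L, "aggr-2"));

        // One entry per customer, carrying that customer's rows from all three datasets.
        Map<Long, Grouped<String, String, String>> byCustomer = cogroup(info, raw, aggr);
        byCustomer.forEach((id, g) -> System.out.println(id + " -> " + g));
    }
}
```

Each entry of the result carries everything needed to build one customer's report. Because every key from any input is kept, a customer missing from one dataset still shows up, just with an empty list for that dataset.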

Best Regards,

On 01.12.2015 10:47, Shams ul Haque wrote:
> Hi All,
> I have made 3 RDDs from 3 different datasets. All RDDs are grouped by
> CustomerID; 2 of them have values of Iterable type and one has a
> single bean. All RDDs use a Long id as CustomerId. Below are
> the models for the 3 RDDs:
> JavaPairRDD<Long, Iterable<TransactionInfo>>
> JavaPairRDD<Long, Iterable<TransactionRaw>>
> JavaPairRDD<Long, TransactionAggr>
> Now I have to merge all 3 RDDs into a single one so that I can
> generate an Excel report for each customer using the data in the 3 RDDs.
> I tried using the join transformation, but it needs RDDs of the same type
> and it works for only two RDDs.
> So my questions are:
> 1. Is there any way to do this merge efficiently, so that I can
> get all 3 datasets by CustomerId?
> 2. If I merge the first two using the join transformation, do I need to run
> groupByKey() before the join so that all data related to a single customer
> will be on one node?
> Thanks
> Shams
