spark-user mailing list archives

From Praveen Chundi <mail.chu...@gmail.com>
Subject Re: merge 3 different types of RDDs in one
Date Tue, 01 Dec 2015 09:59:49 GMT
cogroup could be useful to you, since all three are pair RDDs with the same key type; the value types are allowed to differ.

https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions
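
To show the shape of the result: in the Java API, JavaPairRDD.cogroup(other1, other2) accepts two other pair RDDs with the same key type and yields, per key, a Tuple3 of three Iterables. The sketch below is only a plain-Java model of that behaviour over in-memory lists, not Spark code. The names CogroupSketch, Grouped and pair are made up for illustration, and plain strings stand in for TransactionInfo, TransactionRaw and TransactionAggr.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of a three-way cogroup (hypothetical names, no Spark dependency).
public class CogroupSketch {

    /** Per-key result, analogous to Spark's Tuple3 of three Iterables. */
    public static final class Grouped<A, B, C> {
        public final List<A> first = new ArrayList<>();
        public final List<B> second = new ArrayList<>();
        public final List<C> third = new ArrayList<>();
    }

    /** Small helper to build a (key, value) pair. */
    public static <K, V> Map.Entry<K, V> pair(K key, V value) {
        return new SimpleEntry<>(key, value);
    }

    /**
     * Collects the values of three keyed datasets under their shared key.
     * A key that is missing from one input simply gets an empty list there.
     */
    public static <K extends Comparable<K>, A, B, C> Map<K, Grouped<A, B, C>> cogroup(
            List<Map.Entry<K, A>> as,
            List<Map.Entry<K, B>> bs,
            List<Map.Entry<K, C>> cs) {
        Map<K, Grouped<A, B, C>> out = new TreeMap<>();
        for (Map.Entry<K, A> e : as) {
            out.computeIfAbsent(e.getKey(), k -> new Grouped<>()).first.add(e.getValue());
        }
        for (Map.Entry<K, B> e : bs) {
            out.computeIfAbsent(e.getKey(), k -> new Grouped<>()).second.add(e.getValue());
        }
        for (Map.Entry<K, C> e : cs) {
            out.computeIfAbsent(e.getKey(), k -> new Grouped<>()).third.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Strings are placeholders for TransactionInfo, TransactionRaw, TransactionAggr.
        List<Map.Entry<Long, String>> infos = Arrays.asList(
                pair(1L, "info-a"), pair(1L, "info-b"), pair(2L, "info-c"));
        List<Map.Entry<Long, String>> raws = Arrays.asList(pair(1L, "raw-a"));
        List<Map.Entry<Long, String>> aggrs = Arrays.asList(
                pair(1L, "aggr-1"), pair(2L, "aggr-2"));

        Map<Long, Grouped<String, String, String>> byCustomer = cogroup(infos, raws, aggrs);
        for (Map.Entry<Long, Grouped<String, String, String>> e : byCustomer.entrySet()) {
            Grouped<String, String, String> g = e.getValue();
            System.out.println(e.getKey() + " -> " + g.first + " " + g.second + " " + g.third);
        }
    }
}
```

Two things to keep in mind when carrying this over to Spark. A customer that is absent from one dataset still shows up, with an empty Iterable in that slot. And since cogroup groups by key itself, calling it on the RDDs before groupByKey() is probably simpler; on the already grouped RDDs the corresponding slots would come out as an Iterable of Iterables.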

Best Regards,
Praveen


On 01.12.2015 10:47, Shams ul Haque wrote:
> Hi All,
>
> I have built 3 RDDs from 3 different datasets. All of them are grouped by
> CustomerId, which is of type Long; two have values of Iterable type and
> one has a single bean. Below are the types of the 3 RDDs:
> JavaPairRDD<Long, Iterable<TransactionInfo>>
> JavaPairRDD<Long, Iterable<TransactionRaw>>
> JavaPairRDD<Long, TransactionAggr>
>
> Now I have to merge these 3 RDDs into a single one so that I can
> generate an Excel report for each customer using the data in all 3 RDDs.
> I tried using the join transformation, but it needs RDDs of the same type
> and it works only for two RDDs.
> So my questions are:
> 1. Is there any way to do this merge efficiently, so that I can
> get all 3 datasets by CustomerId?
> 2. If I merge the first two using the join transformation, do I need to run
> groupByKey() before the join so that all data related to a single customer
> will be on one node?
>
>
> Thanks
> Shams



