spark-user mailing list archives

From Sonal Goyal <sonalgoy...@gmail.com>
Subject Re: merge 3 different types of RDDs in one
Date Tue, 01 Dec 2015 10:04:56 GMT
I think you should be able to join different RDDs that share the same key.
Have you tried that?
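
To make the idea concrete, here is a rough plain-Java model of what a three-way cogroup (suggested in the quoted reply below) produces per key. It is not Spark code: the class CogroupSketch, the holder Grouped and the String sample values are made up for illustration, and plain Maps stand in for the pair RDDs. In Spark's Java API the analogous call on JavaPairRDD is first.cogroup(second, third).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of a three-way cogroup; this is NOT Spark code.
// In Spark's Java API the analogous call on pair RDDs is a.cogroup(b, c).
final class CogroupSketch {

    // Hypothetical holder for everything known about one key.
    // A list stays empty when that input had no entry for the key.
    static final class Grouped<A, B, C> {
        final List<A> first = new ArrayList<>();
        final List<B> second = new ArrayList<>();
        final List<C> third = new ArrayList<>();
    }

    // Collects, for every key seen in any input, the values from all three inputs.
    // The first two inputs hold many values per key, the third holds one.
    static <K extends Comparable<K>, A, B, C> Map<K, Grouped<A, B, C>> cogroup(
            Map<K, List<A>> a, Map<K, List<B>> b, Map<K, C> c) {
        Map<K, Grouped<A, B, C>> out = new TreeMap<>();
        for (Map.Entry<K, List<A>> e : a.entrySet()) {
            out.computeIfAbsent(e.getKey(), k -> new Grouped<A, B, C>())
               .first.addAll(e.getValue());
        }
        for (Map.Entry<K, List<B>> e : b.entrySet()) {
            out.computeIfAbsent(e.getKey(), k -> new Grouped<A, B, C>())
               .second.addAll(e.getValue());
        }
        for (Map.Entry<K, C> e : c.entrySet()) {
            out.computeIfAbsent(e.getKey(), k -> new Grouped<A, B, C>())
               .third.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Strings stand in for TransactionInfo, TransactionRaw and TransactionAggr.
        Map<Long, List<String>> infos = new HashMap<>();
        infos.put(1L, Arrays.asList("info-a", "info-b"));
        Map<Long, List<String>> raws = new HashMap<>();
        raws.put(1L, Arrays.asList("raw-a"));
        raws.put(2L, Arrays.asList("raw-b"));
        Map<Long, String> aggrs = new HashMap<>();
        aggrs.put(1L, "aggr-1");

        // One entry per customer id, with the data from all three inputs side by side.
        for (Map.Entry<Long, Grouped<String, String, String>> e
                : cogroup(infos, raws, aggrs).entrySet()) {
            Grouped<String, String, String> g = e.getValue();
            System.out.println(e.getKey() + " " + g.first + " " + g.second + " " + g.third);
        }
    }
}
```

Note that cogroup keeps a customer even if it is missing from one of the inputs, whereas chaining two joins keeps only customers present in all three. Also, if the RDDs are already grouped, cogroup in Spark would hand back an Iterable of Iterables for those inputs, so it may be simpler to cogroup before grouping.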
On Dec 1, 2015 3:30 PM, "Praveen Chundi" <mail.chundi@gmail.com> wrote:

> cogroup could be useful to you, since all three are PairRDDs.
>
>
> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions
>
> Best Regards,
> Praveen
>
>
> On 01.12.2015 10:47, Shams ul Haque wrote:
>
>> Hi All,
>>
>> I have made 3 RDDs from 3 different datasets. All RDDs are grouped by
>> CustomerId; 2 RDDs have values of Iterable type and one has a single
>> bean. All RDDs use an id of Long type as CustomerId. Below are the models
>> for the 3 RDDs:
>> JavaPairRDD<Long, Iterable<TransactionInfo>>
>> JavaPairRDD<Long, Iterable<TransactionRaw>>
>> JavaPairRDD<Long, TransactionAggr>
>>
>> Now, I have to merge all 3 RDDs into a single one so that I can
>> generate an Excel report per customer using the data in all 3 RDDs.
>> I tried using the join transformation, but it needs RDDs of the same type
>> and it works for only two RDDs.
>> So my questions are:
>> 1. Is there any way to do the merge efficiently, so that I can
>> get all 3 datasets by CustomerId?
>> 2. If I merge the first two using the join transformation, do I need to
>> run groupByKey() before the join so that all data related to a single
>> customer will be on one node?
>>
>>
>> Thanks
>> Shams
>>
>
>