spark-user mailing list archives

From Jacek Laskowski
Subject Re: merge 3 different types of RDDs in one
Date Tue, 01 Dec 2015 09:53:21 GMT

I've never done it before, but just yesterday I found out about the
SparkContext.union method, which could help in your case.

def union[T](rdds: Seq[RDD[T]])(implicit arg0: ClassTag[T]): RDD[T]
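
One caveat: as the signature shows, union takes RDDs that all share one
element type T and concatenates them, so on its own it does not line records
up by CustomerId. As far as I know, the pair-RDD join and cogroup methods only
require the key type to match, and JavaPairRDD.cogroup can take two other pair
RDDs at once, which seems closer to what you describe. Below is a minimal
plain-Java sketch (no Spark dependency) of what a three-way cogroup by
CustomerId computes. The class CogroupSketch, its Merged holder, and the
String sample values are made up for illustration and stand in for your
TransactionInfo, TransactionRaw, and TransactionAggr beans.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: mimics the result shape of a three-way
// cogroup by key using plain collections instead of RDDs.
public class CogroupSketch {

    /** One merged record per customer: everything the three inputs hold for that key. */
    public static final class Merged {
        public final List<String> infos = new ArrayList<>(); // stands in for Iterable<TransactionInfo>
        public final List<String> raws = new ArrayList<>();  // stands in for Iterable<TransactionRaw>
        public final List<String> aggrs = new ArrayList<>(); // stands in for TransactionAggr (0 or 1 entries)

        @Override
        public String toString() {
            return "infos=" + infos + ", raws=" + raws + ", aggrs=" + aggrs;
        }
    }

    /** Groups three keyed inputs by key; a key missing from one input keeps an empty list there. */
    public static Map<Long, Merged> cogroup(Map<Long, List<String>> infos,
                                            Map<Long, List<String>> raws,
                                            Map<Long, String> aggrs) {
        Map<Long, Merged> out = new TreeMap<>(); // sorted by customer id for stable output
        infos.forEach((id, v) -> out.computeIfAbsent(id, k -> new Merged()).infos.addAll(v));
        raws.forEach((id, v) -> out.computeIfAbsent(id, k -> new Merged()).raws.addAll(v));
        aggrs.forEach((id, v) -> out.computeIfAbsent(id, k -> new Merged()).aggrs.add(v));
        return out;
    }

    public static void main(String[] args) {
        Map<Long, List<String>> infos = Map.of(1L, List.of("info-a", "info-b"), 2L, List.of("info-c"));
        Map<Long, List<String>> raws = Map.of(1L, List.of("raw-a"));
        Map<Long, String> aggrs = Map.of(1L, "aggr-1", 3L, "aggr-3");

        // One line per customer id, each carrying the data from all three inputs.
        cogroup(infos, raws, aggrs).forEach((id, m) -> System.out.println(id + " -> " + m));
    }
}
```

Unlike an inner join, this keeps customers that appear in only some of the
inputs, which may or may not be what the report needs.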


Jacek Laskowski
Mastering Spark

On Tue, Dec 1, 2015 at 10:47 AM, Shams ul Haque wrote:
> Hi All,
> I have made 3 RDDs from 3 different datasets. All RDDs are grouped by
> CustomerID; 2 of them have values of Iterable type and one has a single
> bean. All RDDs have an id of Long type as CustomerId. Below are the models
> for the 3 RDDs:
> JavaPairRDD<Long, Iterable<TransactionInfo>>
> JavaPairRDD<Long, Iterable<TransactionRaw>>
> JavaPairRDD<Long, TransactionAggr>
> Now I have to merge all 3 RDDs into a single one so that I can generate
> an Excel report for each customer using the data in the 3 RDDs.
> I tried using the Join transformation, but it needs RDDs of the same type
> and it works for only two RDDs.
> So my questions are:
> 1. Is there any way to do my merging task efficiently, so that I can get
> all 3 datasets by CustomerId?
> 2. If I merge the first two using the Join transformation, do I need to run
> groupByKey() before the Join so that all data related to a single customer
> will be on one node?
> Thanks
> Shams
