spark-user mailing list archives

From Adrian Tanase <atan...@adobe.com>
Subject Re: Separate all values from Iterable
Date Tue, 27 Oct 2015 11:13:14 GMT
The operator you’re looking for is .flatMap. It flattens nested results: a map over a source element can return zero or more target elements, and flatMap concatenates them into one flat RDD.
I’m not very familiar with the Java APIs, but in Scala it would go like this (keeping the type
annotations only as documentation):

def toBson(bean: ProductBean): BSONObject = { … }

val customerBeans: RDD[(Long, Seq[ProductBean])] = allBeans.groupBy(_.customerId)
val mongoObjects: RDD[BSONObject] = customerBeans.flatMap { case (id, beans) => beans.map(toBson)
}
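To make the flattening concrete, here is the same shape with plain Scala collections instead of RDDs. The ProductBean fields and the Map-based toBson below are made up for illustration; in the real job toBson would build a BSONObject.

```scala
// Plain-collections analogue of the RDD pipeline above:
// group beans by customerId, then flatMap the grouped values
// back into one flat sequence with one element per ProductBean.
case class ProductBean(customerId: Long, name: String)

// Stand-in for the real toBson; returns a Map instead of a BSONObject.
def toBson(bean: ProductBean): Map[String, Any] =
  Map("customerId" -> bean.customerId, "name" -> bean.name)

val allBeans = Seq(
  ProductBean(1L, "widget"),
  ProductBean(1L, "gadget"),
  ProductBean(2L, "gizmo")
)

// groupBy mirrors allBeans.groupBy(_.customerId) on the RDD
val grouped: Map[Long, Seq[ProductBean]] = allBeans.groupBy(_.customerId)

// flatMap collapses the per-customer Seqs: 2 groups in, 3 objects out
val flat: Seq[Map[String, Any]] =
  grouped.toSeq.flatMap { case (_, beans) => beans.map(toBson) }
```

The key point is that flatMap drops the grouping level entirely, so the customerId key is gone and each output element corresponds to exactly one ProductBean.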

Hope this helps,
-adrian

From: Shams ul Haque
Date: Tuesday, October 27, 2015 at 12:50 PM
To: "user@spark.apache.org"
Subject: Separate all values from Iterable

Hi,


I have grouped all my customers in a JavaPairRDD&lt;Long, Iterable&lt;ProductBean&gt;&gt; by
their customerId (of Long type). This means every customerId has a List of ProductBean.

Now I want to save all ProductBeans to the DB irrespective of customerId. I got all the values
using the method:
JavaRDD&lt;Iterable&lt;ProductBean&gt;&gt; values = custGroupRDD.values();

Now I want to convert JavaRDD&lt;Iterable&lt;ProductBean&gt;&gt; to JavaRDD&lt;Object, BSONObject&gt;
so that I can save it to Mongo. Remember, every BSONObject is made from a single ProductBean.

I am not getting any idea of how to do this in Spark, I mean which Spark transformation
is used for this job. I think this task is some kind of "separate all values from the Iterable".
Please let me know how this is possible.
Any hint in Scala or Python is also OK.


Thanks

Shams