spark-user mailing list archives

From Adrian Mocanu <amoc...@verticalscope.com>
Subject using RDD result in another RDD
Date Wed, 12 Nov 2014 18:41:47 GMT
Hi
I'd like to use the result of one RDD (RDD1) in another (RDD2). Normally I would use something like
a barrier to make RDD2 wait until the computation of RDD1 is done, then include
the result from RDD1 in the closure for RDD2.
Currently I create another RDD, RDD3, out of the result of RDD1, then take the Cartesian product
of RDD2 and RDD3. NB: This operation is slow and expands the number of partitions from 270 to 1200.

This is a simplified example but I think it should help:
What I want to do (pseudocode):
  val a: Int = RDD1.reduce(..)
  RDD2.map(x => x * a)
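A minimal runnable sketch of the "what I want to do" pseudocode, assuming both RDDs hold `Int`s and using `_ + _` as a stand-in for the elided reduce function. The object name `ScaleByReduced` is hypothetical. Because `reduce` is an action, it already runs a job and returns a plain value to the driver before `map` is even defined, so the value can simply be captured in the closure; a broadcast variable is a common variant when the driver-side value is large.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object for illustration only
object ScaleByReduced {
  // reduce is an action: it blocks, runs a job over rdd1 and returns a plain Int to the driver
  def scale(rdd1: RDD[Int], rdd2: RDD[Int]): RDD[Int] = {
    val a: Int = rdd1.reduce(_ + _) // assumed reduce function; the original elides it
    rdd2.map(x => x * a)            // `a` is serialized along with the map closure
  }

  // Variant: ship the value once per executor via a broadcast variable instead of per task
  def scaleBroadcast(sc: SparkContext, rdd1: RDD[Int], rdd2: RDD[Int]): RDD[Int] = {
    val bA = sc.broadcast(rdd1.reduce(_ + _))
    rdd2.map(x => x * bA.value)
  }
}
```

For a single `Int` the plain closure capture is enough; the broadcast form mainly pays off for larger lookup data.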

What I use right now (pseudocode):
  val a: Int = RDD1.reduce(..)
  val RDD3 = sc.makeRDD(Seq(a)) // sc = the SparkContext
  RDD2.cartesian(RDD3)
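A runnable sketch of the "what I use right now" approach, under the same assumptions (`Int` RDDs, `_ + _` standing in for the elided reduce function, hypothetical object name `CartesianScale`). It also shows where the partition growth comes from: `cartesian` creates one output partition for every pair of input partitions, so its partition count is the product of the two inputs' counts.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object for illustration only
object CartesianScale {
  def scaleViaCartesian(sc: SparkContext, rdd1: RDD[Int], rdd2: RDD[Int], slices: Int): RDD[Int] = {
    val a: Int = rdd1.reduce(_ + _)                       // assumed reduce function
    val rdd3: RDD[Int] = sc.makeRDD(Seq(a), numSlices = slices) // one element, `slices` partitions
    // Each element of rdd2 is paired with `a`, then the pair is multiplied
    rdd2.cartesian(rdd3).map { case (x, y) => x * y }
  }
}
```

With `numSlices` left at its default, the one-element RDD is split into `sc.defaultParallelism` partitions (most of them empty), which multiplies the partition count of the `cartesian` result accordingly.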

How should I structure this type of operation so that it doesn't need a barrier that blocks the computation of RDD2 until
RDD1 is done?

-Adrian

