spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: using RDD result in another RDD
Date Wed, 12 Nov 2014 19:44:10 GMT
You can't use RDDs inside of RDDs, so this won't work anyway. You could
collect the result of RDD1 and broadcast it, perhaps. collect() blocks
until the result is back on the driver, so no separate barrier is needed.
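
A minimal sketch of the collect-and-broadcast suggestion above, assuming both RDDs hold Ints and the reduction is a sum. The names BroadcastReducedValue, scaleBySum, and aBc are hypothetical, and the local[2] master is only there so the example runs standalone.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object BroadcastReducedValue {
  // Hypothetical helper: multiply every element of rdd2 by the sum of rdd1.
  def scaleBySum(sc: SparkContext, rdd1: RDD[Int], rdd2: RDD[Int]): RDD[Int] = {
    // reduce() is an action: it runs the job for rdd1 and blocks until the
    // result is back on the driver, so no explicit barrier is needed.
    val a: Int = rdd1.reduce(_ + _)

    // Ship the driver-side value to the executors once as a broadcast variable.
    val aBc = sc.broadcast(a)

    // rdd2 is only transformed after `a` is known; no Cartesian product needed.
    rdd2.map(x => x * aBc.value)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, purely for illustration.
    val conf = new SparkConf().setAppName("broadcast-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd1 = sc.parallelize(Seq(1, 2, 3))
      val rdd2 = sc.parallelize(Seq(10, 20, 30))
      println(scaleBySum(sc, rdd1, rdd2).collect().mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

For a single small value such as an Int, capturing `a` directly in the closure (`rdd2.map(x => x * a)`) should also work, since it is a plain serializable value on the driver; broadcasting tends to pay off when the collected result is large and reused across many tasks.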

On Wed, Nov 12, 2014 at 6:41 PM, Adrian Mocanu <amocanu@verticalscope.com>
wrote:

>  Hi
>
> I’d like to use the result of one RDD, RDD1, in another, RDD2. Normally I
> would use something like a barrier to make the second RDD wait until the
> computation of the first is done, then include the result from RDD1 in
> the closure for RDD2.
>
> Currently I create another RDD, RDD3, out of the result of RDD1, then do a
> Cartesian product on RDD2 and RDD3. NB: This operation is slow and expands
> partitions from 270 to 1200.
>
>
>
> This is a simplified example but I think it should help:
>
> What I want to do (pseudocode):
>
>   val a: Int = RDD1.reduce(..)
>
>   RDD2.map(x => x * a)
>
>
>
> What I use right now (pseudocode):
>
>   val a: Int = RDD1.reduce(..)
>
>   val RDD3 = sc.makeRDD(Seq(a))  // sc: the SparkContext (assumed)
>
>   RDD2.cartesian(RDD3)
>
>
>
> How can I structure this type of operation so that it doesn't need a
> barrier to block computing RDD2 until RDD1 is done?
>
>
>
> -Adrian
>
>
>
