spark-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Multiple lookups; consolidate result and run further aggregations
Date Sat, 02 Apr 2016 14:10:15 GMT
Looking at the implementation for lookup in PairRDDFunctions, I think your
understanding is correct.
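Below is a minimal sketch of the two approaches from the question. It assumes Spark's RDD API (SparkContext, PairRDDFunctions) running with a local master. The object name MultiLookupSketch, the sample data, and the keys are all hypothetical.

In the first path, each lookup() is its own job that returns a Seq[V] to the driver. When the RDD has a known partitioner, lookup only scans the partition that can hold the key; otherwise it filters every partition. The second path swaps in a different technique: it filters against a broadcast key set, which acts as a map-side alternative to a shuffle join when only keys need matching, and the matching values stay in an RDD.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical example object; sample data and keys stand in for dataRdd and lookupKeys.
object MultiLookupSketch {

  def compare(sc: SparkContext): (Seq[Int], Seq[Int]) = {
    val dataRdd: RDD[(String, Int)] =
      sc.parallelize(Seq("a" -> 1, "b" -> 2, "a" -> 3, "c" -> 4))
    val lookupKeys: Seq[String] = Seq("a", "c")

    // Approach 1: lookupKeys is a local Seq, so this flatMap runs on the driver.
    // Each lookup() launches its own job and returns a Seq[Int] to the driver.
    val seqVal: Seq[Int] = lookupKeys.flatMap(key => dataRdd.lookup(key))
    // The consolidated result only goes back to the cluster if re-distributed.
    val redistributed: RDD[Int] = sc.parallelize(seqVal)

    // Approach 2: ship the key set to the executors once and filter in a single job;
    // the matching values stay distributed in an RDD for further transformations.
    val keysBc = sc.broadcast(lookupKeys.toSet)
    val matched: RDD[Int] =
      dataRdd.filter { case (k, _) => keysBc.value.contains(k) }.values

    // collect() here is only for inspection in this sketch.
    (redistributed.collect().toSeq, matched.collect().toSeq)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("multi-lookup-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (viaLookup, viaBroadcast) = compare(sc)
      println(s"lookup + parallelize: ${viaLookup.sorted}")
      println(s"broadcast filter:     ${viaBroadcast.sorted}")
    } finally {
      sc.stop()
    }
  }
}
```

The two paths trade off differently. Approach 1 always issues one job per key, even on a partitioned RDD where each job touches only one partition. Approach 2 is a single job regardless of the number of keys, but it reads every partition.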


On Sat, Apr 2, 2016 at 3:16 AM, Nirav Patel <npatel@xactlycorp.com> wrote:

> I will start with a question: is Spark's lookup function on a pair RDD a
> driver action, i.e. is the result returned to the driver?
>
> I have a list of keys on the driver side, and I want to perform multiple
> parallel lookups on a pair RDD (each returning a Seq[V]), consolidate the
> results, and perform further aggregation/transformation on the cluster.
>
> val seqVal = lookupKeys.flatMap(key => {
>   dataRdd.lookup(key)
> })
>
>
> Here's what I think will happen internally:
>
> Each lookup of a key returns its Seq[V] to the driver.
>
> Consolidation of the Seq[V] results has to happen on the driver, because
> the flatMap is called on the driver-side lookupKeys collection.
>
> All subsequent operations will happen on the driver side unless I call
> sparkContext.parallelize(seqVal).
>
> Is this correct?
>
> Also, what I am trying to achieve is efficient multiple lookups. Another
> option is to broadcast the lookup keys and perform a join.
>
> Please advise.
>
> Thanks
> Nirav