spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Marcin Tustin <mtus...@handybook.com>
Subject Re: RDD.broadcast
Date Thu, 28 Apr 2016 11:26:51 GMT
I don't know what your notation really means. I'm very much unclear on why
you can't use the filter method for 1. If you're talking about
splitting/bucketing rather filtering as such I think that is a specific
lacuna in spark's Api.

I've generally found the join api to be entirely adequate for my needs, so
I don't really have a comment on 2.

On Thursday, April 28, 2016, <Ioannis.Deligiannis@nomura.com> wrote:

> One example pattern we have it doing joins or filters based on two
> datasets. E.g.
>
> 1         Filter –multiple- RddB for a given set extracted from RddA
> (keyword here is multiple times)
>
> a.       RddA -> keyBy -> distinct -> collect() to Set A;
>
> b.      RddB -> Filter using Set A;
>
> 2         “Join” using composition on executor (again multiple times)
>
> a.       RddA -> filter by XYZ -> keyBy join attribute -> collectAsMap
> ->Broadcast MapA;
>
> b.      RddB -> map (Broadcast<Map<K,V>> MapA;
>
>
>
> The first use case might not be that common, but joining a large RDD with
> a small (reference) RDD is quite common and much faster than using “join”
> method.
>
>
>
>
>
> *From:* Marcin Tustin [mailto:mtustin@handybook.com
> <javascript:_e(%7B%7D,'cvml','mtustin@handybook.com');>]
> *Sent:* 28 April 2016 12:08
> *To:* Deligiannis, Ioannis (UK)
> *Cc:* dev@spark.apache.org
> <javascript:_e(%7B%7D,'cvml','dev@spark.apache.org');>
> *Subject:* Re: RDD.broadcast
>
>
>
> Why would you ever need to do this? I'm genuinely curious. I view collects
> as being solely for interactive work.
>
> On Thursday, April 28, 2016, <Ioannis.Deligiannis@nomura.com
> <javascript:_e(%7B%7D,'cvml','Ioannis.Deligiannis@nomura.com');>> wrote:
>
> Hi,
>
>
>
> It is a common pattern to process an RDD, collect (typically a subset) to
> the driver and then broadcast back.
>
>
>
> Adding an RDD method that can do that using the torrent broadcast
> mechanics would be much more efficient. In addition, it would not require
> the Driver to also utilize its Heap holding this broadcast.
>
>
>
> I guess this can become complicated if the resulting broadcast is required
> to keep lineage information, but assuming a torrent distribution, once the
> broadcast is synced then lineage would not be required. I’d also expect the
> call to rdd.brodcast to be an action that eagerly distributes the broadcast
> and returns when the operation has succeeded.
>
>
>
> Is this something that could be implemented or are there any reasons that
> prohibits this?
>
>
>
> Thanks
>
> Ioannis
>
>
>
> This e-mail (including any attachments) is private and confidential, may
> contain proprietary or privileged information and is intended for the named
> recipient(s) only. Unintended recipients are strictly prohibited from
> taking action on the basis of information in this e-mail and must contact
> the sender immediately, delete this e-mail (and all attachments) and
> destroy any hard copies. Nomura will not accept responsibility or liability
> for the accuracy or completeness of, or the presence of any virus or
> disabling code in, this e-mail. If verification is sought please request a
> hard copy. Any reference to the terms of executed transactions should be
> treated as preliminary only and subject to formal written confirmation by
> Nomura. Nomura reserves the right to retain, monitor and intercept e-mail
> communications through its networks (subject to and in accordance with
> applicable laws). No confidentiality or privilege is waived or lost by
> Nomura by any mistransmission of this e-mail. Any reference to "Nomura" is
> a reference to any entity in the Nomura Holdings, Inc. group. Please read
> our Electronic Communications Legal Notice which forms part of this e-mail:
> http://www.Nomura.com/email_disclaimer.htm
> <https://urldefense.proofpoint.com/v2/url?u=http-3A__www.Nomura.com_email-5Fdisclaimer.htm&d=CwMFaQ&c=dCBwIlVXJsYZrY6gpNt0LA&r=B8E4n9FrSS85mPCi6Mfs7cyEPQnVrpcQ1zeB-JKws6A&m=GAA5LZhuKEWXxozKzXPhWAYY4BSTpcXaf2lFg5JSPB0&s=SLnOgTBJ2zAlhtvjcFRXfqUArds-4HSAZCgFXLgMCVY&e=>
>
>
>
> Want to work at Handy? Check out our culture deck and open roles
> <https://urldefense.proofpoint.com/v2/url?u=http-3A__www.handy.com_careers&d=CwMFaQ&c=dCBwIlVXJsYZrY6gpNt0LA&r=B8E4n9FrSS85mPCi6Mfs7cyEPQnVrpcQ1zeB-JKws6A&m=GAA5LZhuKEWXxozKzXPhWAYY4BSTpcXaf2lFg5JSPB0&s=WgDnCrSGv_qt66f2cabjugmMGU46gc5rSkt_gm7lEkQ&e=>
>
> Latest news
> <https://urldefense.proofpoint.com/v2/url?u=http-3A__www.handy.com_press&d=CwMFaQ&c=dCBwIlVXJsYZrY6gpNt0LA&r=B8E4n9FrSS85mPCi6Mfs7cyEPQnVrpcQ1zeB-JKws6A&m=GAA5LZhuKEWXxozKzXPhWAYY4BSTpcXaf2lFg5JSPB0&s=rfQxr8cDwVFK7Mql1_HdnvqAmXeiOHZgnjNtKXGn_Kg&e=>
at
> Handy
>
> Handy just raised $50m
> <https://urldefense.proofpoint.com/v2/url?u=http-3A__venturebeat.com_2015_11_02_on-2Ddemand-2Dhome-2Dservice-2Dhandy-2Draises-2D50m-2Din-2Dround-2Dled-2Dby-2Dfidelity_&d=CwMFaQ&c=dCBwIlVXJsYZrY6gpNt0LA&r=B8E4n9FrSS85mPCi6Mfs7cyEPQnVrpcQ1zeB-JKws6A&m=GAA5LZhuKEWXxozKzXPhWAYY4BSTpcXaf2lFg5JSPB0&s=RbQTDcalISb9w2WMxzXmRgR1mr7QiCaqpD2bLAkt-z4&e=>
led
> by Fidelity
>
>
>
> [image: Image removed by sender.]
>
> This e-mail (including any attachments) is private and confidential, may
> contain proprietary or privileged information and is intended for the named
> recipient(s) only. Unintended recipients are strictly prohibited from
> taking action on the basis of information in this e-mail and must contact
> the sender immediately, delete this e-mail (and all attachments) and
> destroy any hard copies. Nomura will not accept responsibility or liability
> for the accuracy or completeness of, or the presence of any virus or
> disabling code in, this e-mail. If verification is sought please request a
> hard copy. Any reference to the terms of executed transactions should be
> treated as preliminary only and subject to formal written confirmation by
> Nomura. Nomura reserves the right to retain, monitor and intercept e-mail
> communications through its networks (subject to and in accordance with
> applicable laws). No confidentiality or privilege is waived or lost by
> Nomura by any mistransmission of this e-mail. Any reference to "Nomura" is
> a reference to any entity in the Nomura Holdings, Inc. group. Please read
> our Electronic Communications Legal Notice which forms part of this e-mail:
> http://www.Nomura.com/email_disclaimer.htm
>

-- 
Want to work at Handy? Check out our culture deck and open roles 
<http://www.handy.com/careers>
Latest news <http://www.handy.com/press> at Handy
Handy just raised $50m 
<http://venturebeat.com/2015/11/02/on-demand-home-service-handy-raises-50m-in-round-led-by-fidelity/>
led 
by Fidelity


Mime
View raw message