spark-dev mailing list archives

From <Ioannis.Deligian...@nomura.com>
Subject RE: RDD.broadcast
Date Thu, 28 Apr 2016 11:20:18 GMT
One example pattern we have is doing joins or filters based on two datasets, e.g.:

1. Filter RddB multiple times for a given set extracted from RddA (the keyword here is
   "multiple times"):

   a. RddA -> keyBy -> distinct -> collect() to Set A;

   b. RddB -> filter using Set A.

2. "Join" using composition on the executor (again, multiple times):

   a. RddA -> filter by XYZ -> keyBy join attribute -> collectAsMap -> Broadcast MapA;

   b. RddB -> map (using Broadcast<Map<K,V>> MapA).
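
For illustration, the first pattern could be sketched in PySpark roughly as follows. This is a minimal sketch, not code from the thread: the function names `collect_key_set` and `filter_by_keys` are hypothetical, `sc` is assumed to be an existing SparkContext, and broadcasting the collected set (rather than capturing it in each closure) is an assumption made because the filter is applied multiple times.

```python
def collect_key_set(sc, rdd_a, key_of):
    """Step a: RddA -> keys -> distinct -> collect() to a set, then broadcast it.

    `sc` is assumed to be a live SparkContext and `key_of` a function that
    extracts the key from one RddA row (both are illustrative assumptions).
    """
    keys = set(rdd_a.map(key_of).distinct().collect())
    # Broadcast once so that repeated filters reuse the same executor-side copy.
    return sc.broadcast(keys)


def filter_by_keys(rdd_b, keys_bc, key_of):
    """Step b: keep only the RddB rows whose key is in the broadcast set."""
    return rdd_b.filter(lambda row: key_of(row) in keys_bc.value)
```

The broadcast handle returned by `collect_key_set` can be passed to `filter_by_keys` as many times as needed, which is where the "multiple times" saving comes from.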


The first use case might not be that common, but joining a large RDD with a small (reference)
RDD is quite common, and doing it this way is much faster than using the “join” method.
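
The second pattern (a map-side join against a small reference RDD) might look roughly like this in PySpark. Again this is only a sketch under assumptions: `broadcast_lookup` and `map_side_join` are hypothetical names, `sc` is assumed to be an existing SparkContext, and `keep` stands in for the unspecified "filter by XYZ" predicate.

```python
def broadcast_lookup(sc, rdd_a, keep, key_of):
    """Step a: RddA -> filter -> keyBy join attribute -> collectAsMap -> broadcast.

    Note that collectAsMap() keeps a single value per key, so this assumes the
    join attribute is unique within the filtered reference data.
    """
    lookup = rdd_a.filter(keep).keyBy(key_of).collectAsMap()
    return sc.broadcast(lookup)


def map_side_join(rdd_b, lookup_bc, key_of):
    """Step b: RddB -> map, pairing each row with its match from the broadcast map.

    Rows without a match are paired with None (a left-outer-style result);
    this is an illustrative choice, not something the thread specifies.
    """
    return rdd_b.map(lambda row: (row, lookup_bc.value.get(key_of(row))))
```

Because the lookup map is shipped to each executor once, the large RDD is never shuffled, and the same broadcast can be reused across several `map_side_join` calls.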


From: Marcin Tustin [mailto:mtustin@handybook.com]
Sent: 28 April 2016 12:08
To: Deligiannis, Ioannis (UK)
Cc: dev@spark.apache.org
Subject: Re: RDD.broadcast

Why would you ever need to do this? I'm genuinely curious. I view collects as being solely
for interactive work.

On Thursday, April 28, 2016, <Ioannis.Deligiannis@nomura.com> wrote:
Hi,

It is a common pattern to process an RDD, collect (typically a subset) to the driver and then
broadcast back.

Adding an RDD method that can do that using the torrent broadcast mechanics would be much
more efficient. In addition, it would not require the driver to also use its heap to hold
this broadcast.

I guess this can become complicated if the resulting broadcast is required to keep lineage
information, but assuming a torrent distribution, once the broadcast is synced, lineage
would not be required. I’d also expect the call to rdd.broadcast to be an action that eagerly
distributes the broadcast and returns when the operation has succeeded.

Is this something that could be implemented, or are there any reasons that prohibit this?

Thanks
Ioannis

This e-mail (including any attachments) is private and confidential, may contain proprietary
or privileged information and is intended for the named recipient(s) only. Unintended recipients
are strictly prohibited from taking action on the basis of information in this e-mail and
must contact the sender immediately, delete this e-mail (and all attachments) and destroy
any hard copies. Nomura will not accept responsibility or liability for the accuracy or completeness
of, or the presence of any virus or disabling code in, this e-mail. If verification is sought
please request a hard copy. Any reference to the terms of executed transactions should be
treated as preliminary only and subject to formal written confirmation by Nomura. Nomura reserves
the right to retain, monitor and intercept e-mail communications through its networks (subject
to and in accordance with applicable laws). No confidentiality or privilege is waived or lost
by Nomura by any mistransmission of this e-mail. Any reference to "Nomura" is a reference
to any entity in the Nomura Holdings, Inc. group. Please read our Electronic Communications
Legal Notice which forms part of this e-mail: http://www.Nomura.com/email_disclaimer.htm



