spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paley Louie <paley2...@gmail.com>
Subject Re: about broadcast join of base table in spark sql
Date Sun, 02 Jul 2017 04:33:18 GMT
Thank you for your reply, I have tried to set parameter spark.sql.autoBroadcastJoinThreshold
to high enough value, however it does not work,  I think broadcast of base table is disabled
in spark.


> On Jun 30, 2017, at 6:57 PM, Bryan Jeffrey <bryan.jeffrey@gmail.com> wrote:
> 
> Hello. 
> 
> If you want to allow broadcast join with larger broadcasts you can set spark.sql.autoBroadcastJoinThreshold
to a higher value. This will cause the plan to allow join despite 'A' being larger than the
default threshold. 
> 
> Get Outlook for Android <https://aka.ms/ghei36>
> 
> 
> From: paleyl
> Sent: Wednesday, June 28, 10:42 PM
> Subject: about broadcast join of base table in spark sql
> To: dev@spark.org, user@spark.apache.org
> 
> 
> Hi All,
> 
> 
> Recently I meet a problem in broadcast join: I want to left join table A and B, A is
the smaller one and the left table, so I wrote 
> 
> A = A.join(B,A("key1") === B("key2"),"left")
> 
> but I found that A is not broadcast out, as the shuffle size is still very large.
> 
> I guess this is a designed mechanism in spark, so could anyone please tell me why it
is designed like this? I am just very curious.
> 
> 
> Best,
> 
> 
> Paley 
> 
> 
> 


Mime
View raw message