spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From zhangliyun <kelly...@126.com>
Subject A question about broadcast join in spark2.2.0 and spark2.3.1
Date Sun, 26 May 2019 09:38:04 GMT
Hi all:
  i want to ask a question about how to change common join to broadcast join.
  I have a query like in spark 2.2.0 and spark 2.3.1 seperately
```
select * from Aleft join B
on A.mth_id - 12 =  B.mth_id  and A.email = B.email.
```
 The value of spark.sql.autoBroadcastJoinThreshold is same ( spark.sql.autoBroadcastJoinThreshold=700000000
700M), but for some reason, it became 
broadcast join in 2.2.0 while  a sort merged join in spark 2.3.1.  It is strange that in spark
2.2.0 the broadcast data size is 0 bytes but actually i think the value is not 0.  The problem
here this query should use broadcast join but i don't know why the broadcast data size is
0 in spark2.2 and why this  is not a broadcast join in 2.3.1. Is there any difference about
the threshold of broadcast join in spark 2.2.0 and 2.3.1? Is the parameter "spark.sql.autoBroadcastJoinThreshold"
only factor which decide the broadcast or sort merge join? Appreciate that if you can give
me some suggestion.







Mime
View raw message