i want to ask a question about how to change common join to broadcast join.
I have a query like in spark 2.2.0 and spark 2.3.1 seperately
select * from A left join B
on A.mth_id - 12 = B.mth_id and A.email = B.email.
The value of spark.sql.autoBroadcastJoinThreshold is same ( spark.sql.autoBroadcastJoinThreshold=700000000 700M), but for some reason, it became
broadcast join in 2.2.0 while a sort merged join in spark 2.3.1. It is strange that in spark 2.2.0 the broadcast data size is 0 bytes but actually i think the value is not 0. The problem here this query should use broadcast join but i don't know why the broadcast data size is 0 in spark2.2 and why this is not a broadcast join in 2.3.1. Is there any difference about the threshold of broadcast join in spark 2.2.0 and 2.3.1? Is the parameter "spark.sql.autoBroadcastJoinThreshold" only factor which decide the broadcast or sort merge join? Appreciate that if you can give me some suggestion.