spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From zhangliyun <kelly...@126.com>
Subject Question about in subquery
Date Tue, 22 Oct 2019 21:13:43 GMT
Hi all:
   i  used in subquery like following in spark 2.3.1
{code}
set spark.sql.autoBroadcastJoinThreshold=-1;


explain select * from testdata where key1 not in (select key1 from testdata as b);




= Physical Plan ==


BroadcastNestedLoopJoin BuildRight, LeftAnti, ((key1#60 = key1#62) || isnull((key1#60 = key1#62)))


:- HiveTableScan [key1#60, value1#61], HiveTableRelation `default`.`testdata`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
[key1#60, value1#61]


+- BroadcastExchange IdentityBroadcastMode


   +- HiveTableScan [key1#62], HiveTableRelation `default`.`testdata`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
[key1#62, value1#63]
{code}


It seems that even i disable the broadcast join ( set spark.sql.autoBroadcastJoinThreshold=-1).
it still a  broadcast join not sort merged join. Is there any way to let spark sql to use
sort merge join by setting parameters besides directly modifying the sql like : explain select
testdata.* from testdata left join testdata2 where testdata.key1=testdata2.key1 and testdata2.key1
is null;






Appreciate to have some suggestions


Best Regards


Kelly Zhang
Mime
View raw message