It picks sort merge join when spark.sql.autoBroadcastJoinThreshold is set to -1, or when the size of the smaller table exceeds spark.sql.autoBroadcastJoinThreshold.
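
To make that rule concrete, here is a rough, Spark-free sketch of the selection order as I understand it for Spark 2.0. It is a simplified model, not the code in SparkStrategies.scala: the names `JoinConf` and `choose_join` are made up for illustration, the smaller input is always treated as the build side, and the defaults (10 MB threshold, 200 shuffle partitions, sort merge preferred) are assumptions that may differ between versions.

```python
from dataclasses import dataclass


@dataclass
class JoinConf:
    """Hypothetical stand-in for the relevant Spark SQL settings."""
    # Models spark.sql.autoBroadcastJoinThreshold (bytes); -1 disables broadcast.
    auto_broadcast_join_threshold: int = 10 * 1024 * 1024
    # Models spark.sql.join.preferSortMergeJoin (assumed default: True).
    prefer_sort_merge_join: bool = True
    # Models spark.sql.shuffle.partitions (assumed default: 200).
    shuffle_partitions: int = 200


def choose_join(left_bytes, right_bytes, conf, keys_orderable=True):
    """Return the join a simplified planner would pick for an equi-join."""
    threshold = conf.auto_broadcast_join_threshold
    small = min(left_bytes, right_bytes)
    large = max(left_bytes, right_bytes)

    # 1. Broadcast when the smaller side fits under the threshold.
    #    With threshold = -1 this can never be true.
    if small <= threshold:
        return "broadcast hash join"

    # 2. Shuffle hash join only when sort merge is not preferred, each
    #    partition of the smaller side should fit in memory, and that side
    #    is much smaller than the other one; or when keys cannot be sorted.
    fits_local_map = small < threshold * conf.shuffle_partitions
    much_smaller = small * 3 <= large
    if (not conf.prefer_sort_merge_join and fits_local_map and much_smaller) \
            or not keys_orderable:
        return "shuffle hash join"

    # 3. Otherwise fall back to sort merge join.
    return "sort merge join"


MB = 1024 * 1024
disabled = JoinConf(auto_broadcast_join_threshold=-1)
print(choose_join(1024 * MB, 100 * MB, disabled))    # sort merge join
print(choose_join(1024 * MB, 1 * MB, JoinConf()))    # broadcast hash join
```

Note that in this model a threshold of -1 also makes the "fits in memory" check fail, so turning off broadcast alone does not lead to shuffle hash join when the keys are sortable.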

On Mon, Jul 4, 2016 at 10:17 PM, Takeshi Yamamuro <linguin.m.s@gmail.com> wrote:
The join selection logic is described in https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L92.
If you have join keys, you can set `spark.sql.autoBroadcastJoinThreshold` to -1 to disable broadcast joins. Then, hash joins are used in queries.

// maropu 

On Tue, Jul 5, 2016 at 4:23 AM, Lalitha MV <lalithamv92@gmail.com> wrote:
Hi maropu, 

Thanks for your reply. 

Would it be possible to write a rule for this, so that it always picks shuffle hash join over the other join implementations (i.e. sort merge and broadcast)?

Is there any documentation demonstrating rule based transformation for physical plan trees? 

Thanks,
Lalitha

On Sat, Jul 2, 2016 at 12:58 AM, Takeshi Yamamuro <linguin.m.s@gmail.com> wrote:
Hi,

No, Spark has no hint for hash join.

// maropu

On Fri, Jul 1, 2016 at 4:56 PM, Lalitha MV <lalithamv92@gmail.com> wrote:
Hi, 

To force a broadcast hash join, we can set the spark.sql.autoBroadcastJoinThreshold config. Is there a way to enforce shuffle hash join in Spark SQL?


Thanks,
Lalitha



--
---
Takeshi Yamamuro



--
Regards,
Lalitha


