The join selection logic is described in https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L92.
If you have join keys, you can set `spark.sql.autoBroadcastJoinThreshold` to -1 to disable broadcast joins. Then, hash joins are used in queries.

// maropu 
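The threshold setting discussed above can be tried with a minimal sketch like the following. It assumes Spark 2.0 or later (for `SparkSession`) and a local master; the object name `DisableBroadcastJoin` and the method `run` are hypothetical. Setting the threshold to -1 only turns off automatic broadcasting. Whether the planner then picks a shuffled hash join or a sort-merge join depends on the Spark version and other planner settings, so `explain()` is used to print the physical plan that was actually chosen.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DisableBroadcastJoin {
  // Joins two tiny DataFrames on "id" with automatic broadcast joins disabled.
  def run(spark: SparkSession): DataFrame = {
    import spark.implicits._
    // -1 turns off size-based automatic broadcast joins for this session
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
    val left = Seq((1, "a"), (2, "b")).toDF("id", "l")
    val right = Seq((1, "x"), (3, "y")).toDF("id", "r")
    left.join(right, "id") // equi-join on the "id" key
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("disable-broadcast-join")
      .master("local[*]")
      .getOrCreate()
    val joined = run(spark)
    joined.explain() // prints the physical plan: check which join operator was chosen
    joined.show()
    spark.stop()
  }
}
```

Comparing the `explain()` output with and without the `spark.conf.set` line shows the effect of the threshold on your own Spark version.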

On Tue, Jul 5, 2016 at 4:23 AM, Lalitha MV <lalithamv92@gmail.com> wrote:
Hi maropu, 

Thanks for your reply. 

Would it be possible to write a rule for this, so that it always picks shuffle hash join over the other join implementations (i.e., sort merge and broadcast)? 

Is there any documentation demonstrating rule-based transformations of physical plan trees? 

Thanks,
Lalitha

On Sat, Jul 2, 2016 at 12:58 AM, Takeshi Yamamuro <linguin.m.s@gmail.com> wrote:
Hi,

No, Spark has no hint for shuffle hash join.

// maropu

On Fri, Jul 1, 2016 at 4:56 PM, Lalitha MV <lalithamv92@gmail.com> wrote:
Hi, 

In order to force a broadcast hash join, we can set the `spark.sql.autoBroadcastJoinThreshold` config. Is there a way to enforce a shuffle hash join in Spark SQL? 


Thanks,
Lalitha
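For reference, the shuffle hash join asked about here partitions ("shuffles") both inputs by a hash of the join key and then, within each partition, builds an in-memory hash table from one side and probes it with the other. The plain-Scala sketch below models that idea on local collections only; it is not Spark's implementation, and the names `ShuffleHashJoinSketch`, `join`, and `numPartitions` are hypothetical.

```scala
object ShuffleHashJoinSketch {
  // Assign a key to one of numPartitions buckets (non-negative modulo of hashCode).
  private def partitionOf[K](key: K, numPartitions: Int): Int =
    Math.floorMod(key.hashCode, numPartitions)

  // "Shuffle" step: group rows by the partition their key hashes to.
  private def shuffle[K, V](rows: Seq[(K, V)], numPartitions: Int): Map[Int, Seq[(K, V)]] =
    rows.groupBy { case (k, _) => partitionOf(k, numPartitions) }

  /** Inner equi-join: per partition, build a hash table from `build` and probe it with `probe`. */
  def join[K, B, P](build: Seq[(K, B)], probe: Seq[(K, P)], numPartitions: Int): Seq[(K, B, P)] = {
    require(numPartitions > 0, "numPartitions must be positive")
    val buildParts = shuffle(build, numPartitions)
    val probeParts = shuffle(probe, numPartitions)
    (0 until numPartitions).flatMap { p =>
      // Build phase: hash table over this partition's build-side rows.
      val table: Map[K, Seq[B]] =
        buildParts.getOrElse(p, Seq.empty).groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
      // Probe phase: look up each probe-side row of the same partition in the table.
      probeParts.getOrElse(p, Seq.empty).flatMap { case (k, pv) =>
        table.getOrElse(k, Seq.empty).map(bv => (k, bv, pv))
      }
    }
  }
}
```

Because both sides are partitioned with the same function, matching keys always land in the same partition, so each partition can be joined independently. Compared with sort-merge join this avoids sorting, at the cost of holding each build-side partition in memory.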



--
---
Takeshi Yamamuro



--
Regards,
Lalitha