spark-user mailing list archives

From Takeshi Yamamuro <linguin....@gmail.com>
Subject Re: Enforcing shuffle hash join
Date Tue, 05 Jul 2016 05:17:12 GMT
The join selection logic is described in
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L92
.
If you have join keys, you can set
`spark.sql.autoBroadcastJoinThreshold` to -1 to disable broadcast joins.
Then, hash joins are used in queries.
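
As a minimal sketch of the setting above (not taken from the Spark
sources): it assumes Spark 2.x with a local `SparkSession`, and the object
name, app name, input sizes, and column name `k` are made up for
illustration. It disables broadcast joins and prints the physical plan, so
you can check which join operator the planner actually picked; that choice
can depend on the Spark version and on other settings.

```scala
import org.apache.spark.sql.SparkSession

object DisableBroadcastJoin {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("disable-broadcast-join")
      .getOrCreate()

    // -1 disables automatic broadcast joins.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

    // Two small made-up inputs that share the join key column `k`.
    val left = spark.range(0, 1000).toDF("k")
    val right = spark.range(0, 100).toDF("k")

    // Equi-join on `k`; explain() prints the chosen physical plan.
    left.join(right, "k").explain()

    spark.stop()
  }
}
```

Look for the join node in the printed plan to confirm that no broadcast
join was planned.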

// maropu

On Tue, Jul 5, 2016 at 4:23 AM, Lalitha MV <lalithamv92@gmail.com> wrote:

> Hi maropu,
>
> Thanks for your reply.
>
> Would it be possible to write a rule for this, to make it always pick
> shuffle hash join over the other join implementations (i.e. sort merge
> and broadcast)?
>
> Is there any documentation demonstrating rule-based transformations of
> physical plan trees?
>
> Thanks,
> Lalitha
>
> On Sat, Jul 2, 2016 at 12:58 AM, Takeshi Yamamuro <linguin.m.s@gmail.com>
> wrote:
>
>> Hi,
>>
>> No, Spark has no hint for the hash join.
>>
>> // maropu
>>
>> On Fri, Jul 1, 2016 at 4:56 PM, Lalitha MV <lalithamv92@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> In order to force broadcast hash join, we can set
>>> the spark.sql.autoBroadcastJoinThreshold config. Is there a way to enforce
>>> shuffle hash join in Spark SQL?
>>>
>>>
>>> Thanks,
>>> Lalitha
>>>
>>
>>
>>
>> --
>> ---
>> Takeshi Yamamuro
>>
>
>
>
> --
> Regards,
> Lalitha
>



-- 
---
Takeshi Yamamuro
