spark-user mailing list archives

From Lalitha MV <lalitham...@gmail.com>
Subject Re: Enforcing shuffle hash join
Date Tue, 05 Jul 2016 05:28:43 GMT
It picks sort merge join when spark.sql.autoBroadcastJoinThreshold is set
to -1, or when the size of the small table exceeds
spark.sql.autoBroadcastJoinThreshold.
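
For anyone who wants to check this on their own setup, here is a minimal
sketch, assuming Spark 2.x (SparkSession) running in local mode. The object
name, DataFrames, and column names are made up for illustration. It only
disables broadcast joins and prints the physical plan, so you can see which
join implementation the planner picked.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; not part of Spark itself.
object JoinPlanCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("join-plan-check")
      .master("local[*]") // assumption: local mode, for a quick check
      .getOrCreate()
    import spark.implicits._

    // Disable broadcast joins, as suggested earlier in the thread.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

    // Two tiny made-up tables sharing the join key "id".
    val left = Seq((1, "a"), (2, "b")).toDF("id", "l")
    val right = Seq((1, "x"), (3, "y")).toDF("id", "r")

    // Print the physical plan; the join operator it names is the
    // implementation the planner selected for this query.
    left.join(right, "id").explain()

    spark.stop()
  }
}
```

The join operator named in the printed plan tells you which strategy was
chosen under the current settings.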

On Mon, Jul 4, 2016 at 10:17 PM, Takeshi Yamamuro <linguin.m.s@gmail.com>
wrote:

> The join selection is described in
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L92
> .
> If you have join keys, you can set
> `spark.sql.autoBroadcastJoinThreshold` to -1 to disable broadcast joins.
> Then, hash joins are used in queries.
>
> // maropu
>
> On Tue, Jul 5, 2016 at 4:23 AM, Lalitha MV <lalithamv92@gmail.com> wrote:
>
>> Hi maropu,
>>
>> Thanks for your reply.
>>
>> Would it be possible to write a rule for this, to make it always pick
>> shuffle hash join over other join implementations (i.e. sort merge and
>> broadcast)?
>>
>> Is there any documentation demonstrating rule-based transformation of
>> physical plan trees?
>>
>> Thanks,
>> Lalitha
>>
>> On Sat, Jul 2, 2016 at 12:58 AM, Takeshi Yamamuro <linguin.m.s@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> No, Spark has no hint for the hash join.
>>>
>>> // maropu
>>>
>>> On Fri, Jul 1, 2016 at 4:56 PM, Lalitha MV <lalithamv92@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> In order to force broadcast hash join, we can set
>>>> the spark.sql.autoBroadcastJoinThreshold config. Is there a way to enforce
>>>> shuffle hash join in Spark SQL?
>>>>
>>>>
>>>> Thanks,
>>>> Lalitha
>>>>
>>>
>>>
>>>
>>> --
>>> ---
>>> Takeshi Yamamuro
>>>
>>
>>
>>
>> --
>> Regards,
>> Lalitha
>>
>
>
>
> --
> ---
> Takeshi Yamamuro
>



-- 
Regards,
Lalitha
