spark-dev mailing list archives

From 吴晓菊 <chrysan...@gmail.com>
Subject Re: why BroadcastHashJoinExec is not implemented with outputOrdering?
Date Thu, 28 Jun 2018 15:01:55 GMT
Thanks for the reply.
Looking at SortMergeJoinExec, I think we can follow what SortMergeJoin
does: for some join types, if the children are ordered on the join keys,
we can report the ordered join keys as the output ordering.
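
To illustrate the idea, here is a minimal sketch in plain Python rather than
Spark code. It assumes a single partition and an inner join, and the names
broadcast_hash_join, streamed and broadcast are hypothetical. A hash join
probes the streamed (big) side row by row, so in this toy model the output
keeps whatever order the streamed side already had.

```python
from collections import defaultdict


def broadcast_hash_join(streamed, broadcast, key):
    """Toy inner hash join: build on the small side, probe with the big side."""
    # Build phase: hash the broadcast (small) side on the join key.
    table = defaultdict(list)
    for row in broadcast:
        table[row[key]].append(row)

    # Probe phase: walk the streamed (big) side in its existing order.
    out = []
    for left in streamed:
        for right in table.get(left[key], []):
            out.append({**left, **right})
    return out


# The big side is already sorted on the join key "k"; the small side is not.
big = [
    {"k": 1, "v": "a"},
    {"k": 2, "v": "b"},
    {"k": 2, "v": "c"},
    {"k": 4, "v": "d"},
]
small = [{"k": 4, "name": "w"}, {"k": 2, "name": "x"}, {"k": 1, "name": "y"}]

joined = broadcast_hash_join(big, small, "k")
keys = [row["k"] for row in joined]
print(keys)
```

The small side's order does not matter here, because it is only used to build
the lookup table. Whether the same holds for every join type in Spark would
need to be checked case by case.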


Chrysan Wu
吴晓菊
Phone:+86 17717640807


2018-06-28 22:53 GMT+08:00 Wenchen Fan <cloud0fan@gmail.com>:

> SortMergeJoin only reports ordering of the join keys, not the output
> ordering of any child.
>
> It seems reasonable to me that broadcast join should respect the output
> ordering of the children. Feel free to submit a PR to fix it, thanks!
>
> On Thu, Jun 28, 2018 at 10:07 PM 吴晓菊 <chrysanxia@gmail.com> wrote:
>
>> Why can't we use the output ordering of the big table?
>>
>>
>> Chrysan Wu
>> Phone:+86 17717640807
>>
>>
>> 2018-06-28 21:48 GMT+08:00 Marco Gaido <marcogaido91@gmail.com>:
>>
>>> The easy answer is that SortMergeJoin ensures an outputOrdering, while
>>> BroadcastHashJoin doesn't, i.e. after running a BroadcastHashJoin you
>>> don't know what the order of the output will be, since nothing
>>> enforces it.
>>>
>>> Hope this helps.
>>> Thanks.
>>> Marco
>>>
>>> 2018-06-28 15:46 GMT+02:00 吴晓菊 <chrysanxia@gmail.com>:
>>>
>>>>
>>>> We see that SortMergeJoinExec implements both outputPartitioning and
>>>> outputOrdering, while BroadcastHashJoinExec implements only
>>>> outputPartitioning. Why is it designed this way?
>>>>
>>>> Chrysan Wu
>>>> Phone:+86 17717640807
>>>>
>>>>
>>>
>>
