drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "clhubert@gmail.com" <clhub...@gmail.com>
Subject Re: Cartesian Product in Apache Drill
Date Tue, 27 Dec 2016 18:26:24 GMT
Zelaine,

Thanks for the clarity.  I can understand the performance concerns behind
the nested loop.

Can a nested loop be selected for "other" query plans when "
planner.enable_nljoin_for_*scalar_only" *is set to true?

I am trying to determine if I should turn it system wide, or ONLY turn it
on when I need a Cartesian product result.

Any ideas?  Suggestions?

Outside of the Performance Concern..   There are two valid nested loop
usecase.
1) Scalar subqueries
2) ANSI SQL Cross Joins

Do you know if there are plans to incorporate this as a default setting?

*planner.enable_nljoin_for_scalar_only *should probably be
planner.enable_nljoin_for_*scalar *

Similarly, there should be *planner.enable_nljoin_for_crossjoin* and the
default setting should be "true"

Do you agree?   Am I off base someplace?

If there are no plans, and you agree with my assessment.   I will probably
file this as a feature request



Regards,
CLN











On Mon, Dec 26, 2016 at 7:58 PM, Zelaine Fong <zfong@maprtech.com> wrote:

> I'm not sure how widely nested loop joins outside of scalar subqueries
> have been exercised by Drill users, since that setting is not the default.
> Note that nested loop joins can only be processed using broadcast joins
> [1].  So you will incur a lot of network transfer overhead unless the
> smaller of the tables you're joining is kept to a minimum.
>
> [1] https://drill.apache.org/docs/join-planning-guidelines/
>
> -- Zelaine
>
> On Mon, Dec 26, 2016 at 7:05 AM, clhubert@gmail.com <clhubert@gmail.com>
> wrote:
>
>> Zelaine,
>>
>> I appreciate it...   That worked.
>>
>> I am thinking of turning on this feature system wide.
>>
>> Is there any foreseeable issue with using nested joins outside of scalar
>> subqueries?  Performance or otherwise?
>>
>> Regards,
>> CLN
>>
>>
>> On Sun, Dec 25, 2016 at 7:22 PM, Zelaine Fong <zfong@maprtech.com> wrote:
>>
>>> Alternatively, you can set the following configuration to false:
>>>
>>> alter session set `planner.enable_nljoin_for_scalar_only` = false;
>>>
>>> Cartesian joins need to be processed as a nested loop join, and by
>>> default, Drill only considers nested joins in the case where at least one
>>> side of the join is a scalar subquery.
>>>
>>> -- Zelaine
>>>
>>> On Sun, Dec 25, 2016 at 2:46 PM, Ted Dunning <ted.dunning@gmail.com>
>>> wrote:
>>>
>>>> You can fake the limitation by adding a constant column to both tables,
>>>> I
>>>> think, and then joining on the constant.
>>>>
>>>>
>>>>
>>>> On Sun, Dec 25, 2016 at 2:04 PM, clhubert@gmail.com <clhubert@gmail.com
>>>> >
>>>> wrote:
>>>>
>>>> >
>>>> > I am trying to do a cross join to get a cartesian products.
>>>> >
>>>> > Per the error message (attached) and the JIRA ticket I see it isn't
>>>> > supported.
>>>> > https://issues.apache.org/jira/browse/DRILL-3807
>>>> >
>>>> > I wrote the query against using dfs on csv file types.
>>>> >
>>>> > Can I execute a cross join in Apache Drill just by moving my data to
a
>>>> > different file type or Storage Plugin. Such as Parquet,JSON, or RDBMS
>>>> > Plugin.
>>>> >
>>>> > Regards,
>>>> > CLN
>>>> >
>>>> >
>>>>
>>>
>>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message