drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vitalii Diravka <vita...@apache.org>
Subject Re: Drill JDBC Plugin limit queries
Date Sat, 13 Oct 2018 12:46:02 GMT
1. You are right, it means it is reasonable to extend this rule for
applying on other Scan operators (or possibly to create the separate one).
2. There was a question about OOM issues in Drill + PostgreSQL, please take
a look [1].
    Since you are trying to setup this configs, It will be good, if you
create a Jira ticket to add this info to Drill docs [2]

[1] https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/browser
[2] https://drill.apache.org/docs/rdbms-storage-plugin/

On Sat, Oct 13, 2018 at 2:21 PM Rahul Raj <rahul.raj@option3.io> wrote:

> Regarding the heap out of error, it could be that the jdbc driver is
> prefetching the entire record set to memory. I just had a look at
> JdbcRecordReader, looks like by setting connection#autoCommit(false) and a
> sufficient fetch size we could force the driver to stream data as required.
> This is how postgres driver works.
> https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor
>
> We will have to see the behaviour of other drivers too.
>
> Let me know your thoughts here.
>
> Regards,
> Rahul
>
>
> On Sat, Oct 13, 2018 at 3:47 PM Rahul Raj <rahul.raj@option3.io> wrote:
>
> > Hi Vitalii,
> >
> > There are two concrete implementations of the class -
> > DrillPushLimitToScanRule LIMIT_ON_SCAN and
> > DrillPushLimitToScanRule LIMIT_ON_PROJECT.
> > LIMIT_ON_SCAN has a comment mentioning "For now only applies to Parquet.
> > And pushdown only apply limit but not offset"
> >
> > Also I enabled debug mode and found LIMIT is not getting pushed to the
> > query.
> >     LimitPrel(fetch=[11]): rowcount = 11.0, cumulative cost = {83.0 rows,
> > 226.0 cpu, 0.0 io, 585728.0 network, 0.0 memory}, id = 261
> >       UnionExchangePrel: rowcount = 11.0, cumulative cost = {72.0 rows,
> > 182.0 cpu, 0.0 io, 585728.0 network, 0.0 memory}, id = 259
> >         LimitPrel(fetch=[11]): rowcount = 11.0, cumulative cost = {61.0
> > rows, 94.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 257
> >           JdbcPrel(sql=[SELECT * FROM "u_g001"."executioniteration" WHERE
> > "id" > 360500000 ]): rowcount = 50.0, cumulative cost = {50.0 rows, 50.0
> cpu
> >
> > Regarding the second point, its the java heap getting filled with jdbc
> > results. How do we address this?
> >
> > Regards,
> > Rahul
> >
> > On Fri, Oct 12, 2018 at 8:11 PM Vitalii Diravka <vitalii@apache.org>
> > wrote:
> >
> >> Hi Rahul,
> >>
> >> Drill has *DrillPushLimitToScanRule* [1] rule, which should do this
> >> optimization, whether the GroupScan supports Limit Push Down.
> >> Also you can verify in debug mode whether this rule is fired.
> >> Possibly for some external DB (like MapR-DB) Drill should have the
> >> separate
> >> class for this optimization [2].
> >>
> >> [1]
> >>
> >>
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java#L28
> >> [2]
> >>
> >>
> https://github.com/apache/drill/pull/1466/files#diff-4819b70118487d81bc9c46a04b0eaaa3R37
> >>
> >> On Fri, Oct 12, 2018 at 3:19 PM Rahul Raj <rahul.raj@option3.io> wrote:
> >>
> >> > Hi,
> >> >
> >> > Drill does not push the LIMIT queries to external databases and I
> >> assume it
> >> > could be more related to Calcite. This leads to out of memory
> situations
> >> > while querying large table to view few records.  Is there something
> that
> >> > could be improved here? One solutions would be to push filters down to
> >> the
> >> > DB and/or combined with some JDBC batch size limit to flush a part as
> >> > parquet.
> >> >
> >> > Regards,
> >> > Rahul
> >> >
> >> > --
> >> > _*** This email and any files transmitted with it are confidential and
> >> > intended solely for the use of the individual or entity to whom it is
> >> > addressed. If you are not the named addressee then you should not
> >> > disseminate, distribute or copy this e-mail. Please notify the sender
> >> > immediately and delete this e-mail from your system.***_
> >> >
> >>
> >
>
> --
> _*** This email and any files transmitted with it are confidential and
> intended solely for the use of the individual or entity to whom it is
> addressed. If you are not the named addressee then you should not
> disseminate, distribute or copy this e-mail. Please notify the sender
> immediately and delete this e-mail from your system.***_
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message