drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vitalii Diravka <vita...@apache.org>
Subject Re: Drill JDBC Plugin limit queries
Date Sat, 13 Oct 2018 13:16:29 GMT
To update the documentation, since that issues were solved by using these
properties in connection URL:
defaultRowFetchSize=10000  [1]
defaultAutoCommit=false    [2]
The full URL was there "url": "jdbc:postgresql://
myhost.mydomain.com/mydb?useCursorFetch=true&defaultAutoCommit=false&loggerLevel=TRACE&loggerFile=/tmp/jdbc.log&defaultRowFetchSize=10000
"

If some issues are still present, it is also reasonable to create tickets
to track them.

[1]
https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/%3CCADN0Fn9066hwvu_ZyDJ24tkAoJH5hqXoysCv83z7DdSSfjr-CQ%40mail.gmail.com%3E
[2]
https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/%3C0d36e0e6e8dc1e77bbb67bbfde5f5296e290c075.camel%40omnicell.com%3E

On Sat, Oct 13, 2018 at 3:56 PM Rahul Raj <rahul.raj@option3.io> wrote:

> Should I create tickets to track these issues or should I create a ticket
> to update the documentation?
>
> Rahul
>
> On Sat, Oct 13, 2018 at 6:16 PM Vitalii Diravka <vitalii@apache.org>
> wrote:
>
> > 1. You are right, it means it is reasonable to extend this rule for
> > applying on other Scan operators (or possibly to create the separate
> one).
> > 2. There was a question about OOM issues in Drill + PostgreSQL, please
> take
> > a look [1].
> >     Since you are trying to setup this configs, It will be good, if you
> > create a Jira ticket to add this info to Drill docs [2]
> >
> > [1]
> > https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/browser
> > [2] https://drill.apache.org/docs/rdbms-storage-plugin/
> >
> > On Sat, Oct 13, 2018 at 2:21 PM Rahul Raj <rahul.raj@option3.io> wrote:
> >
> > > Regarding the heap out of error, it could be that the jdbc driver is
> > > prefetching the entire record set to memory. I just had a look at
> > > JdbcRecordReader, looks like by setting connection#autoCommit(false)
> and
> > a
> > > sufficient fetch size we could force the driver to stream data as
> > required.
> > > This is how postgres driver works.
> > >
> >
> https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor
> > >
> > > We will have to see the behaviour of other drivers too.
> > >
> > > Let me know your thoughts here.
> > >
> > > Regards,
> > > Rahul
> > >
> > >
> > > On Sat, Oct 13, 2018 at 3:47 PM Rahul Raj <rahul.raj@option3.io>
> wrote:
> > >
> > > > Hi Vitalii,
> > > >
> > > > There are two concrete implementations of the class -
> > > > DrillPushLimitToScanRule LIMIT_ON_SCAN and
> > > > DrillPushLimitToScanRule LIMIT_ON_PROJECT.
> > > > LIMIT_ON_SCAN has a comment mentioning "For now only applies to
> > Parquet.
> > > > And pushdown only apply limit but not offset"
> > > >
> > > > Also I enabled debug mode and found LIMIT is not getting pushed to
> the
> > > > query.
> > > >     LimitPrel(fetch=[11]): rowcount = 11.0, cumulative cost = {83.0
> > rows,
> > > > 226.0 cpu, 0.0 io, 585728.0 network, 0.0 memory}, id = 261
> > > >       UnionExchangePrel: rowcount = 11.0, cumulative cost = {72.0
> rows,
> > > > 182.0 cpu, 0.0 io, 585728.0 network, 0.0 memory}, id = 259
> > > >         LimitPrel(fetch=[11]): rowcount = 11.0, cumulative cost =
> {61.0
> > > > rows, 94.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 257
> > > >           JdbcPrel(sql=[SELECT * FROM "u_g001"."executioniteration"
> > WHERE
> > > > "id" > 360500000 ]): rowcount = 50.0, cumulative cost = {50.0 rows,
> > 50.0
> > > cpu
> > > >
> > > > Regarding the second point, its the java heap getting filled with
> jdbc
> > > > results. How do we address this?
> > > >
> > > > Regards,
> > > > Rahul
> > > >
> > > > On Fri, Oct 12, 2018 at 8:11 PM Vitalii Diravka <vitalii@apache.org>
> > > > wrote:
> > > >
> > > >> Hi Rahul,
> > > >>
> > > >> Drill has *DrillPushLimitToScanRule* [1] rule, which should do this
> > > >> optimization, whether the GroupScan supports Limit Push Down.
> > > >> Also you can verify in debug mode whether this rule is fired.
> > > >> Possibly for some external DB (like MapR-DB) Drill should have the
> > > >> separate
> > > >> class for this optimization [2].
> > > >>
> > > >> [1]
> > > >>
> > > >>
> > >
> >
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java#L28
> > > >> [2]
> > > >>
> > > >>
> > >
> >
> https://github.com/apache/drill/pull/1466/files#diff-4819b70118487d81bc9c46a04b0eaaa3R37
> > > >>
> > > >> On Fri, Oct 12, 2018 at 3:19 PM Rahul Raj <rahul.raj@option3.io>
> > wrote:
> > > >>
> > > >> > Hi,
> > > >> >
> > > >> > Drill does not push the LIMIT queries to external databases and
I
> > > >> assume it
> > > >> > could be more related to Calcite. This leads to out of memory
> > > situations
> > > >> > while querying large table to view few records.  Is there
> something
> > > that
> > > >> > could be improved here? One solutions would be to push filters
> down
> > to
> > > >> the
> > > >> > DB and/or combined with some JDBC batch size limit to flush a
part
> > as
> > > >> > parquet.
> > > >> >
> > > >> > Regards,
> > > >> > Rahul
> > > >> >
> > > >> > --
> > > >> > _*** This email and any files transmitted with it are confidential
> > and
> > > >> > intended solely for the use of the individual or entity to whom
it
> > is
> > > >> > addressed. If you are not the named addressee then you should
not
> > > >> > disseminate, distribute or copy this e-mail. Please notify the
> > sender
> > > >> > immediately and delete this e-mail from your system.***_
> > > >> >
> > > >>
> > > >
> > >
> > > --
> > > _*** This email and any files transmitted with it are confidential and
> > > intended solely for the use of the individual or entity to whom it is
> > > addressed. If you are not the named addressee then you should not
> > > disseminate, distribute or copy this e-mail. Please notify the sender
> > > immediately and delete this e-mail from your system.***_
> > >
> >
>
> --
> _*** This email and any files transmitted with it are confidential and
> intended solely for the use of the individual or entity to whom it is
> addressed. If you are not the named addressee then you should not
> disseminate, distribute or copy this e-mail. Please notify the sender
> immediately and delete this e-mail from your system.***_
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message