drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rahul Raj <rahul....@option3.io>
Subject Re: Drill JDBC Plugin limit queries
Date Sun, 14 Oct 2018 03:29:47 GMT
Vitalii,

Created documentation ticket DRILL-6794

How do we proceed on extending the scan operators to support JDBC plugins?

Regards,
Rahul

On Sat, Oct 13, 2018 at 6:47 PM Vitalii Diravka <vitalii@apache.org> wrote:

> To update the documentation, since that issues were solved by using these
> properties in connection URL:
> defaultRowFetchSize=10000  [1]
> defaultAutoCommit=false    [2]
> The full URL was there "url": "jdbc:postgresql://
>
> myhost.mydomain.com/mydb?useCursorFetch=true&defaultAutoCommit=false&loggerLevel=TRACE&loggerFile=/tmp/jdbc.log&defaultRowFetchSize=10000
> "
>
> If some issues are still present, it is also reasonable to create tickets
> to track them.
>
> [1]
>
> https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/%3CCADN0Fn9066hwvu_ZyDJ24tkAoJH5hqXoysCv83z7DdSSfjr-CQ%40mail.gmail.com%3E
> [2]
>
> https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/%3C0d36e0e6e8dc1e77bbb67bbfde5f5296e290c075.camel%40omnicell.com%3E
>
> On Sat, Oct 13, 2018 at 3:56 PM Rahul Raj <rahul.raj@option3.io> wrote:
>
> > Should I create tickets to track these issues or should I create a ticket
> > to update the documentation?
> >
> > Rahul
> >
> > On Sat, Oct 13, 2018 at 6:16 PM Vitalii Diravka <vitalii@apache.org>
> > wrote:
> >
> > > 1. You are right, it means it is reasonable to extend this rule for
> > > applying on other Scan operators (or possibly to create the separate
> > one).
> > > 2. There was a question about OOM issues in Drill + PostgreSQL, please
> > take
> > > a look [1].
> > >     Since you are trying to setup this configs, It will be good, if you
> > > create a Jira ticket to add this info to Drill docs [2]
> > >
> > > [1]
> > >
> https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/browser
> > > [2] https://drill.apache.org/docs/rdbms-storage-plugin/
> > >
> > > On Sat, Oct 13, 2018 at 2:21 PM Rahul Raj <rahul.raj@option3.io>
> wrote:
> > >
> > > > Regarding the heap out of error, it could be that the jdbc driver is
> > > > prefetching the entire record set to memory. I just had a look at
> > > > JdbcRecordReader, looks like by setting connection#autoCommit(false)
> > and
> > > a
> > > > sufficient fetch size we could force the driver to stream data as
> > > required.
> > > > This is how postgres driver works.
> > > >
> > >
> >
> https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor
> > > >
> > > > We will have to see the behaviour of other drivers too.
> > > >
> > > > Let me know your thoughts here.
> > > >
> > > > Regards,
> > > > Rahul
> > > >
> > > >
> > > > On Sat, Oct 13, 2018 at 3:47 PM Rahul Raj <rahul.raj@option3.io>
> > wrote:
> > > >
> > > > > Hi Vitalii,
> > > > >
> > > > > There are two concrete implementations of the class -
> > > > > DrillPushLimitToScanRule LIMIT_ON_SCAN and
> > > > > DrillPushLimitToScanRule LIMIT_ON_PROJECT.
> > > > > LIMIT_ON_SCAN has a comment mentioning "For now only applies to
> > > Parquet.
> > > > > And pushdown only apply limit but not offset"
> > > > >
> > > > > Also I enabled debug mode and found LIMIT is not getting pushed to
> > the
> > > > > query.
> > > > >     LimitPrel(fetch=[11]): rowcount = 11.0, cumulative cost = {83.0
> > > rows,
> > > > > 226.0 cpu, 0.0 io, 585728.0 network, 0.0 memory}, id = 261
> > > > >       UnionExchangePrel: rowcount = 11.0, cumulative cost = {72.0
> > rows,
> > > > > 182.0 cpu, 0.0 io, 585728.0 network, 0.0 memory}, id = 259
> > > > >         LimitPrel(fetch=[11]): rowcount = 11.0, cumulative cost =
> > {61.0
> > > > > rows, 94.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 257
> > > > >           JdbcPrel(sql=[SELECT * FROM "u_g001"."executioniteration"
> > > WHERE
> > > > > "id" > 360500000 ]): rowcount = 50.0, cumulative cost = {50.0
rows,
> > > 50.0
> > > > cpu
> > > > >
> > > > > Regarding the second point, its the java heap getting filled with
> > jdbc
> > > > > results. How do we address this?
> > > > >
> > > > > Regards,
> > > > > Rahul
> > > > >
> > > > > On Fri, Oct 12, 2018 at 8:11 PM Vitalii Diravka <
> vitalii@apache.org>
> > > > > wrote:
> > > > >
> > > > >> Hi Rahul,
> > > > >>
> > > > >> Drill has *DrillPushLimitToScanRule* [1] rule, which should do
> this
> > > > >> optimization, whether the GroupScan supports Limit Push Down.
> > > > >> Also you can verify in debug mode whether this rule is fired.
> > > > >> Possibly for some external DB (like MapR-DB) Drill should have
the
> > > > >> separate
> > > > >> class for this optimization [2].
> > > > >>
> > > > >> [1]
> > > > >>
> > > > >>
> > > >
> > >
> >
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java#L28
> > > > >> [2]
> > > > >>
> > > > >>
> > > >
> > >
> >
> https://github.com/apache/drill/pull/1466/files#diff-4819b70118487d81bc9c46a04b0eaaa3R37
> > > > >>
> > > > >> On Fri, Oct 12, 2018 at 3:19 PM Rahul Raj <rahul.raj@option3.io>
> > > wrote:
> > > > >>
> > > > >> > Hi,
> > > > >> >
> > > > >> > Drill does not push the LIMIT queries to external databases
and
> I
> > > > >> assume it
> > > > >> > could be more related to Calcite. This leads to out of memory
> > > > situations
> > > > >> > while querying large table to view few records.  Is there
> > something
> > > > that
> > > > >> > could be improved here? One solutions would be to push filters
> > down
> > > to
> > > > >> the
> > > > >> > DB and/or combined with some JDBC batch size limit to flush
a
> part
> > > as
> > > > >> > parquet.
> > > > >> >
> > > > >> > Regards,
> > > > >> > Rahul
> > > > >> >
> > > > >> > --
> > > > >> > _*** This email and any files transmitted with it are
> confidential
> > > and
> > > > >> > intended solely for the use of the individual or entity
to whom
> it
> > > is
> > > > >> > addressed. If you are not the named addressee then you should
> not
> > > > >> > disseminate, distribute or copy this e-mail. Please notify
the
> > > sender
> > > > >> > immediately and delete this e-mail from your system.***_
> > > > >> >
> > > > >>
> > > > >
> > > >
> > > > --
> > > > _*** This email and any files transmitted with it are confidential
> and
> > > > intended solely for the use of the individual or entity to whom it is
> > > > addressed. If you are not the named addressee then you should not
> > > > disseminate, distribute or copy this e-mail. Please notify the sender
> > > > immediately and delete this e-mail from your system.***_
> > > >
> > >
> >
> > --
> > _*** This email and any files transmitted with it are confidential and
> > intended solely for the use of the individual or entity to whom it is
> > addressed. If you are not the named addressee then you should not
> > disseminate, distribute or copy this e-mail. Please notify the sender
> > immediately and delete this e-mail from your system.***_
> >
>

-- 
_*** This email and any files transmitted with it are confidential and 
intended solely for the use of the individual or entity to whom it is 
addressed. If you are not the named addressee then you should not 
disseminate, distribute or copy this e-mail. Please notify the sender 
immediately and delete this e-mail from your system.***_

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message