drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vitalii Diravka <vita...@apache.org>
Subject Re: Drill JDBC Plugin limit queries
Date Thu, 18 Oct 2018 19:34:48 GMT
Rahul,

Possibly *JdbcGroupScan* can be improved, for instance by overriding
*supportsLimitPushdown()* and *applyLimit()* methods,
*double rows *field can be updated by the limit value.

I've performed the following query: select * from mysql.`testdb`.`table`
limit 2;
but the following one is passed to MySQL: SELECT * FROM `testdb`.`table`
https://github.com/apache/drill/blob/master/contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcRecordReader.java#L187
So it is definitely should be improved.

*Note:* Changed mailing list to devs.

On Sun, Oct 14, 2018 at 6:30 AM Rahul Raj <rahul.raj@option3.io> wrote:

> Vitalii,
>
> Created documentation ticket DRILL-6794
>
> How do we proceed on extending the scan operators to support JDBC plugins?
>
> Regards,
> Rahul
>
> On Sat, Oct 13, 2018 at 6:47 PM Vitalii Diravka <vitalii@apache.org>
> wrote:
>
> > To update the documentation, since that issues were solved by using these
> > properties in connection URL:
> > defaultRowFetchSize=10000  [1]
> > defaultAutoCommit=false    [2]
> > The full URL was there "url": "jdbc:postgresql://
> >
> >
> myhost.mydomain.com/mydb?useCursorFetch=true&defaultAutoCommit=false&loggerLevel=TRACE&loggerFile=/tmp/jdbc.log&defaultRowFetchSize=10000
> > "
> >
> > If some issues are still present, it is also reasonable to create tickets
> > to track them.
> >
> > [1]
> >
> >
> https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/%3CCADN0Fn9066hwvu_ZyDJ24tkAoJH5hqXoysCv83z7DdSSfjr-CQ%40mail.gmail.com%3E
> > [2]
> >
> >
> https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/%3C0d36e0e6e8dc1e77bbb67bbfde5f5296e290c075.camel%40omnicell.com%3E
> >
> > On Sat, Oct 13, 2018 at 3:56 PM Rahul Raj <rahul.raj@option3.io> wrote:
> >
> > > Should I create tickets to track these issues or should I create a
> ticket
> > > to update the documentation?
> > >
> > > Rahul
> > >
> > > On Sat, Oct 13, 2018 at 6:16 PM Vitalii Diravka <vitalii@apache.org>
> > > wrote:
> > >
> > > > 1. You are right, it means it is reasonable to extend this rule for
> > > > applying on other Scan operators (or possibly to create the separate
> > > one).
> > > > 2. There was a question about OOM issues in Drill + PostgreSQL,
> please
> > > take
> > > > a look [1].
> > > >     Since you are trying to setup this configs, It will be good, if
> you
> > > > create a Jira ticket to add this info to Drill docs [2]
> > > >
> > > > [1]
> > > >
> > https://mail-archives.apache.org/mod_mbox/drill-user/201808.mbox/browser
> > > > [2] https://drill.apache.org/docs/rdbms-storage-plugin/
> > > >
> > > > On Sat, Oct 13, 2018 at 2:21 PM Rahul Raj <rahul.raj@option3.io>
> > wrote:
> > > >
> > > > > Regarding the heap out of error, it could be that the jdbc driver
> is
> > > > > prefetching the entire record set to memory. I just had a look at
> > > > > JdbcRecordReader, looks like by setting
> connection#autoCommit(false)
> > > and
> > > > a
> > > > > sufficient fetch size we could force the driver to stream data as
> > > > required.
> > > > > This is how postgres driver works.
> > > > >
> > > >
> > >
> >
> https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor
> > > > >
> > > > > We will have to see the behaviour of other drivers too.
> > > > >
> > > > > Let me know your thoughts here.
> > > > >
> > > > > Regards,
> > > > > Rahul
> > > > >
> > > > >
> > > > > On Sat, Oct 13, 2018 at 3:47 PM Rahul Raj <rahul.raj@option3.io>
> > > wrote:
> > > > >
> > > > > > Hi Vitalii,
> > > > > >
> > > > > > There are two concrete implementations of the class -
> > > > > > DrillPushLimitToScanRule LIMIT_ON_SCAN and
> > > > > > DrillPushLimitToScanRule LIMIT_ON_PROJECT.
> > > > > > LIMIT_ON_SCAN has a comment mentioning "For now only applies
to
> > > > Parquet.
> > > > > > And pushdown only apply limit but not offset"
> > > > > >
> > > > > > Also I enabled debug mode and found LIMIT is not getting pushed
> to
> > > the
> > > > > > query.
> > > > > >     LimitPrel(fetch=[11]): rowcount = 11.0, cumulative cost
=
> {83.0
> > > > rows,
> > > > > > 226.0 cpu, 0.0 io, 585728.0 network, 0.0 memory}, id = 261
> > > > > >       UnionExchangePrel: rowcount = 11.0, cumulative cost =
{72.0
> > > rows,
> > > > > > 182.0 cpu, 0.0 io, 585728.0 network, 0.0 memory}, id = 259
> > > > > >         LimitPrel(fetch=[11]): rowcount = 11.0, cumulative cost
=
> > > {61.0
> > > > > > rows, 94.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 257
> > > > > >           JdbcPrel(sql=[SELECT * FROM
> "u_g001"."executioniteration"
> > > > WHERE
> > > > > > "id" > 360500000 ]): rowcount = 50.0, cumulative cost = {50.0
> rows,
> > > > 50.0
> > > > > cpu
> > > > > >
> > > > > > Regarding the second point, its the java heap getting filled
with
> > > jdbc
> > > > > > results. How do we address this?
> > > > > >
> > > > > > Regards,
> > > > > > Rahul
> > > > > >
> > > > > > On Fri, Oct 12, 2018 at 8:11 PM Vitalii Diravka <
> > vitalii@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > >> Hi Rahul,
> > > > > >>
> > > > > >> Drill has *DrillPushLimitToScanRule* [1] rule, which should
do
> > this
> > > > > >> optimization, whether the GroupScan supports Limit Push
Down.
> > > > > >> Also you can verify in debug mode whether this rule is fired.
> > > > > >> Possibly for some external DB (like MapR-DB) Drill should
have
> the
> > > > > >> separate
> > > > > >> class for this optimization [2].
> > > > > >>
> > > > > >> [1]
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushLimitToScanRule.java#L28
> > > > > >> [2]
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://github.com/apache/drill/pull/1466/files#diff-4819b70118487d81bc9c46a04b0eaaa3R37
> > > > > >>
> > > > > >> On Fri, Oct 12, 2018 at 3:19 PM Rahul Raj <rahul.raj@option3.io
> >
> > > > wrote:
> > > > > >>
> > > > > >> > Hi,
> > > > > >> >
> > > > > >> > Drill does not push the LIMIT queries to external databases
> and
> > I
> > > > > >> assume it
> > > > > >> > could be more related to Calcite. This leads to out
of memory
> > > > > situations
> > > > > >> > while querying large table to view few records.  Is
there
> > > something
> > > > > that
> > > > > >> > could be improved here? One solutions would be to push
filters
> > > down
> > > > to
> > > > > >> the
> > > > > >> > DB and/or combined with some JDBC batch size limit
to flush a
> > part
> > > > as
> > > > > >> > parquet.
> > > > > >> >
> > > > > >> > Regards,
> > > > > >> > Rahul
> > > > > >> >
> > > > > >> > --
> > > > > >> > _*** This email and any files transmitted with it are
> > confidential
> > > > and
> > > > > >> > intended solely for the use of the individual or entity
to
> whom
> > it
> > > > is
> > > > > >> > addressed. If you are not the named addressee then
you should
> > not
> > > > > >> > disseminate, distribute or copy this e-mail. Please
notify the
> > > > sender
> > > > > >> > immediately and delete this e-mail from your system.***_
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > > > --
> > > > > _*** This email and any files transmitted with it are confidential
> > and
> > > > > intended solely for the use of the individual or entity to whom it
> is
> > > > > addressed. If you are not the named addressee then you should not
> > > > > disseminate, distribute or copy this e-mail. Please notify the
> sender
> > > > > immediately and delete this e-mail from your system.***_
> > > > >
> > > >
> > >
> > > --
> > > _*** This email and any files transmitted with it are confidential and
> > > intended solely for the use of the individual or entity to whom it is
> > > addressed. If you are not the named addressee then you should not
> > > disseminate, distribute or copy this e-mail. Please notify the sender
> > > immediately and delete this e-mail from your system.***_
> > >
> >
>
> --
> _*** This email and any files transmitted with it are confidential and
> intended solely for the use of the individual or entity to whom it is
> addressed. If you are not the named addressee then you should not
> disseminate, distribute or copy this e-mail. Please notify the sender
> immediately and delete this e-mail from your system.***_
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message