drill-user mailing list archives

From Vitalii Diravka <vitalii.dira...@gmail.com>
Subject Re: question about views
Date Mon, 30 Apr 2018 15:10:02 GMT
Ted,

The rules are enabled and DRILL-3855 [1] is resolved.
Please try your queries with the latest Drill master version.

[1] https://issues.apache.org/jira/browse/DRILL-3855

Kind regards
Vitalii


On Mon, Apr 30, 2018 at 4:31 PM Nicolas Paris <niparisco@gmail.com> wrote:

> Hi
>
> This looks like an interesting design.
>
> Am I correct that such a view would hit the RDBMS for every query,
> but would hit the parquet files only when the timestamp predicate
> matches a partition?
>
> Has anyone run a recent test to confirm the design?
>
> Thanks
>
> 2018-03-20 6:49 GMT+01:00 Ted Dunning <ted.dunning@gmail.com>:
>
> > Aman,
> >
> > That is exactly the clarification that I needed. I had a hazy memory of a
> > problem in this area, but not enough to actually figure out the current
> > state.
> >
> > In case anybody cares, being able to do this is really handy. The basic
> > idea is to keep long history in files and recent history in a DB. That
> > allows you to create files with data that is advantageously sorted in order
> > to get excellent compression. You can get nearly atomic switch-over to
> > newly created files with lazy deletion of database entries by using a
> > reference to a cutoff date in a database row. The file side would only look
> > for data before the cutoff and the DB would only look for data after the
> > cut. By positioning new files (created by CTAS on an about to be obsolete
> > part of the DB) before changing the cutoff date, we get apparent atomicity.
> >
> > After the switch, and after a reasonable delay beyond that (to let all
> > pending queries finish), the DB can be trimmed.
> >
> > Without a working pushdown through unions, this is all kind of pointless.
> > If that is working now, it would be fabulous.
> >
> > As an example of how big a win this can be, consider a use case where we want
> > to keep all old states of customer preferences and context (say for a
> > mobile phone). Almost all of the hundreds of settings for an individual
> > would be unchanged even if a few do change. That means that if you could
> > arrange a day (or more) of data by user id, the columnar compression of
> > parquet would crush the data size. This only works, however, if you can
> > collect a fair number of rows for each user. Thus the idea of a hybrid
> > setup.
> >
> >
> >
> > On Mon, Mar 19, 2018 at 11:57 PM, Aman Sinha <amansinha@apache.org> wrote:
> >
> > > Due to an infinite loop occurring in Calcite planning, we had to disable
> > > the filter pushdown past the union (SetOps). See
> > > https://issues.apache.org/jira/browse/DRILL-3855.
> > > Now that we have rebased on Calcite 1.15.0, we should re-enable this and
> > > test it. If the pushdown works, then partition pruning on both sides of
> > > the union should automatically work after that.
> > >
> > > Will follow up on this.
> > >
> > > -Aman
> > >
> > > On Mon, Mar 19, 2018 at 3:02 PM, Kunal Khatua <kunalkhatua@gmail.com>
> > > wrote:
> > >
> > > > I think Ted's question is twofold, with the former being more important.
> > > > 1. Can we push filters past a union?
> > > > 2. Will Drill push filters down to the source?
> > > >
> > > > For the latter, it depends on the source.
> > > > For the former, it depends primarily on whether Calcite supports this. I
> > > > haven't tried it, so I can't say.
> > > >
> > > > On 3/19/2018 2:22:54 PM, rahul challapalli <challapallirahul@gmail.com>
> > > > wrote:
> > > > First I would suggest ignoring the view and trying out a query which has
> > > > the required filters as part of the subqueries on both sides of the union
> > > > (for both the database and partitioned parquet data). The plan for such a
> > > > query should have the answers to your question. If both the subqueries
> > > > independently prune out unnecessary data, using partitions or indexes, I
> > > > don't think adding a union between them would alter that behavior.
> > > >
> > > > -Rahul
> > > >
> > > > On Mon, Mar 19, 2018 at 1:44 PM, Ted Dunning wrote:
> > > >
> > > > > If I create a view that is a union of partitioned parquet files and a
> > > > > database that has secondary indexes, will Drill be able to properly push
> > > > > down query limits into both parts of the union?
> > > > >
> > > > > In particular, if I have lots of archival data and parquet partitioned by
> > > > > time but my query only asks for recent data that is in the database, will
> > > > > the query avoid the parquet files entirely (as you would wish)?
> > > > >
> > > > > Conversely, if the data I am asking for is entirely in the archive, will
> > > > > the query make use of the partitioning on my parquet files correctly?
> > > > >
> > > >
> > >
> >
>
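
Editorial sketch: the cutoff-row switch-over Ted describes in the thread can
be illustrated with a small, self-contained model. This is not Drill and says
nothing about whether Drill pushes filters past the union; it uses Python's
built-in sqlite3 module, with one table standing in for the parquet archive
and another for the recent-history database. The table names (archive, recent,
cutoff), the view name (history), the columns, and the integer timestamps are
all hypothetical, chosen only for illustration.

```python
import sqlite3


def build_hybrid_store(conn):
    """Create the two sides, the cutoff row, and the union view (all hypothetical names)."""
    conn.executescript("""
        CREATE TABLE archive (user_id INTEGER, ts INTEGER, setting TEXT);
        CREATE TABLE recent  (user_id INTEGER, ts INTEGER, setting TEXT);
        CREATE TABLE cutoff  (ts INTEGER);
        INSERT INTO cutoff VALUES (100);

        -- The archive side only sees rows before the cutoff,
        -- the DB side only sees rows at or after it.
        CREATE VIEW history AS
            SELECT user_id, ts, setting FROM archive
             WHERE ts < (SELECT ts FROM cutoff)
            UNION ALL
            SELECT user_id, ts, setting FROM recent
             WHERE ts >= (SELECT ts FROM cutoff);
    """)


def snapshot(conn):
    """Return everything visible through the view, in a stable order."""
    return conn.execute(
        "SELECT user_id, ts, setting FROM history ORDER BY ts, user_id"
    ).fetchall()


def roll_over(conn, new_cutoff):
    """Move rows older than new_cutoff to the archive; return a view snapshot after each step."""
    old_cutoff = conn.execute("SELECT ts FROM cutoff").fetchone()[0]

    # Step 1: position the new archive data (stands in for CTAS), sorted by user.
    # These rows are not yet visible on the archive side because of the cutoff.
    conn.execute(
        "INSERT INTO archive "
        "SELECT user_id, ts, setting FROM recent "
        "WHERE ts >= ? AND ts < ? ORDER BY user_id, ts",
        (old_cutoff, new_cutoff),
    )
    after_copy = snapshot(conn)

    # Step 2: a single-row update switches both sides at once.
    conn.execute("UPDATE cutoff SET ts = ?", (new_cutoff,))
    after_switch = snapshot(conn)

    # Step 3: lazy trim of the DB side, which the view already ignores.
    conn.execute("DELETE FROM recent WHERE ts < ?", (new_cutoff,))
    after_trim = snapshot(conn)

    return after_copy, after_switch, after_trim


def demo():
    conn = sqlite3.connect(":memory:")
    build_hybrid_store(conn)
    conn.executemany("INSERT INTO archive VALUES (?, ?, ?)",
                     [(1, 10, "a"), (2, 20, "b")])
    conn.executemany("INSERT INTO recent VALUES (?, ?, ?)",
                     [(1, 150, "c"), (2, 160, "d"), (1, 250, "e")])
    before = snapshot(conn)
    steps = roll_over(conn, 200)
    return conn, before, steps


if __name__ == "__main__":
    _, before, steps = demo()
    print(all(step == before for step in steps))
```

The view returns the same rows before the copy, after the copy, after the
cutoff update, and after the trim, which is the "apparent atomicity" described
above. Whether a real engine avoids scanning the side that cannot match a
timestamp predicate is exactly the pushdown question discussed in this thread
and is not modeled here.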
