drill-dev mailing list archives

From Maryann Xue <maryann....@gmail.com>
Subject Re: Improvements to storage plugin planning integration support
Date Thu, 22 Oct 2015 20:04:04 GMT
Hi Aman,

Thought these entries might be related to our discussion about non-covering
indices just now:

https://issues.apache.org/jira/browse/CALCITE-772
https://issues.apache.org/jira/browse/CALCITE-773


Thanks,
Maryann

On Thu, Oct 22, 2015 at 2:48 PM, Aman Sinha <asinha@maprtech.com> wrote:

> Thanks Maryann and Jinfeng for your comments.   I understand the Phoenix
> approach better now that Maryann clarified that the index is actually a
> projection of some or all columns (non primary key columns) of the table.
> In the relational world, this is similar to what systems such as Vertica
> have done.
>
> Aman
>
> On Thu, Oct 22, 2015 at 11:32 AM, Maryann Xue <maryann.xue@gmail.com>
> wrote:
>
>> Thank you JinFeng for the education on Drill planning! That probably
>> justifies putting secondary index into physical planning.
>> What I was trying to say was that a secondary index is not "a faster
>> physical access mechanism"; it is just a Phoenix table. And it makes a big
>> difference in planning related to Sort, Join and Aggregate, as you said. In
>> the pure Calcite world, this is more of a Logical thing.
>>
>>
>> Thanks,
>> Maryann
>>
>> On Thu, Oct 22, 2015 at 2:26 PM, Jinfeng Ni <jinfengni99@gmail.com>
>> wrote:
>>
>>> I do not know how Phoenix's planning works. For Drill, my
>>> understanding is that during logical planning, the "collation" trait is
>>> only used in SortRemoveRule, to remove the redundant sort operator. (Those
>>> "sort" operators are the ones created by Calcite for user-explicit
>>> "ORDER BY" / "LIMIT", not the "enforcer" created in physical
>>> planning).
>>>
>>> The "collation" trait would not have an impact in logical planning for
>>> join / aggregation.   The decision between sort-based vs hash-based
>>> join / aggregation is made in physical planning. At that stage, the
>>> "collation" would matter a lot, as it determines whether Drill has to
>>> add an "enforcer" to get a certain trait, in order to get a plan with
>>> sort-based join / aggregation.
>>>
>>> The "collation" trait acts like a physical property, so it is more natural
>>> to expose "collation" in physical planning instead of logical
>>> planning, which focuses more on properties inherent in a relational
>>> expression. Aman's view that secondary index is part of physical
>>> planning makes sense to me.
>>>
>>> On Thu, Oct 22, 2015 at 10:54 AM, Maryann Xue <maryann.xue@gmail.com>
>>> wrote:
>>> > Hi Aman Sinha,
>>> >
>>> > Yes, Phoenix uses materialization in Calcite to model its secondary index
>>> > querying. But it's not right to say "In that sense, it would seem to fit
>>> > into physical planning phase rather than logical, since indexes are a
>>> > faster physical access mechanism for a scan.  The logical properties of a
>>> > table don't change due to presence of an index."
>>> >
>>> > A secondary index in Phoenix is a projection of part or all of the columns
>>> > of the original table, and is usually indexed (and sorted) on a different
>>> > key other than the primary key of the original table. The key in Phoenix
>>> > table (HBase table) is crucial in two ways:
>>> > 1. Filtering: the use of skip-scan or range-scan vs. full scan.
>>> > 2. Ordering
>>> >
>>> > The second aspect is represented in Calcite by "collation" trait, which can
>>> > make a radical difference in logical planning. Replacing the original table
>>> > with one of its indices might end up changing the whole plan completely.
>>> >
>>> > I am not sure yet which stage the Phoenix materialization will eventually
>>> > go, but one certain thing is that it should be available for all the
>>> > general optimizations to take effect.
>>> >
>>> >
>>> > Thanks,
>>> > Maryann
>>> >
>>> > On Wed, Oct 14, 2015 at 12:55 PM, Aman Sinha <asinha@maprtech.com> wrote:
>>> >
>>> >> Catching up on this thread.  Jacques, if I understand correctly, you are
>>> >> proposing that instead of the single point of initialization of rules when
>>> >> we instantiate FrameworkConfig (in DrillSqlWorker), we would have more
>>> >> entry points to plug into different phases of planning and storage plugins
>>> >> would register different sets of rules in these separate phases.   It seems
>>> >> fine to me (assuming that there are no side effects where we somehow end up
>>> >> increasing the search space for the existing plans).
>>> >>
>>> >> When talking about the Phoenix integration or the JDBC storage plugin, I
>>> >> am curious about which phase(s) would they register the rules for?  I
>>> >> believe Phoenix's materialized view usage in Calcite is actually for
>>> >> secondary indexing, not for materialized views per se.  In that sense, it
>>> >> would seem to fit into physical planning phase rather than logical, since
>>> >> indexes are a faster physical access mechanism for a scan.  The logical
>>> >> properties of a table don't change due to presence of an index.
>>> >>
>>> >> On the other hand, I think the JDBC plugin might register rules for
>>> >> logical phase since it would have filter and projection pushdowns that do
>>> >> change logical properties.
>>> >>
>>> >> Aman
>>> >>
>>> >>
>>> >> On Mon, Oct 12, 2015 at 5:36 PM, Hanifi Gunes <hgunes@maprtech.com> wrote:
>>> >>
>>> >>> I would +1 (1-3) for sure. I do not have much understanding of programs
>>> >>> however additional flexibility for storage plugin devs sounds cool in
>>> >>> general when used responsibly =) so +0 for (4)
>>> >>>
>>> >>>
>>> >>> -H+
>>> >>>
>>> >>> On Mon, Oct 12, 2015 at 4:12 PM, Jacques Nadeau <jacques@dremio.com>
>>> >>> wrote:
>>> >>>
>>> >>> > The dead air must mean that everyone is onboard with my recommendation
>>> >>> >
>>> >>> > PlannerIntegration StoragePlugin.getPlannerIntegrations()
>>> >>> >
>>> >>> > interface PlannerIntegration{
>>> >>> >   void initialize(Planner, Phase)
>>> >>> > }
>>> >>> >
>>> >>> > Right :D
>>> >>> >
>>> >>> > --
>>> >>> > Jacques Nadeau
>>> >>> > CTO and Co-Founder, Dremio
>>> >>> >
>>> >>> > On Fri, Oct 9, 2015 at 7:03 AM, Jacques Nadeau <jacques@dremio.com> wrote:
>>> >>> >
>>> >>> > > A number of us were meeting last week to work through integrating the
>>> >>> > > Phoenix storage plugin. This plugin is interesting because it also uses
>>> >>> > > Calcite for planning. In some ways, this should make integration easy.
>>> >>> > > However, it also allowed us to see certain constraints with how we expose
>>> >>> > > planner integration between storage plugins and Drill internals.
>>> >>> > > Currently, Drill asks the plugin to provide a set of optimizer rules which
>>> >>> > > it incorporates into one of the many stages of planning. This is too
>>> >>> > > constraining in three ways:
>>> >>> > >
>>> >>> > > 1. It doesn't allow a plugin to decide which phase of planning to
>>> >>> > > integrate with. (This was definitely a problem in the Phoenix case. Our
>>> >>> > > hack solution for now is to incorporate storage plugin rules in phases
>>> >>> > > instead of just one [1].)
>>> >>> > > 2. It doesn't allow arbitrary transformations. Calcite provides a program
>>> >>> > > concept. It may be that a plugin needs to do some of its own work using the
>>> >>> > > Hep planner. Currently there isn't an elegant way to do this in the context
>>> >>> > > of the rule.
>>> >>> > > 3. There is no easy way to incorporate additional planner initialization
>>> >>> > > options. This was almost a problem in the case of the JDBC plugin. It
>>> >>> > > turned out that a hidden integration using register() here [2] allowed us
>>> >>> > > to continue throughout the planning phases. However, we have to register
>>> >>> > > all the rules for all the phases of planning, which is a bit unclean. We're
>>> >>> > > hitting the same problem in the case of Phoenix, where we need to register
>>> >>> > > materialized views as part of planner initialization but the hack from the
>>> >>> > > JDBC case won't really work.
>>> >>> > >
>>> >>> > > I suggest we update the interface to allow better support for these types
>>> >>> > > of integrations.
>>> >>> > >
>>> >>> > > These seem to be the main requirements:
>>> >>> > > 1. Expose concrete planning phases to storage plugins
>>> >>> > > 2. Allow a storage plugin to provide additional planner initialization
>>> >>> > > behavior
>>> >>> > > 3. Allow a storage plugin to provide rules to include in a particular
>>> >>> > > planning phase (merged with other rules during that phase).
>>> >>> > > 4. (possibly) allow a storage plugin to provide transformation programs
>>> >>> > > that are to be executed in between the concrete planning phases.
>>> >>> > >
>>> >>> > > Item (4) above is the most questionable to me as I wonder whether or not
>>> >>> > > this could simply be solved by creating a transformation rule (or program
>>> >>> > > rule in Calcite's terminology) that creates an alternative tree and thus be
>>> >>> > > solved by (3).
>>> >>> > >
>>> >>> > > A simple solution might be (if we ignore #4):
>>> >>> > >
>>> >>> > > PlannerIntegration StoragePlugin.getPlannerIntegrations()
>>> >>> > >
>>> >>> > > interface PlannerIntegration{
>>> >>> > >   void initialize(Planner, Phase)
>>> >>> > > }
>>> >>> > >
>>> >>> > > This way, a storage plugin could register rules (or materialized views) at
>>> >>> > > setup time.
>>> >>> > >
>>> >>> > > What do others think?
>>> >>> > >
>>> >>> > > [1]
>>> >>> > > https://github.com/apache/drill/blob/master/contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcStoragePlugin.java#L145
>>> >>> > > [2]
>>> >>> > > https://github.com/jacques-n/drill/commit/d463f9098ef63b9a2844206950334cb16fc00327#diff-e67ba82ec2fbb8bc15eed30ec6a5379cR119
>>> >>> > >
>>> >>> > > --
>>> >>> > > Jacques Nadeau
>>> >>> > > CTO and Co-Founder, Dremio
>>> >>> > >
>>> >>> >
>>> >>>
>>> >>
>>> >>
>>>
>>
>>
>
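
The PlannerIntegration proposal quoted at the bottom of the thread could be
sketched in Java roughly as follows. This is only an illustration under
stated assumptions, not Drill's or Calcite's actual API: the Phase values,
the stand-in Planner class, the rule and table names, and the PlannerSetup
driver are all hypothetical. Only the PlannerIntegration interface and its
initialize(Planner, Phase) shape come from the thread.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical planning phases; the thread does not enumerate them.
enum Phase { LOGICAL, PHYSICAL }

// Hypothetical stand-in for the planner handed to a plugin. Rules and
// materializations are plain strings to keep the sketch self-contained.
class Planner {
    private final Map<Phase, List<String>> rulesByPhase = new EnumMap<>(Phase.class);
    private final List<String> materializations = new ArrayList<>();

    void addRule(Phase phase, String ruleName) {
        rulesByPhase.computeIfAbsent(phase, p -> new ArrayList<>()).add(ruleName);
    }

    void addMaterialization(String name) {
        materializations.add(name);
    }

    List<String> rules(Phase phase) {
        return rulesByPhase.getOrDefault(phase, new ArrayList<>());
    }

    List<String> materializations() {
        return materializations;
    }
}

// Interface shape from the thread: the plugin is called once per phase.
interface PlannerIntegration {
    void initialize(Planner planner, Phase phase);
}

// Hypothetical plugin that picks, per phase, what it contributes.
class ExamplePluginIntegration implements PlannerIntegration {
    @Override
    public void initialize(Planner planner, Phase phase) {
        switch (phase) {
            case LOGICAL:
                planner.addRule(phase, "ExampleFilterPushdownRule");
                // Which phase should own materializations is left open in
                // the thread; LOGICAL is an arbitrary choice here.
                planner.addMaterialization("example_index_table");
                break;
            case PHYSICAL:
                planner.addRule(phase, "ExampleScanPrelRule");
                break;
        }
    }
}

// Hypothetical driver: offers every phase to every integration at setup time.
class PlannerSetup {
    static Planner setUp(List<PlannerIntegration> integrations) {
        Planner planner = new Planner();
        for (Phase phase : Phase.values()) {
            for (PlannerIntegration integration : integrations) {
                integration.initialize(planner, phase);
            }
        }
        return planner;
    }
}
```

With this shape, rules from several plugins for the same phase end up merged
in one list, which matches requirement (3) in the proposal, and planner
initialization extras such as materializations (requirement 2) go through
the same entry point.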

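The point made in the thread about "collation" and the sort "enforcer" can
be illustrated with a small Java sketch. It assumes a heavily simplified
model that is not Drill or Calcite code: a collation is just an ordered list
of column names (sort direction and null ordering are ignored), and the
class, method, and table names are hypothetical.

```java
import java.util.List;

class CollationCheck {
    // Rows sorted on `provided` are also sorted on `required` when
    // `required` is a leading prefix of `provided`.
    static boolean satisfies(List<String> provided, List<String> required) {
        return required.size() <= provided.size()
            && provided.subList(0, required.size()).equals(required);
    }

    // Returns a textual plan for one input of a sort-based join or
    // aggregate: the bare scan if its collation already satisfies the
    // requirement, otherwise the scan wrapped in an enforcer sort.
    static String planInput(String scan, List<String> provided, List<String> required) {
        return satisfies(provided, required)
            ? scan
            : "Sort(" + String.join(", ", required) + ") <- " + scan;
    }
}
```

Under these assumptions, a scan of an index table sorted on (city, id) can
feed an operator that requires ordering on (city) directly, while a scan of
a base table sorted on (id) needs the extra Sort. This is why swapping a
table for one of its indexes can change the cost of sort-based plans.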