calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Yuzhao Chen <yuzhao....@gmail.com>
Subject Re: [DISCUSSION] Extension of Metadata Query
Date Thu, 06 Jun 2019 01:23:35 GMT
In order to add some top-level query interfaces like columnInterval to topK, one way to extend
the RelMetadataQuery is “sub-class” it,  but a subclass-instance of RelMetadataQuery
can not be set into the RelOptCluster(with a always default singleton RMQ instance), which
is a key factor to limit the extension of RelMetadataQuery.

Best,
Danny Chan
在 2019年6月5日 +0800 PM11:49,Julian Hyde <jhyde.apache@gmail.com>,写道:
> By “extend” do you mean “sub-class”?
>
> > On Jun 4, 2019, at 10:19 PM, Yuzhao Chen <yuzhao.cyz@gmail.com> wrote:
> >
> > Thanks Julian for your detail reply,
> >
> >
> > 1. I checked the Janino gened-code and the RelMetadataQuery/RelMetadataProvidor
and almost can make sure MetadataQuery only use the RelMetadataProvidor#handlers method for
multiple dispatch, the RelMetadataProvidor#apply is only used for MetadataFactory.
> > 2. I agree that we should provide new RelMetadataProvider for extension, but the
RelMetadataQuery has top level interfaces for metadata query, these top level interfaces should
be extendable and sync with our metadata handlers.
> >
> >
> > Best,
> > Danny Chan
> > 在 2019年6月5日 +0800 AM1:48,Julian Hyde <jhyde@apache.org>,写道:
> > > > 1. Why we have 2 methods in RelMetadataProvider?
> > >
> > > The metadata system is complicated. We need to allow multiple handlers
> > > for any given call. So, making a metadata call involves multiple
> > > dispatch [1] based on the metadata method being called, the type of
> > > RelNode, and the handlers that are registered. Also it needs to cache
> > > results, and detect cycles. And the dispatch needs to be efficient, so
> > > we generate janino code to do the dispatch, and re-generate when new
> > > handlers or sub-classes of RelNode are added.
> > >
> > > I forget details, the two methods are basically required to allow us
> > > to generate code to do the dispatch.
> > >
> > > > 2. We should make the RelMetadataQuery in RelOptCluster pluggable.
> > >
> > > I disagree. RelMetadataQuery is not an extension point. Its sole
> > > purpose is to to keep state (the cache and cycle-checking).
> > > RelMetadataProvider is the extension point. (By analogy, if you are
> > > un-parsing an AST, you let each AST sub-class handle unparsing itself,
> > > but the unparsed text goes to a simple StringBuilder. RelMetadataQuery
> > > is in the role of the StringBuilder. In a complex system, it is nice
> > > to keep some of the components simple, or at least keep them to
> > > prescribed roles.)
> > >
> > > Julian
> > >
> > > [1] https://en.wikipedia.org/wiki/Multiple_dispatch
> > >
> > > > On Sun, Jun 2, 2019 at 11:19 PM Yuzhao Chen <yuzhao.cyz@gmail.com>
wrote:
> > > >
> > > > Currently we provide answer to metadata query through RelMetadataProvider
[1], there are some sub-classes of it:
> > > >
> > > > RelMetadataProvider
> > > > |
> > > > |- VolcanoRelMetadataProvider
> > > > |- ChainedRelMetadataProvider/DefaultRelMetadataProvider
> > > > |- HepRelMetadataProvider
> > > > |- CachingRelMetadataProvider
> > > > |- ReflectiveRelMetadataProvider
> > > > |- JaninoRelMetadataProvider
> > > >
> > > > The RelMetadataProvider has two methods: #apply and #handlers, the #apply
method seems a programming interface and there is a demo code how we can use it:
> > > >
> > > > RelMetadataProvider provider;
> > > > LogicalFilter filter;
> > > > RexNode predicate;
> > > > Function<RelNode, Metadata> function =
> > > > provider.apply(LogicalFilter.class, Selectivity.class};
> > > > Selectivity selectivity = function.apply(filter);
> > > > Double d = selectivity.selectivity(predicate);
> > > >
> > > > But let's see our RelOptCluster's member variables[2], there are MetadataFactory
and RelMetadataQuery which all can be used to query the metadata, for MetadataFactory, there
is a default impl named MetadataFactoryImpl which will invoke RelMetadataProvider#apply internally,
for RelMetadataQuery, it will invoke RelMetadataProvider#handlers (finally composed and codeden
by JaninoRelMetadataProvider).
> > > >
> > > > In our planning phrase, we can invoke RelOptRuleCall#getMetadataQuery
to get the MQ and query the metadata.
> > > >
> > > > For extension of metadata handlers, we can set our customized RelMetadataProvider
in RelOptCluster[3]. But for RelMetadataQuery, we have no way to extend it now, because the
RelOptCluster always has a singleton instance [4] which is only the default implementation.
> > > >
> > > >
> > > > My question is as follows:
> > > >
> > > > 1. Why we have 2 methods in RelMetadataProvider, and why we need the MetadataFactory
and RelMetadataProvider#apply ? It seems that it's function is already been overriden by RelMetadataQuery(The
difference is that MetadataFactory use Reflection and RelMetadataQuery use gened bytes code).
> > > > 2. We should make the RelMetadataQuery in RelOptCluster pluggable.
> > > >
> > > >
> > > > [1] https://github.com/apache/calcite/blob/b0e83c469ff57257c1ea621ff943ca76f626a9b7/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataProvider.java#L38
> > > > [2] https://github.com/apache/calcite/blob/b0e83c469ff57257c1ea621ff943ca76f626a9b7/core/src/main/java/org/apache/calcite/plan/RelOptCluster.java#L49
> > > > [3] https://github.com/apache/calcite/blob/b0e83c469ff57257c1ea621ff943ca76f626a9b7/core/src/main/java/org/apache/calcite/plan/RelOptCluster.java#L135
> > > > [4] https://github.com/apache/calcite/blob/b0e83c469ff57257c1ea621ff943ca76f626a9b7/core/src/main/java/org/apache/calcite/plan/RelOptCluster.java#L151
> > > >
> > > >
> > > >
> > > > Best,
> > > > Danny Chan

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message