calcite-dev mailing list archives

From Chunwei Lei <chunwei.l...@gmail.com>
Subject Re: [DISCUSSION] Extension of Metadata Query
Date Wed, 05 Jun 2019 04:17:20 GMT
Thanks for raising this, Danny. Actually, I have the same question.

> RelMetadataQuery is not an extension point. Its sole purpose is to
> keep state (the cache and cycle-checking).

I think users may want to extend RelMetadataQuery when they need to
query more metadata, such as TopK values.
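To make the kind of extension described above concrete, here is a minimal, stdlib-only Java sketch. The classes `Rel`, `MetadataQuery` and `TopKMetadataQuery` are hypothetical stand-ins, not Calcite classes: a subclass adds a new metadata kind (top-k values) and reuses the parent's cache, which is the state the base query object exists to hold.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for a relational expression (not Calcite's RelNode).
class Rel {
  final List<Integer> values;
  Rel(List<Integer> values) { this.values = values; }
}

// Hypothetical base query: its only job is to hold state (a result cache).
class MetadataQuery {
  protected final Map<List<Object>, Object> cache = new HashMap<>();

  double rowCount(Rel rel) {
    return (Double) cache.computeIfAbsent(Arrays.asList("rowCount", rel),
        k -> (double) rel.values.size());
  }
}

// The extension under discussion: a subclass that adds a new metadata kind
// and stores its answers in the inherited cache.
class TopKMetadataQuery extends MetadataQuery {
  @SuppressWarnings("unchecked")
  List<Integer> topK(Rel rel, int k) {
    return (List<Integer>) cache.computeIfAbsent(Arrays.asList("topK", rel, k),
        key -> rel.values.stream()
            .sorted(Comparator.reverseOrder())
            .limit(k)
            .collect(Collectors.toList()));
  }
}

class TopKDemo {
  public static void main(String[] args) {
    TopKMetadataQuery mq = new TopKMetadataQuery();
    Rel rel = new Rel(Arrays.asList(5, 1, 9, 3));
    System.out.println(mq.topK(rel, 2));   // [9, 5]
    System.out.println(mq.rowCount(rel));  // 4.0
  }
}
```

Julian's message quoted below argues against this design: in his view, new metadata kinds belong in RelMetadataProvider handlers, and RelMetadataQuery should stay a plain state holder.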


Best,
Chunwei


On Wed, Jun 5, 2019 at 6:44 AM Stamatis Zampetakis <zabetak@gmail.com> wrote:

> Thanks for bringing this up Danny.
>
> I guess the discussion came up due to CALCITE-2885 [1]. Looking back into
> it, I am not sure that the intention there is to make the RelMetadataQuery
> pluggable. We could possibly solve the performance problem without
> extending the RelMetadataQuery. We need to look into this case again.
>
> For more details on why RelMetadataProvider has two methods, have a
> look at CALCITE-604 [2]. For the design of RelMetadataProvider more
> generally, you may find the description in [3] useful.
>
> Best,
> Stamatis
>
> [1] https://issues.apache.org/jira/browse/CALCITE-2885
> [2] https://issues.apache.org/jira/browse/CALCITE-604
> [3] https://web.archive.org/web/20140624040836/www.hydromatic.net/wiki/RelationalExpressionMetadata/
>
> On Tue, Jun 4, 2019 at 7:48 PM Julian Hyde <jhyde@apache.org> wrote:
>
> > > 1. Why do we have two methods in RelMetadataProvider?
> >
> > The metadata system is complicated. We need to allow multiple handlers
> > for any given call. So, making a metadata call involves multiple
> > dispatch [1] based on the metadata method being called, the type of
> > RelNode, and the handlers that are registered. Also it needs to cache
> > results, and detect cycles. And the dispatch needs to be efficient, so
> > we generate Janino code to do the dispatch, and re-generate it when new
> > handlers or sub-classes of RelNode are added.
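Julian's description can be illustrated with a minimal, stdlib-only Java sketch of multiple dispatch with caching and cycle detection. All names here (`Node`, `Scan`, `Filter`, `Handler`, `Dispatcher`, and the `"rowCount"` key) are hypothetical and are not Calcite APIs. In particular, Calcite generates Janino code for the dispatch, whereas this sketch looks handlers up in maps at run time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical node hierarchy standing in for RelNode subclasses.
abstract class Node {}

class Scan extends Node {
  final double rows;
  Scan(double rows) { this.rows = rows; }
}

class Filter extends Node {
  final Node input;
  Filter(Node input) { this.input = input; }
}

// A handler returns null to mean "no answer; try the next handler".
interface Handler {
  Double handle(Node node, Dispatcher dispatcher);
}

class Dispatcher {
  // metadata kind -> node class -> handlers, in registration order
  private final Map<String, Map<Class<?>, List<Handler>>> handlers = new HashMap<>();
  private final Map<List<Object>, Double> cache = new HashMap<>();
  private final Set<List<Object>> inProgress = new HashSet<>();

  void register(String kind, Class<? extends Node> clazz, Handler handler) {
    handlers.computeIfAbsent(kind, k -> new HashMap<>())
        .computeIfAbsent(clazz, c -> new ArrayList<>())
        .add(handler);
  }

  Double query(String kind, Node node) {
    List<Object> key = List.of(kind, node);
    if (cache.containsKey(key)) {
      return cache.get(key);
    }
    if (!inProgress.add(key)) {
      throw new IllegalStateException("cycle detected for " + kind);
    }
    try {
      Map<Class<?>, List<Handler>> byClass = handlers.getOrDefault(kind, Map.of());
      Double result = null;
      // Dispatch on the node's own class first, then fall back to superclasses.
      for (Class<?> c = node.getClass(); c != null && result == null; c = c.getSuperclass()) {
        for (Handler h : byClass.getOrDefault(c, List.of())) {
          result = h.handle(node, this);
          if (result != null) {
            break;
          }
        }
      }
      cache.put(key, result);
      return result;
    } finally {
      inProgress.remove(key);
    }
  }
}

class DispatchDemo {
  public static void main(String[] args) {
    Dispatcher d = new Dispatcher();
    d.register("rowCount", Scan.class, (node, q) -> ((Scan) node).rows);
    // The Filter handler asks the dispatcher about its input, so that answer is cached too.
    d.register("rowCount", Filter.class, (node, q) -> {
      Double input = q.query("rowCount", ((Filter) node).input);
      return input == null ? null : input * 0.5;
    });
    System.out.println(d.query("rowCount", new Filter(new Scan(100)))); // 50.0
  }
}
```

Walking maps on every call is the simple but slow way to do this; generating code per (metadata method, node class) pair, as Julian describes, removes that lookup cost.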
> >
> > I forget the details, but the two methods are basically required to
> > allow us to generate code to do the dispatch.
> >
> > > 2. We should make the RelMetadataQuery in RelOptCluster pluggable.
> >
> > I disagree. RelMetadataQuery is not an extension point. Its sole
> > purpose is to keep state (the cache and cycle-checking).
> > RelMetadataProvider is the extension point. (By analogy, if you are
> > unparsing an AST, you let each AST sub-class handle unparsing itself,
> > but the unparsed text goes to a simple StringBuilder. RelMetadataQuery
> > is in the role of the StringBuilder. In a complex system, it is nice
> > to keep some of the components simple, or at least keep them to
> > prescribed roles.)
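Julian's analogy can be made concrete with a small Java example. The `Ast`, `Literal` and `Call` classes are hypothetical and exist only for illustration: behavior lives in the node classes, while the accumulator is a plain `StringBuilder` that only holds state. In the analogy, the node classes play the part of metadata handlers supplied through a RelMetadataProvider, and the `StringBuilder` plays the part of RelMetadataQuery.

```java
import java.util.List;

// Hypothetical AST: each node class knows how to unparse itself.
interface Ast {
  void unparse(StringBuilder sb);
}

class Literal implements Ast {
  final int value;
  Literal(int value) { this.value = value; }
  public void unparse(StringBuilder sb) { sb.append(value); }
}

class Call implements Ast {
  final String op;
  final List<Ast> operands;
  Call(String op, List<Ast> operands) { this.op = op; this.operands = operands; }

  public void unparse(StringBuilder sb) {
    sb.append(op).append('(');
    for (int i = 0; i < operands.size(); i++) {
      if (i > 0) {
        sb.append(", ");
      }
      operands.get(i).unparse(sb); // each sub-class handles its own output
    }
    sb.append(')');
  }
}

class UnparseDemo {
  public static void main(String[] args) {
    Ast expr = new Call("PLUS",
        List.of(new Literal(1), new Call("MINUS", List.of(new Literal(3), new Literal(2)))));
    // The accumulator only collects text; all the logic lives in the nodes.
    StringBuilder sb = new StringBuilder();
    expr.unparse(sb);
    System.out.println(sb); // PLUS(1, MINUS(3, 2))
  }
}
```

In Java, `StringBuilder` is declared `final`, so it cannot be subclassed at all; new kinds of output are added by adding node classes, which mirrors adding handlers rather than extending the query object.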
> >
> > Julian
> >
> > [1] https://en.wikipedia.org/wiki/Multiple_dispatch
> >
> > On Sun, Jun 2, 2019 at 11:19 PM Yuzhao Chen <yuzhao.cyz@gmail.com> wrote:
> > >
> > > Currently we answer metadata queries through RelMetadataProvider [1],
> > > which has several implementations:
> > >
> > > RelMetadataProvider
> > > |
> > > |- VolcanoRelMetadataProvider
> > > |- ChainedRelMetadataProvider/DefaultRelMetadataProvider
> > > |- HepRelMetadataProvider
> > > |- CachingRelMetadataProvider
> > > |- ReflectiveRelMetadataProvider
> > > |- JaninoRelMetadataProvider
> > >
> > > RelMetadataProvider has two methods, #apply and #handlers. The #apply
> > > method seems to be a programmatic interface, and there is demo code
> > > showing how we can use it:
> > >
> > > RelMetadataProvider provider;
> > > LogicalFilter filter;
> > > RexNode predicate;
> > > RelMetadataQuery mq;
> > > UnboundMetadata<Selectivity> function =
> > >     provider.apply(LogicalFilter.class, Selectivity.class);
> > > Selectivity selectivity = function.bind(filter, mq);
> > > Double d = selectivity.getSelectivity(predicate);
> > >
> > > But look at RelOptCluster's member variables [2]: it holds both a
> > > MetadataFactory and a RelMetadataQuery, and either can be used to query
> > > metadata. MetadataFactory has a default implementation,
> > > MetadataFactoryImpl, which invokes RelMetadataProvider#apply internally;
> > > RelMetadataQuery invokes RelMetadataProvider#handlers (whose handlers
> > > are ultimately composed and compiled into generated code by
> > > JaninoRelMetadataProvider).
> > >
> > > During planning, we can call RelOptRuleCall#getMetadataQuery to get
> > > the metadata query (MQ) and use it to query metadata.
> > >
> > > To extend metadata handlers, we can set a customized
> > > RelMetadataProvider on RelOptCluster [3]. But there is currently no way
> > > to extend RelMetadataQuery, because RelOptCluster always holds a single
> > > instance [4], and it can only be the default implementation.
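For comparison, the proposal in point 2 could look roughly like the following stdlib-only Java sketch. `Cluster`, `DefaultQuery`, `CustomQuery` and `setQueryFactory` are hypothetical names, not Calcite's RelOptCluster API: the cluster keeps a factory for its query object, so a subclass can be plugged in while the cluster still creates one instance lazily and reuses it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical base query: holds per-query state such as a cache.
class DefaultQuery {
  final Map<Object, Object> cache = new HashMap<>();
}

// Hypothetical user extension that adds a new metadata method.
class CustomQuery extends DefaultQuery {
  String describe() { return "custom"; }
}

// Hypothetical cluster with a pluggable factory for its metadata query.
class Cluster {
  private Supplier<? extends DefaultQuery> queryFactory = DefaultQuery::new;
  private DefaultQuery mq;

  void setQueryFactory(Supplier<? extends DefaultQuery> factory) {
    this.queryFactory = factory;
    this.mq = null; // discard any instance built by the previous factory
  }

  // Creates the query lazily and reuses it until invalidated.
  DefaultQuery getMetadataQuery() {
    if (mq == null) {
      mq = queryFactory.get();
    }
    return mq;
  }

  void invalidateMetadataQuery() {
    mq = null;
  }
}

class ClusterDemo {
  public static void main(String[] args) {
    Cluster cluster = new Cluster();
    cluster.setQueryFactory(CustomQuery::new);
    DefaultQuery mq = cluster.getMetadataQuery();
    if (mq instanceof CustomQuery) {
      System.out.println(((CustomQuery) mq).describe()); // custom
    }
  }
}
```

One visible cost of this design is that callers must downcast the result of `getMetadataQuery()` to reach the extra methods; Julian's reply argues for keeping the query object fixed and extending RelMetadataProvider instead.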
> > >
> > >
> > > My question is as follows:
> > >
> > > 1. Why do we have two methods in RelMetadataProvider, and why do we need
> > > MetadataFactory and RelMetadataProvider#apply? Their function seems to be
> > > superseded by RelMetadataQuery already (the difference is that
> > > MetadataFactory uses reflection while RelMetadataQuery uses generated
> > > bytecode).
> > > 2. We should make the RelMetadataQuery in RelOptCluster pluggable.
> > >
> > >
> > > [1] https://github.com/apache/calcite/blob/b0e83c469ff57257c1ea621ff943ca76f626a9b7/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataProvider.java#L38
> > > [2] https://github.com/apache/calcite/blob/b0e83c469ff57257c1ea621ff943ca76f626a9b7/core/src/main/java/org/apache/calcite/plan/RelOptCluster.java#L49
> > > [3] https://github.com/apache/calcite/blob/b0e83c469ff57257c1ea621ff943ca76f626a9b7/core/src/main/java/org/apache/calcite/plan/RelOptCluster.java#L135
> > > [4] https://github.com/apache/calcite/blob/b0e83c469ff57257c1ea621ff943ca76f626a9b7/core/src/main/java/org/apache/calcite/plan/RelOptCluster.java#L151
> > >
> > >
> > >
> > > Best,
> > > Danny Chan
> >
>
