calcite-dev mailing list archives

From Jacques Nadeau <jacq...@apache.org>
Subject Re: Re-working the metadata framework
Date Tue, 19 Jan 2016 05:31:05 GMT
I read your email.  My thought was that we should evaluate the real
requirements. Maybe we need to shed some capabilities to achieve better
performance. Can you break down the issue into a set of example patterns?
In general, it seems like we may be trying to be too flexible. I understand
the desire to maintain as much backwards compatibility as possible, but I
think we may want to come up with a simpler metaphor. My goal is to come up
with the right design rather than trying to optimize what might be the
wrong design. Maybe you've done all the thinking and believe the existing
metaphors are exactly right, but I think we should get consensus around
the requirements before jumping to implementation approaches such as code
generation.
On Jan 18, 2016 5:40 PM, "Julian Hyde" <jhyde@apache.org> wrote:

> I describe in paragraph 3 why we CURRENTLY use reflection. In short: very
> complex dispatch requirements. The rest of the message is how I plan to
> phase it out. Plain Java only has one dispatch mechanism (virtual methods)
> so isn’t going to cut it.
>
> > On Jan 18, 2016, at 5:23 PM, Jacques Nadeau <jacques@apache.org> wrote:
> >
> > Can you go into more detail about why reflection is needed? It seems like we
> > could get away from reflection by sharing interfaces, etc.
> >
> > On Mon, Jan 18, 2016 at 5:12 PM, Julian Hyde <jhyde@apache.org> wrote:
> >
> >> In https://issues.apache.org/jira/browse/CALCITE-794 we added an extra
> >> parameter to each metadata call so that we could detect cyclic metadata
> >> calls, and potentially to cache results so that a given statistic is never
> >> computed more than once during a metadata call. But the overhead of making
> >> calls into the metadata framework is still very high. It shows up as a big
> >> fraction of the time spent optimizing complex queries. I am working on
> >> https://issues.apache.org/jira/browse/CALCITE-604, which aims to fix
> >> that.
> >>
> >> I am working on 604 while the release is closing, and I thought some of
> >> you would be interested to know where I am going.
> >>
> >> We use reflection to make calls. This is necessary because the types of
> >> metadata (e.g. selectivity, row count, unique keys, predicates) are
> >> extensible, you can have multiple providers for each kind of metadata,
> >> each provider has different methods for various RelNode sub-types, and we
> >> want to be able to inherit handler methods (e.g. getUniqueKeys(Aggregate,
> >> boolean) handles getUniqueKeys(LogicalAggregate, boolean) because there
> >> is no handler method for LogicalAggregate).
> >>
> >> Initially I thought we’d use MethodHandle, which is a lot faster than
> >> method invocation by reflection. MethodHandle.invoke has some flexibility
> >> based on the types of its arguments, but I realized we’d still have to
> >> dispatch to multiple underlying providers (e.g. the built-in provider and
> >> the Hive provider). And we have other inefficiencies such as calling
> >> UnboundMetadata.bind(RelNode, RelMetadataQuery) to create a short-lived
> >> object every single call.
> >>
> >> So, now I am looking at using Janino to generate a dispatcher. Consider
> >> just one kind of metadata, UniqueKeys. We already have a “signature”
> >> interface:
> >>
> >> public interface UniqueKeys extends Metadata {
> >>  Set<ImmutableBitSet> getUniqueKeys(boolean ignoreNulls);
> >> }
> >>
> >> I have added a handler interface:
> >>
> >> interface UniqueKeysHandler {
> >>  Set<ImmutableBitSet> getUniqueKeys(RelNode r, RelMetadataQuery mq,
> >>      boolean ignoreNulls);
> >> }
> >>
> >> Now, given a set of metadata providers and the set of all known RelNode
> >> sub-types, I can use Janino to generate a handler at run time:
> >>
> >> class UniqueKeysHandlerImpl implements UniqueKeysHandler {
> >>  final RelMdUniqueKeys provider0;
> >>  final HiveUniqueKeys provider1;
> >>
> >>  UniqueKeysHandlerImpl(RelMdUniqueKeys provider0,
> >>      HiveUniqueKeys provider1) {
> >>    this.provider0 = provider0;
> >>    this.provider1 = provider1;
> >>  }
> >>
> >>  public Set<ImmutableBitSet> getUniqueKeys(RelNode r,
> >>      RelMetadataQuery mq, boolean ignoreNulls) {
> >>    switch (r.getClass().getName()) {
> >>    case "org.apache.calcite.rel.logical.LogicalAggregate":
> >>      return provider0.getUniqueKeys((Aggregate) r, mq, ignoreNulls);
> >>    case "org.apache.calcite.rel.core.Aggregate":
> >>      return provider0.getUniqueKeys(r, mq, ignoreNulls);
> >>    case "org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveAggregate":
> >>      return provider1.getUniqueKeys(r, mq, ignoreNulls);
> >>    default:
> >>      throw NoHandler.INSTANCE;
> >>    }
> >>  }
> >> }
> >>
> >> The entry point in RelMetadataQuery changes from
> >>
> >> public Set<ImmutableBitSet> getUniqueKeys(RelNode rel,
> >>    boolean ignoreNulls) {
> >>  final BuiltInMetadata.UniqueKeys metadata =
> >>      rel.metadata(BuiltInMetadata.UniqueKeys.class, this);
> >>  return metadata.getUniqueKeys(ignoreNulls);
> >> }
> >>
> >> to
> >>
> >> public Set<ImmutableBitSet> getUniqueKeys(RelNode rel,
> >>    boolean ignoreNulls) {
> >>  for (;;) {
> >>    try {
> >>      return uniqueKeysHandler.getUniqueKeys(rel, this, ignoreNulls);
> >>    } catch (NoHandler e) {
> >>      uniqueKeysHandler = metadataProvider.revise(rel.getClass(),
> >>          BuiltInMetadata.UniqueKeys.Handler.class);
> >>    }
> >>  }
> >> }
> >>
> >> The “NoHandler” exception occurs very rarely (only when a kind of RelNode
> >> is seen that hasn’t been seen before in this JVM instance), but it gives
> >> the handler a chance to regenerate itself.
> >>
> >> The result is a very direct path from the caller (generally a RelOptRule)
> >> to the provider: two calls, and we don’t even need to box the arguments.
> >>
> >> I don’t think there will be any API changes, but note that the metadata
> >> interfaces (e.g. UniqueKeys) and RelNode.metadata(Class<M> metadataClass,
> >> RelMetadataQuery mq) are not used anymore.
> >>
> >> Julian
> >>
> >>
>
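For readers comparing the two invocation styles mentioned in the quoted message (reflection versus MethodHandle), here is a minimal sketch. Provider and getRowCount are hypothetical stand-ins, not Calcite classes; only the java.lang.reflect and java.lang.invoke calls are real JDK APIs. It shows the mechanics only and makes no performance measurement.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

public class InvokeSketch {
  /** Hypothetical provider; stands in for a metadata provider class. */
  public static class Provider {
    public double getRowCount(String rel, boolean ignoreNulls) {
      return ignoreNulls ? 1.0 : rel.length();
    }
  }

  /** Reflective call: arguments are boxed into an Object[] and the result is boxed. */
  static double viaReflection(Provider p, String rel, boolean ignoreNulls) {
    try {
      Method m = Provider.class.getMethod(
          "getRowCount", String.class, boolean.class);
      return (Double) m.invoke(p, rel, ignoreNulls);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  /** MethodHandle call: invokeExact matches the exact signature, with no boxing at the call site. */
  static double viaMethodHandle(Provider p, String rel, boolean ignoreNulls) {
    try {
      MethodHandle h = MethodHandles.lookup().findVirtual(
          Provider.class, "getRowCount",
          MethodType.methodType(double.class, String.class, boolean.class));
      return (double) h.invokeExact(p, rel, ignoreNulls);
    } catch (Throwable e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Provider p = new Provider();
    System.out.println(viaReflection(p, "hello", false));
    System.out.println(viaMethodHandle(p, "hello", false));
  }
}
```

Either way the caller still has to decide which provider and which overload to target, which is the dispatch problem the quoted message describes.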
>
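To make the proposed control flow concrete, below is a rough, self-contained sketch of the "dispatch on exact class, throw NoHandler, revise, retry" pattern from the quoted message. It is an illustration under assumptions, not the Calcite implementation: Rel, Aggregate and LogicalAggregate are hypothetical stand-ins, and a HashMap plays the role of the Janino-generated switch. The revise step walks up the superclass chain, so the handler written for Aggregate is inherited by LogicalAggregate.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class DispatchSketch {
  // Hypothetical stand-ins for RelNode sub-types; not Calcite classes.
  static class Rel {}
  static class Aggregate extends Rel {}
  static class LogicalAggregate extends Aggregate {}

  /** Signals an unseen class; a stackless singleton, so throwing it is cheap. */
  static final class NoHandler extends RuntimeException {
    static final NoHandler INSTANCE = new NoHandler();
    private NoHandler() {
      super(null, null, false, false);
    }
  }

  // Handler methods as a provider declares them, keyed by parameter type.
  static final Map<Class<?>, Function<Rel, String>> PROVIDED = new HashMap<>();
  static {
    PROVIDED.put(Aggregate.class, r -> "Aggregate handler");
  }

  // Plays the role of the generated dispatcher: exact classes only.
  static Map<Class<?>, Function<Rel, String>> table = new HashMap<>();
  static int revisions = 0;

  static String dispatch(Rel r) {
    Function<Rel, String> h = table.get(r.getClass());
    if (h == null) {
      throw NoHandler.INSTANCE;
    }
    return h.apply(r);
  }

  /** Rebuilds the table so that class c maps to its nearest inherited handler. */
  static void revise(Class<?> c) {
    for (Class<?> k = c; k != null; k = k.getSuperclass()) {
      Function<Rel, String> h = PROVIDED.get(k);
      if (h != null) {
        Map<Class<?>, Function<Rel, String>> next = new HashMap<>(table);
        next.put(c, h);
        table = next;
        revisions++;
        return;
      }
    }
    throw new IllegalArgumentException("no handler for " + c.getName());
  }

  /** Entry point: retry after revising, as in the loop from the quoted message. */
  static String uniqueKeys(Rel r) {
    for (;;) {
      try {
        return dispatch(r);
      } catch (NoHandler e) {
        revise(r.getClass());
      }
    }
  }

  public static void main(String[] args) {
    System.out.println(uniqueKeys(new LogicalAggregate()));
    System.out.println(uniqueKeys(new LogicalAggregate()));
    System.out.println("revisions: " + revisions);
  }
}
```

In this sketch the exception path runs once per previously unseen class; later calls for the same class go straight through the table.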
