What you call “statistics” Calcite calls “metadata”. Calcite has a comprehensive system
for adding a new kind of metadata (such as histograms) or a new provider for metadata (that
would, say, compute a value of the Selectivity metadata for YourFilter and YourJoin).
The Table.getStatistic() method is a simple way to inject basic metadata (such as a row count),
but it does not (and is not intended to) scale to richer metadata.
Take a look at BuiltInMetadata, RelMetadataQuery, and one of the built-in providers, say RelMdSelectivity.
Note that it is OK to define your own metadata types outside of BuiltInMetadata. RelMetadataTest.ColType
illustrates that this is possible.
Other groups (Hive, Drill) are probably interested in a “Histogram” metadata type, and
it would be great if we could all use the same definition of Histogram, but I suspect it would
take several months for that discussion to converge on anything concrete. If you’re in a
hurry, better to forge ahead and share what you come up with.
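To make the histogram idea concrete, here is a minimal sketch of the kind of computation a histogram-backed selectivity provider might perform for a "column < bound" filter. It is plain Java and deliberately uses no Calcite API: EquiWidthHistogram and all of its methods are hypothetical names invented for illustration, and the uniform-distribution-within-a-bucket rule is an assumption of this sketch, not something Calcite prescribes.

```java
import java.util.Arrays;

/**
 * Hypothetical equi-width histogram over one numeric column.
 * Illustrative only; this is not a Calcite class.
 */
final class EquiWidthHistogram {
  private final double min;
  private final double max;
  private final long[] counts;
  private final long total;

  private EquiWidthHistogram(double min, double max, long[] counts, long total) {
    this.min = min;
    this.max = max;
    this.counts = counts;
    this.total = total;
  }

  /** Builds a histogram with the given number of buckets from sampled column values. */
  static EquiWidthHistogram of(double[] values, int buckets) {
    if (values.length == 0 || buckets <= 0) {
      throw new IllegalArgumentException("need at least one value and one bucket");
    }
    double min = Arrays.stream(values).min().getAsDouble();
    double max = Arrays.stream(values).max().getAsDouble();
    double width = (max - min) / buckets;
    long[] counts = new long[buckets];
    for (double v : values) {
      // The maximum value is clamped into the last bucket.
      int i = width == 0 ? 0 : (int) Math.min(buckets - 1, (long) ((v - min) / width));
      counts[i]++;
    }
    return new EquiWidthHistogram(min, max, counts, values.length);
  }

  /**
   * Estimated fraction of rows whose value is less than {@code bound}.
   * Assumes values are spread uniformly inside each bucket.
   */
  double selectivityLessThan(double bound) {
    if (bound <= min) {
      return 0d;
    }
    if (bound > max) {
      return 1d;
    }
    double width = (max - min) / counts.length; // > 0 here, because min < bound <= max
    double rows = 0d;
    for (int i = 0; i < counts.length; i++) {
      double lo = min + i * width;
      double hi = lo + width;
      if (bound >= hi) {
        rows += counts[i]; // bucket lies entirely below the bound
      } else if (bound > lo) {
        rows += counts[i] * (bound - lo) / width; // partial bucket, linear interpolation
      }
    }
    return rows / total;
  }

  /** Scales the selectivity by a table row count, e.g. the one returned by getRowCount(). */
  double estimateRowsLessThan(double bound, double tableRowCount) {
    return selectivityLessThan(bound) * tableRowCount;
  }

  public static void main(String[] args) {
    double[] sample = {0, 10, 20, 30, 40, 50, 60, 70, 80, 100};
    EquiWidthHistogram h = EquiWidthHistogram.of(sample, 4);
    System.out.println(h.selectivityLessThan(50));
    System.out.println(h.estimateRowsLessThan(37.5, 1000));
  }
}
```

A custom metadata provider could hold one such object per column and consult it when asked for the selectivity of a filter. How the histogram is built, stored, and attached to a table is left open here.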
Julian
> On Feb 24, 2016, at 6:02 AM, Victor Giannakouris - Salalidis <victorasgs@gmail.com> wrote:
>
> Hello,
>
> I am using HepPlanner with custom table classes for the catalog (extending
> *AbstractTable*). In my implementation I override the getStatistic() method
> to return a Statistic whose getRowCount() method I also override.
>
> I added some rules to the planner in order to optimize join ordering. At
> this step it moves, for example, the smaller tables (such as those to which
> a filter is applied) to the left (*build side*).
>
> My actual question is how (where) can I add my own statistics (concretely,
> *histograms* for selectivity estimation) in order to perform estimates for
> filters or join intermediate results.
> --
> Victor Giannakouris - Salalidis
>
> LinkedIn:
> http://gr.linkedin.com/pub/victor-giannakouris-salalidis/69/585/b23/
> Personal Page: http://gsvic.github.io