calcite-dev mailing list archives

From: Julian Hyde <jh...@apache.org>
Subject: Re: Add Custom Statistics
Date: Mon, 29 Feb 2016 18:53:22 GMT
As soon as you have something to share — even if it’s not ready to commit — I’d be
happy to review.

> On Feb 28, 2016, at 11:05 PM, Victor Giannakouris - Salalidis <victorasgs@gmail.com> wrote:
> 
> Hi Julian,
> 
> Thank you for the quick response. I am currently working with the classes
> you mentioned in your reply. If I come up with a different implementation,
> I will be sure to share it.
> 
> Best,
> 
> Victor
> 
> On Wed, Feb 24, 2016 at 11:24 PM, Julian Hyde <jhyde@apache.org> wrote:
> 
>> What you call “statistics” Calcite calls “metadata”. Calcite has a
>> comprehensive system for adding a new kind of metadata (such as histograms)
>> or a new provider for metadata (that would, say, compute a value of the
>> Selectivity metadata for YourFilter and YourJoin).
>> 
>> The Table.getStatistic() method is a simple way to inject basic metadata
>> (for example, a row count), but it does not (and is not intended to) scale
>> to richer metadata.
>> 
>> Take a look at BuiltInMetadata, RelMetadataQuery, and one of the built-in
>> providers, say RelMdSelectivity.
>> 
>> Note that it is OK to define your own metadata types outside of
>> BuiltInMetadata. RelMetadataTest.ColType illustrates that this is possible.
>> 
>> Other groups (Hive, Drill) are probably interested in a “Histogram”
>> metadata type, and it would be great if we could all use the same
>> definition of Histogram, but I suspect it would take several months for
>> that discussion to converge on anything concrete. If you’re in a hurry,
>> better to forge ahead and share what you come up with.
>> 
>> Julian
>> 
>> 
>> 
>>> On Feb 24, 2016, at 6:02 AM, Victor Giannakouris - Salalidis <victorasgs@gmail.com> wrote:
>>> 
>>> Hello,
>>> 
>>> I am using HepPlanner with custom table classes for the catalog (extending
>>> *AbstractTable*). In my implementation I override the getStatistic()
>>> method, which returns a Statistic whose getRowCount() method I also
>>> override.
>>> 
>>> I added some rules to the planner in order to optimize join ordering. In
>>> this step, the planner moves smaller tables (for example, those to which
>>> a filter is applied) to the left (*build side*).
>>> 
>>> My actual question is how (and where) I can add my own statistics
>>> (concretely, *histograms* for selectivity estimation) so that I can
>>> produce estimates for filters and intermediate join results.
>>> --
>>> Victor Giannakouris - Salalidis
>>> 
>>> LinkedIn:
>>> http://gr.linkedin.com/pub/victor-giannakouris-salalidis/69/585/b23/
>>> Personal Page: http://gsvic.github.io
>> 
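
For reference, the approach Victor describes (a catalog table extending AbstractTable whose getStatistic() supplies a row count) might look roughly like the sketch below. It is a minimal sketch, assuming a Calcite release that provides Statistics.of(double, List<ImmutableBitSet>) and RelDataTypeFactory.builder(); the class name SizedTable and its ID/NAME columns are hypothetical. As Julian points out, this route only carries simple metadata such as the row count, not histograms.

```java
import java.util.Collections;

import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.schema.Statistic;
import org.apache.calcite.schema.Statistics;
import org.apache.calcite.schema.impl.AbstractTable;
import org.apache.calcite.sql.type.SqlTypeName;

/** Hypothetical catalog table that reports a fixed row count to the planner. */
public class SizedTable extends AbstractTable {
  private final double rowCount;

  public SizedTable(double rowCount) {
    this.rowCount = rowCount;
  }

  @Override public RelDataType getRowType(RelDataTypeFactory typeFactory) {
    // Hypothetical schema: two columns, ID and NAME.
    return typeFactory.builder()
        .add("ID", SqlTypeName.INTEGER)
        .add("NAME", SqlTypeName.VARCHAR)
        .build();
  }

  @Override public Statistic getStatistic() {
    // Row count only; no unique keys are declared.
    return Statistics.of(rowCount, Collections.emptyList());
  }
}
```

This requires Calcite on the classpath; richer statistics such as histograms would instead go through the metadata classes Julian mentions.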

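Victor's underlying question is how a histogram could drive selectivity estimates for filters. The standalone sketch below shows one common technique, an equi-width histogram that estimates the fraction of rows satisfying col < v by assuming values are spread uniformly within each bucket. EquiWidthHistogram and selectivityLessThan are hypothetical names, not Calcite classes; following Julian's suggestion, a custom metadata handler modeled on RelMdSelectivity could consult an object like this, but that wiring is not shown and depends on the Calcite version.

```java
import java.util.Arrays;

/**
 * Hypothetical equi-width histogram over one numeric column (not a Calcite
 * class). Estimates range-predicate selectivity assuming values are spread
 * uniformly inside each bucket.
 */
public final class EquiWidthHistogram {
  private final double min;
  private final double max;
  private final long[] counts; // number of values that fell into each bucket
  private final long total;    // total number of values

  public EquiWidthHistogram(double[] values, int buckets) {
    if (values.length == 0 || buckets <= 0) {
      throw new IllegalArgumentException("need at least one value and one bucket");
    }
    this.min = Arrays.stream(values).min().getAsDouble();
    this.max = Arrays.stream(values).max().getAsDouble();
    this.counts = new long[buckets];
    for (double v : values) {
      counts[bucketOf(v)]++;
    }
    this.total = values.length;
  }

  private double width() {
    return (max - min) / counts.length;
  }

  /** Index of the bucket containing v; the maximum lands in the last bucket. */
  private int bucketOf(double v) {
    if (max == min) {
      return 0;
    }
    int b = (int) ((v - min) / width());
    return Math.min(Math.max(b, 0), counts.length - 1);
  }

  /** Estimated fraction of rows whose column value is < v. */
  public double selectivityLessThan(double v) {
    if (v <= min) {
      return 0.0;
    }
    if (v > max) {
      return 1.0;
    }
    int b = bucketOf(v);
    long below = 0;
    for (int i = 0; i < b; i++) {
      below += counts[i]; // buckets lying entirely below v
    }
    // Linear interpolation inside the bucket that contains v.
    double bucketStart = min + b * width();
    double fraction = (v - bucketStart) / width();
    return (below + fraction * counts[b]) / total;
  }

  public static void main(String[] args) {
    double[] values = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    EquiWidthHistogram h = new EquiWidthHistogram(values, 3);
    System.out.println(h.selectivityLessThan(4)); // prints 0.3
  }
}
```

With the values 1 through 10 in three buckets, the bucket counts are 3, 3 and 4, so the estimate for col < 4 is 3/10; multiplying such a selectivity by the input row count gives the estimated output size of the filter. Equi-depth histograms are a common alternative when data is skewed.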
