calcite-dev mailing list archives

From Julian Hyde <jh...@apache.org>
Subject Re: What is the best way to determine input to a join is a relation
Date Wed, 06 Jan 2016 18:31:40 GMT
I’m going to rephrase your question to “What is the best way to determine whether a relation
is a table?” I like to use the term “relation” to cover both finite relations (tables)
and infinite relations (streams).

I think a useful trait would be “timeliness of sort”. It sounds very abstract but bear
with me.

Suppose I have a relation R and I want to do a stream join on its rowtime column. I might
ask the following question:

  What is the maximum amount that any row might be delayed when I execute ‘select * from
R order by rowtime’, waiting for all applicable rows?

Some cases:
* If R is a table (i.e. a relation that does not have new data arriving during the execution
of the query), we have all the data already, so the delay is zero.
* If R is a stream sorted by rowtime, the delay is zero (or perhaps a small value t that
represents the network latency).
* If R is a stream based on log files, and the rowtime column is based on the wallclock
time of the servers that write those files, and the log files are pushed every hour, then
the delay is 1 hour.
* If R is a stream sorted by some other column, then the delay is infinite.

Measuring the sort-delay is more general than saying whether a relation is sorted. For a table,
the sort-delay is zero. If the sort-delay is infinite, or too high, we don’t consider the plan
to be viable.
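
To make the idea a little more concrete, here is a minimal sketch in Java. It is not
Calcite API: the class SortDelay and its methods are hypothetical names invented for
illustration. The rule that a join waits for its slower input (so the delays combine as a
maximum) is an assumption of the sketch, not something settled in this thread.

```java
import java.time.Duration;

/** Hypothetical sort-delay value; not part of Calcite. */
final class SortDelay {
    // null stands for an unbounded (infinite) delay
    private final Duration bound;

    private SortDelay(Duration bound) { this.bound = bound; }

    /** A table, or a stream already sorted by rowtime. */
    static SortDelay zero() { return new SortDelay(Duration.ZERO); }

    /** A stream whose rows arrive at most d late, e.g. hourly log pushes. */
    static SortDelay of(Duration d) { return new SortDelay(d); }

    /** A stream sorted by some other column. */
    static SortDelay infinite() { return new SortDelay(null); }

    boolean isInfinite() { return bound == null; }

    /** Assumed rule: a join on rowtime must wait for the slower input. */
    static SortDelay join(SortDelay left, SortDelay right) {
        if (left.isInfinite() || right.isInfinite()) {
            return infinite();
        }
        return left.bound.compareTo(right.bound) >= 0 ? left : right;
    }

    /** A plan is viable only if its delay is finite and within the limit. */
    boolean isViable(Duration maxAcceptable) {
        return !isInfinite() && bound.compareTo(maxAcceptable) <= 0;
    }

    @Override public String toString() {
        return isInfinite() ? "infinite" : bound.toString();
    }
}
```

Under these assumptions, joining a table (SortDelay.zero()) to an hourly log stream
(SortDelay.of(Duration.ofHours(1))) gives a delay of one hour, and joining anything to a
stream sorted by another column gives an infinite delay, which isViable always rejects.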

As I said, this is a very abstract concept. I’ve not fully thought through the idea, and
it doesn’t directly answer your question, but I think we can develop it to achieve your
goal, which is to optimize stream-table joins. Do you think that it is a useful concept worth
developing?

Julian


> On Jan 6, 2016, at 9:55 AM, Jacques Nadeau <jacques@apache.org> wrote:
> 
> It seems like it should be a trait. The one problem you'll hit though is
> how to propagate that trait. Maybe Julian has some good ideas.
> On Jan 6, 2016 9:02 AM, "Milinda Pathirage" <mpathira@umail.iu.edu> wrote:
> 
>> Hi Devs,
>> 
>> I need to figure out which input is the relation in a stream-to-relation
>> join. Is there a good way to do this, other than traversing the inputs
>> until a scan operator is found?
>> 
>> Thanks
>> Milinda
>> 
>> --
>> Milinda Pathirage
>> 
>> PhD Student | Research Assistant
>> School of Informatics and Computing | Data to Insight Center
>> Indiana University
>> 
>> twitter: milindalakmal
>> skype: milinda.pathirage
>> blog: http://milinda.pathirage.org
>> 

