calcite-dev mailing list archives

From Julian Hyde <jhyde.apa...@gmail.com>
Subject Re: Execute multiple RelNodes from single RelNode
Date Wed, 05 Jun 2019 14:14:32 GMT
Multi-sink is something I’ve wanted to do for a while. (I know that Hive uses multi-sink plans
for insert, but it has never been able to model them using Calcite.)

We basically need a DAG. The problems are how to model divergent data flows, and how to model
the “controller” that waits for all of the sinks to finish. 

The Spool operator might be a good way to model the fact that there are multiple consumers
of the source scan. 

As for the controller: how about a Union, say “select count(*) from sink1 union all select
count(*) from sink2”. (Strictly, you don’t need to count, but you need to wait until each
sink has completed, and you need the row-types to be union-compatible, so Union is pretty
good.)
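
A minimal sketch of that shape, in plain Python rather than Calcite's API: the source is scanned once into a spool (here just a list), each sink consumes the spool, and the controller is the union-all of one count per sink. Every name here (`run_multi_sink`, `scan_orders`, the sink predicates) is hypothetical and only illustrates the data flow; the count is the number of rows written to each sink.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]


def run_multi_sink(
    scan_source: Callable[[], Iterable[Row]],
    sinks: List[Tuple[List[Row], Callable[[Row], bool]]],
) -> List[int]:
    """Scan the source once, feed every sink, and return one count per sink."""
    # Spool: materialize the source so several consumers can read it
    # without re-running the scan.
    spool = list(scan_source())

    counts = []
    for target, predicate in sinks:
        # Each sink is an independent consumer of the same spooled rows.
        written = [row for row in spool if predicate(row)]
        target.extend(written)
        # One controller branch, in the spirit of "select count(*) from sinkN".
        counts.append(len(written))
    # Controller: the "union all" of the branches. It can only be returned
    # once every sink has finished.
    return counts


scan_calls: List[str] = []  # records how many times the source is scanned


def scan_orders() -> List[Row]:
    scan_calls.append("scan")
    return [
        {"id": 1, "amount": 5},
        {"id": 2, "amount": 50},
        {"id": 3, "amount": 500},
    ]


small: List[Row] = []
large: List[Row] = []
result = run_multi_sink(
    scan_orders,
    [
        (small, lambda r: r["amount"] < 100),
        (large, lambda r: r["amount"] >= 100),
    ],
)
```

Here `result` has one entry per sink, and `scan_calls` shows that the source was read only once even though two sinks consumed it.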

I look forward to seeing some optimization rules on Spool. E.g. project away columns that
none of the consumers need, similarly filters. 
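
As a rough illustration of the column-pruning idea, again in plain Python and not as a Calcite rule: the spool only needs the union of the columns that its consumers read. The helper names (`prune_spool_columns`, `project`) and the sample columns are assumptions made up for this sketch.

```python
from typing import Dict, List, Set

Row = Dict[str, object]


def prune_spool_columns(
    spool_columns: List[str], consumer_columns: List[Set[str]]
) -> List[str]:
    """Return the spool columns that at least one consumer reads."""
    needed: Set[str] = set().union(*consumer_columns)
    # Keep the original column order so the row type stays predictable.
    return [c for c in spool_columns if c in needed]


def project(rows: List[Row], columns: List[str]) -> List[Row]:
    """Project each row down to the given columns."""
    return [{c: row[c] for c in columns} for row in rows]


kept = prune_spool_columns(
    ["id", "amount", "comment"],
    [{"id"}, {"id", "amount"}],  # columns read by consumer 1 and consumer 2
)
pruned = project([{"id": 1, "amount": 5, "comment": "x"}], kept)
```

In this example no consumer reads `comment`, so it is dropped before spooling. The analogous filter rule would have to keep any row that at least one consumer's predicate accepts.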

Julian

> On Jun 5, 2019, at 5:12 AM, Yuzhao Chen <yuzhao.cyz@gmail.com> wrote:
> 
> This seems to be a request for multi-sink insert.
> 
>> 3) Calcite transforms it into multiple TableModifies
> 
> Instead of letting Calcite transform it into multiple TableModifies, I think you should do it
> yourself, then send each TableModify to Calcite's sqlToRel converter.
> 
> If you want the inserts into multiple sinks to run in the same plan, that is another
> topic; we may promote one sink node tree at a time and finally merge all the trees.
> 
> Best,
> Danny Chan
> On June 5, 2019 at 7:58 PM +0800, dev@calcite.apache.org wrote:
>> 
>> 3) Calcite transforms it into multiple TableModifies
