samza-dev mailing list archives

From: Yi Pan <nickpa...@gmail.com>
Subject: Re: Stream SQL Query Planner Update
Date: Mon, 06 Apr 2015 18:05:45 GMT
Hi, Milinda,

Great! Thanks for making such excellent progress on this! I will try to
follow up on the patch today.

Thanks!

-Yi

On Mon, Apr 6, 2015 at 11:00 AM, Milinda Pathirage <mpathira@umail.iu.edu>
wrote:

> Hi All,
>
> I have attached a patch to SAMZA-561 (
> https://issues.apache.org/jira/browse/SAMZA-561) which demonstrates
> streaming SQL execution planning functionality. The attached patch only
> supports stream filtering and ‘insert into’ for sending the filtered stream
> to another topic. The patch also comes with an integration test that
> demonstrates and validates the stream filtering capability. Below are some
> facts about the current implementation.
>
> - Streaming SQL is packaged into org.apache.samza.task.sql.StreamSqlTask,
> which is a Samza StreamTask.
> - The Stream SQL query, the Avro schema (used to initialize the SQL serde for
> Avro) and the Calcite model are configured as StreamSqlTask properties.
> (Please refer to SAMZA_SRC/samza-test/src/main/config/sql-filter.properties.)
> - The Avro schema and Calcite model are available on the class path, and a
> special URL format (e.g. resource:orders.avsc) is used to refer to them.
> (Note: This should be changed.)
> - The whole query is executed inside the same Samza task.
> - Projections are not supported yet.
>
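To make the filter plus ‘insert into’ path and the resource: configuration convention concrete, here is a minimal, self-contained sketch that deliberately avoids the Samza and Calcite APIs. The property keys (sql.schema, sql.output), the Outgoing record type and the SqlTaskSketch helpers are hypothetical illustrations, not the actual StreamSqlTask code, and the WHERE clause is modeled as a plain Java predicate rather than a planned Calcite expression.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.function.Predicate;

// Hypothetical stand-in for a message bound for an output stream
// (this is NOT Samza's OutgoingMessageEnvelope).
final class Outgoing {
    final String topic;
    final Map<String, Object> row;

    Outgoing(String topic, Map<String, Object> row) {
        this.topic = topic;
        this.row = row;
    }
}

// Hypothetical helpers illustrating the configuration and filtering ideas.
final class SqlTaskSketch {
    static final String RESOURCE_PREFIX = "resource:";

    // Parses task properties from text (a real job would read its config file).
    static Properties loadConfig(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Maps "resource:orders.avsc" to the class-path location "orders.avsc";
    // returns null for any other URL scheme.
    static String classPathLocation(String url) {
        if (url == null || !url.startsWith(RESOURCE_PREFIX)) {
            return null;
        }
        return url.substring(RESOURCE_PREFIX.length());
    }

    // Opens a resource: URL from the class path; returns null if it is missing.
    static InputStream openResource(String url) {
        String location = classPathLocation(url);
        if (location == null) {
            throw new IllegalArgumentException("Unsupported URL: " + url);
        }
        return SqlTaskSketch.class.getClassLoader().getResourceAsStream(location);
    }

    // Filter plus 'insert into': rows matching the WHERE predicate are
    // forwarded unchanged (no projection yet) to the target topic.
    static List<Outgoing> filterInto(List<Map<String, Object>> rows,
                                     Predicate<Map<String, Object>> where,
                                     String targetTopic) {
        List<Outgoing> out = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            if (where.test(row)) {
                out.add(new Outgoing(targetTopic, row));
            }
        }
        return out;
    }
}
```

For example, a property value of resource:orders.avsc resolves to the class-path location orders.avsc, and filtering rows on units > 5 forwards only the matching rows, unmodified, to the topic named by sql.output. Because every row passes through whole, the output schema equals the input schema, which is exactly what projections will change.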
> My next goal is to implement basic aggregations and projections.
> Projections can change the schema of the final output. We need to discuss
> whether the user should specify the output schema or whether we handle
> (track) the schema changes in the query layer (I am not yet sure whether
> this is possible). According to my current understanding of Calcite, it's
> possible to get the type of a row after a projection or any other operation.
> I believe generating the output schema based on this row type is possible. I
> will provide more details once I start working on projections.
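The idea of deriving the output schema from the row type after a projection can be sketched as follows. This assumes a deliberately simplified row type, a list of (name, Avro primitive type) pairs, instead of Calcite's RelDataType, and supports only projections of plain column references with aliases. RowField, ProjectionSchemaSketch and the example column names are hypothetical; nullability, nested types and computed expressions are ignored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical field descriptor; types are Avro primitive names ("int", "string", ...).
final class RowField {
    final String name;
    final String avroType;

    RowField(String name, String avroType) {
        this.name = name;
        this.avroType = avroType;
    }
}

final class ProjectionSchemaSketch {
    // One projection item: the input column to read and its output name (alias).
    static final class Item {
        final String column;
        final String alias;

        Item(String column, String alias) {
            this.column = column;
            this.alias = alias;
        }
    }

    // Derives the output row type of "SELECT col AS alias, ..." from the input row type.
    static List<RowField> project(List<RowField> input, List<Item> items) {
        Map<String, RowField> byName = new HashMap<>();
        for (RowField f : input) {
            byName.put(f.name, f);
        }
        List<RowField> out = new ArrayList<>();
        for (Item it : items) {
            RowField src = byName.get(it.column);
            if (src == null) {
                throw new IllegalArgumentException("Unknown column: " + it.column);
            }
            // The type is carried over from the input; only the name may change.
            out.add(new RowField(it.alias, src.avroType));
        }
        return out;
    }

    // Renders a row type as an Avro record schema in JSON form.
    // Names are assumed to be valid Avro names that need no escaping.
    static String toAvroSchema(String recordName, List<RowField> fields) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"type\":\"record\",\"name\":\"").append(recordName).append("\",\"fields\":[");
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            RowField f = fields.get(i);
            sb.append("{\"name\":\"").append(f.name)
              .append("\",\"type\":\"").append(f.avroType).append("\"}");
        }
        return sb.append("]}").toString();
    }
}
```

Projecting (id int, product string, units int) onto product and units AS qty yields a two-field row type, rendered as {"type":"record","name":"Out","fields":[{"name":"product","type":"string"},{"name":"qty","type":"int"}]}. Tracking the schema this way in the query layer would spare users from specifying the output schema by hand, at the cost of mapping every SQL type to an Avro type.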
>
> Also, please feel free to comment on how I have implemented the
> StreamSqlTask. I would like to know whether this approach to configuring
> and scheduling streaming queries is okay. Any other suggestions for
> improvement are also welcome.
>
> Thanks
> Milinda
>
> --
> Milinda Pathirage
>
> PhD Student | Research Assistant
> School of Informatics and Computing | Data to Insight Center
> Indiana University
>
> twitter: milindalakmal
> skype: milinda.pathirage
> blog: http://milinda.pathirage.org
>
