spark-issues mailing list archives

From "Michael Armbrust (JIRA)" <>
Subject [jira] [Commented] (SPARK-1363) Add streaming support for Spark SQL module
Date Tue, 01 Apr 2014 00:27:20 GMT


Michael Armbrust commented on SPARK-1363:

Thanks for publishing the code!  Would it be possible to recreate the repository as a fork
of apache/spark?  That would make it much easier to diff the branches and eventually make
a pull request.

> Add streaming support for Spark SQL module
> ------------------------------------------
>                 Key: SPARK-1363
>                 URL:
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Saisai Shao
>         Attachments: StreamSQLDesignDoc.pdf
> Several projects, such as Pig on Storm and SQL on Storm (Squall, SQLstream), can already
> query streaming data, but Spark Streaming has no equivalent. Adding streaming-aware SQL
> to Spark SQL would fill that gap.
> Semantically, a DStream closely resembles an RDD: both offer join, filter, groupBy, and
> similar operators, and a DStream is backed by RDDs, so much of the existing Spark plan
> can be ported and reused.
> Catalyst also separates each step cleanly, so we can fully reuse its parsing and logical
> plan analysis steps and change only the physical plan.
> So here we propose to add streaming support in Catalyst.
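
The idea in the description above, one logical plan shared by batch and streaming execution with only the physical step differing, can be illustrated with a small toy sketch. It is plain Python and does not use Spark or Catalyst; the names Scan, Filter, CountBy, execute, and execute_streaming are hypothetical stand-ins invented for illustration, not classes or functions from the proposal or from Spark.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


# Hypothetical logical plan nodes (illustrative only, not Catalyst classes).
@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Filter:
    child: object
    predicate: Callable[[Row], bool]


@dataclass(frozen=True)
class CountBy:
    child: object
    key: str


def execute(plan, tables: Dict[str, List[Row]]) -> List[Row]:
    """Batch execution: evaluate the logical plan over in-memory rows."""
    if isinstance(plan, Scan):
        return list(tables[plan.table])
    if isinstance(plan, Filter):
        return [r for r in execute(plan.child, tables) if plan.predicate(r)]
    if isinstance(plan, CountBy):
        counts: Dict[object, int] = {}
        for r in execute(plan.child, tables):
            counts[r[plan.key]] = counts.get(r[plan.key], 0) + 1
        return [{plan.key: k, "count": v} for k, v in sorted(counts.items())]
    raise TypeError(f"unsupported plan node: {plan!r}")


def execute_streaming(
    plan, table: str, micro_batches: Iterable[List[Row]]
) -> Iterator[List[Row]]:
    """Streaming execution: run the same logical plan once per micro-batch."""
    for batch in micro_batches:
        yield execute(plan, {table: batch})


# One logical plan, roughly:
#   SELECT host, COUNT(*) FROM events WHERE status = 'error' GROUP BY host
plan = CountBy(Filter(Scan("events"), lambda r: r["status"] == "error"), "host")

events = [
    {"host": "a", "status": "error"},
    {"host": "b", "status": "ok"},
    {"host": "a", "status": "error"},
    {"host": "b", "status": "error"},
]

# Batch: the whole table is evaluated at once.
batch_result = execute(plan, {"events": events})

# Streaming: the same plan is applied to each micro-batch of two rows.
stream_results = list(execute_streaming(plan, "events", [events[:2], events[2:]]))
```

In this sketch the plan object is built once and never changes; only the driver that feeds it rows differs between the batch and streaming cases, which is the kind of reuse the description argues for.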

This message was sent by Atlassian JIRA
