spark-issues mailing list archives

From "Michael Armbrust (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-1363) Add streaming support for Spark SQL module
Date Tue, 01 Apr 2014 00:27:20 GMT

    [ https://issues.apache.org/jira/browse/SPARK-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955936#comment-13955936 ]

Michael Armbrust commented on SPARK-1363:
-----------------------------------------

Thanks for publishing the code!  Would it be possible to recreate the repository as a fork
of apache/spark?  That would make it much easier to diff the branches and eventually make
a pull request.

> Add streaming support for Spark SQL module
> ------------------------------------------
>
>                 Key: SPARK-1363
>                 URL: https://issues.apache.org/jira/browse/SPARK-1363
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Saisai Shao
>         Attachments: StreamSQLDesignDoc.pdf
>
>
> Projects such as Pig On Storm and SQL on Storm (Squall, SQLstream) can already query streaming data, but Spark Streaming has no equivalent. Streaming SQL support would be a good addition to Spark SQL.
> Semantically, a DStream is much like an RDD: both offer operators such as join, filter and groupBy, and a DStream is backed by RDDs, so the existing Spark plan can be ported and reused.
> Catalyst also separates each step cleanly, so we can reuse its parsing and logical plan analysis steps in full and change only the physical plan.
> We therefore propose adding streaming support to Catalyst.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
