spark-issues mailing list archives

From "Saisai Shao (JIRA)" <>
Subject [jira] [Commented] (SPARK-1363) Add streaming support for Spark SQL module
Date Tue, 01 Apr 2014 04:15:31 GMT


Saisai Shao commented on SPARK-1363:

Hi Michael, thanks for your advice. I've recreated the repo as a fork of apache/spark. It
would be very helpful if you could give me some comments. Thanks a lot.

> Add streaming support for Spark SQL module
> ------------------------------------------
>                 Key: SPARK-1363
>                 URL:
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Saisai Shao
>         Attachments: StreamSQLDesignDoc.pdf
> Currently there are several projects, such as Pig on Storm and SQL on Storm (Squall,
> SQLstream), that can query streaming data, but Spark Streaming has nothing comparable yet.
> Adding streaming support to Spark SQL would be a valuable feature.
> From a semantic perspective, a DStream is quite similar to an RDD: both provide join,
> filter, groupBy and other operators, and a DStream is itself backed by RDDs, so the
> existing Spark plans can be transplanted and reused.
> Also, Catalyst clearly separates each step, so we can fully reuse its parsing and
> logical-plan analysis steps and only need a different physical plan.
> So we propose adding streaming support to Catalyst.
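The proposal's core idea is that a query is parsed and analyzed once into a logical plan, and only the physical execution differs between batch and streaming. Because a DStream is a sequence of RDDs (one per batch interval), a streaming physical plan can reuse the batch one per interval. The toy model below is a minimal sketch of that idea in plain Python, not Spark or Catalyst code. `LogicalPlan`, `analyze`, `execute_batch` and `execute_stream` are hypothetical names, and plain lists stand in for RDDs and DStreams.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]

@dataclass(frozen=True)
class LogicalPlan:
    """Hypothetical logical plan: a filter followed by a column projection."""
    predicate: Callable[[Row], bool]
    columns: List[str]

def analyze(predicate: Callable[[Row], bool], columns: List[str]) -> LogicalPlan:
    """Shared front end: batch and streaming queries are both analyzed here."""
    if not columns:
        raise ValueError("query must select at least one column")
    return LogicalPlan(predicate, list(columns))

def execute_batch(plan: LogicalPlan, rows: Iterable[Row]) -> List[Row]:
    """Batch physical plan: evaluate once over a single dataset (stands in for an RDD)."""
    return [{c: r[c] for c in plan.columns} for r in rows if plan.predicate(r)]

def execute_stream(plan: LogicalPlan, batches: Iterable[List[Row]]) -> List[List[Row]]:
    """Streaming physical plan: the stream is a sequence of batches (stands in for a
    DStream backed by one RDD per interval), so the batch executor is reused per batch."""
    return [execute_batch(plan, batch) for batch in batches]

# The same analyzed plan drives both execution modes.
plan = analyze(lambda r: r["level"] == "ERROR", ["host"])
stream = [
    [{"host": "a", "level": "ERROR"}, {"host": "b", "level": "INFO"}],
    [{"host": "c", "level": "ERROR"}],
]
print(execute_stream(plan, stream))  # [[{'host': 'a'}], [{'host': 'c'}]]
```

In this sketch only `execute_stream` is stream-specific. Everything upstream of the physical plan is shared, which mirrors the division of labor the proposal describes for Catalyst.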

This message was sent by Atlassian JIRA
