@Tobias,

 

As I understand it, your approach is to register a series of tables by using transformWith, right? You then get a new DStream (i.e., a SchemaDStream), which consists of many SchemaRDDs.

 

Please correct me if my understanding is wrong.  

 

Thank you && Best Regards,

Grace (Huang Jie)

 

From: Jason Dai [mailto:jason.dai@gmail.com]
Sent: Wednesday, March 11, 2015 10:45 PM
To: Irfan Ahmad
Cc: Tobias Pfeiffer; Cheng, Hao; Mohit Anchlia; user@spark.apache.org; Shao, Saisai; Dai, Jason; Huang, Jie
Subject: Re: SQL with Spark Streaming

 

Sorry, typo; it should be https://github.com/intel-spark/stream-sql

 

Thanks,

-Jason

 

On Wed, Mar 11, 2015 at 10:19 PM, Irfan Ahmad <irfan@cloudphysics.com> wrote:

Got a 404 on that link: https://github.com/Intel-bigdata/spark-streamsql


 

Irfan Ahmad

CTO | Co-Founder | CloudPhysics

Best of VMworld Finalist

Best Cloud Management Award

NetworkWorld 10 Startups to Watch

EMA Most Notable Vendor

 

On Wed, Mar 11, 2015 at 6:41 AM, Jason Dai <jason.dai@gmail.com> wrote:

Yes, an earlier prototype is available at https://github.com/Intel-bigdata/spark-streamsql, and a talk was given at last year's Spark Summit (http://spark-summit.org/2014/talk/streamsql-on-spark-manipulating-streams-by-sql-using-spark).

 

We are currently porting the prototype to use the latest DataFrame API, and will provide a stable version for people to try soon.

 

Thanks,

-Jason 

 

 

On Wed, Mar 11, 2015 at 9:12 AM, Tobias Pfeiffer <tgp@preferred.jp> wrote:

Hi,

 

On Wed, Mar 11, 2015 at 9:33 AM, Cheng, Hao <hao.cheng@intel.com> wrote:

Intel has a prototype for doing this; SaiSai and Jason are the authors. You can probably ask them for some materials.

 

The github repository is here: https://github.com/intel-spark/stream-sql

 

Also, what I did was write a wrapper class SchemaDStream that internally holds a DStream[Row] and a DStream[StructType] (the latter having just one element in every RDD) and then allows you to do:

- operations SchemaRDD => SchemaRDD using `rowStream.transformWith(schemaStream, ...)`

- in particular you can register this stream's data as a table this way

- and via a companion object with a method `fromSQL(sql: String): SchemaDStream` you can get a new stream from previously registered tables.
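
The wrapper described above could be sketched roughly as follows. This is not the code referred to in this thread; it is a minimal illustration that assumes Spark 1.3, where SchemaRDD became DataFrame, and that every batch of the schema stream holds exactly one StructType. The method names `transformBatches` and `registerAsTable` are hypothetical, and `fromSQL` is left out.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.types.StructType
import org.apache.spark.streaming.dstream.DStream

// Hypothetical wrapper: pairs a stream of rows with a stream of schemas.
// Assumption: each RDD in schemaStream contains exactly one StructType.
class SchemaDStream(
    val sqlContext: SQLContext,
    val rowStream: DStream[Row],
    val schemaStream: DStream[StructType]) {

  // Rebuild the per-batch DataFrame from the rows and the single schema element.
  // Runs on the driver, once per batch, inside the transformWith function.
  private def toDataFrame(rows: RDD[Row], schemas: RDD[StructType]): DataFrame =
    sqlContext.createDataFrame(rows, schemas.first())

  // Apply a batch-internal DataFrame => DataFrame operation to every batch.
  // Note: f is evaluated twice per batch (once for rows, once for the schema).
  def transformBatches(f: DataFrame => DataFrame): SchemaDStream = {
    val newRows = rowStream.transformWith[StructType, Row](
      schemaStream,
      (rows: RDD[Row], schemas: RDD[StructType]) =>
        f(toDataFrame(rows, schemas)).rdd)
    val newSchemas = rowStream.transformWith[StructType, StructType](
      schemaStream,
      (rows: RDD[Row], schemas: RDD[StructType]) =>
        rows.sparkContext.parallelize(Seq(f(toDataFrame(rows, schemas)).schema), 1))
    new SchemaDStream(sqlContext, newRows, newSchemas)
  }

  // Register each batch as a temp table under the same name, replacing the
  // previous batch. Takes effect only once an output operation is attached
  // to the returned stream, since DStream transformations are lazy.
  def registerAsTable(name: String): SchemaDStream =
    transformBatches { df => df.registerTempTable(name); df }
}
```

Because the table is re-registered from a single batch each time, a query against it only ever sees that batch's rows.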

 

However, you are limited to batch-internal operations, i.e., you can't aggregate across batches.

 

I am not able to share the code at the moment, but I will be within the next few months. It is not very advanced code, though, and should be easy to replicate. Also, I have no idea about the performance of transformWith.

 

Tobias