spark-user mailing list archives

From "Huang, Jie" <>
Subject RE: SQL with Spark Streaming
Date Wed, 11 Mar 2015 15:08:31 GMT

According to my understanding, your approach is to register a series of tables using transformWith,
right? Then you get a new DStream (i.e., a SchemaDStream), which consists of lots of

Please correct me if my understanding is wrong.

Thank you && Best Regards,
Grace (Huang Jie)

From: Jason Dai []
Sent: Wednesday, March 11, 2015 10:45 PM
To: Irfan Ahmad
Cc: Tobias Pfeiffer; Cheng, Hao; Mohit Anchlia;; Shao, Saisai; Dai,
Jason; Huang, Jie
Subject: Re: SQL with Spark Streaming

Sorry typo; should be


On Wed, Mar 11, 2015 at 10:19 PM, Irfan Ahmad <<>>
Got a 404 on that link:

Irfan Ahmad
CTO | Co-Founder | CloudPhysics<>
Best of VMworld Finalist
Best Cloud Management Award
NetworkWorld 10 Startups to Watch
EMA Most Notable Vendor

On Wed, Mar 11, 2015 at 6:41 AM, Jason Dai <<>>
Yes, a previous prototype is available, and
a talk was given at last year's Spark Summit (

We are currently porting the prototype to use the latest DataFrame API, and will provide a
stable version for people to try soon.


On Wed, Mar 11, 2015 at 9:12 AM, Tobias Pfeiffer <<>>

On Wed, Mar 11, 2015 at 9:33 AM, Cheng, Hao <<>>
Intel has a prototype for doing this; SaiSai and Jason are the authors. You could probably ask
them for some materials.

The GitHub repository is here:

Also, what I did was write a wrapper class SchemaDStream that internally holds a DStream[Row]
and a DStream[StructType] (the latter having just one element in every RDD), which then allows
you to do:
- operations SchemaRDD => SchemaRDD using `rowStream.transformWith(schemaStream, ...)`
- in particular you can register this stream's data as a table this way
- and via a companion object with a method `fromSQL(sql: String): SchemaDStream` you can get
a new stream from previously registered tables.

However, you are limited to batch-internal operations, i.e., you can't aggregate across batches.
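The wrapper described above can be sketched roughly as follows. This is a hypothetical reconstruction, not the code referred to in this thread: it assumes the Spark 1.3 DataFrame API (`SQLContext.createDataFrame(RDD[Row], StructType)`, `DataFrame.registerTempTable`) in place of SchemaRDD, and the method names `transform`, `sql` and `applyToBatch` are made up for illustration. Instead of a separate `fromSQL` on a companion object, it registers each batch and runs the query inside the same per-batch function, so the result does not depend on the order in which Spark Streaming generates RDDs for independent streams.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.types.StructType
import org.apache.spark.streaming.dstream.DStream

// Hypothetical sketch of the SchemaDStream idea (not the original code).
// Invariant assumed throughout: every RDD of `schemas` holds exactly one StructType.
class SchemaDStream(
    val sqlContext: SQLContext,
    val rows: DStream[Row],
    val schemas: DStream[StructType]) {

  /** Apply a DataFrame => DataFrame function to each batch separately. */
  def transform(f: DataFrame => DataFrame): SchemaDStream = {
    val ctx = sqlContext // local copy so the closures below do not capture `this`
    // transformWith calls these functions on the driver once per batch,
    // so running first() on the one-element schema RDD inside them is allowed.
    val newRows = rows.transformWith(schemas,
      (r: RDD[Row], s: RDD[StructType]) => SchemaDStream.applyToBatch(ctx, r, s, f).rdd)
    // f is evaluated a second time per batch just to obtain the result schema
    // (so a registerTempTable inside f simply runs twice with the same data).
    val newSchemas = rows.transformWith(schemas,
      (r: RDD[Row], s: RDD[StructType]) =>
        r.sparkContext.parallelize(Seq(SchemaDStream.applyToBatch(ctx, r, s, f).schema), 1))
    new SchemaDStream(ctx, newRows, newSchemas)
  }

  /** Register each batch as `table`, then run `query` against that batch only. */
  def sql(table: String, query: String): SchemaDStream = {
    val ctx = sqlContext
    transform { df =>
      df.registerTempTable(table)
      ctx.sql(query)
    }
  }
}

object SchemaDStream {
  /** Rebuild one batch as a DataFrame from its rows and its single-schema RDD, then apply f. */
  def applyToBatch(
      ctx: SQLContext,
      rows: RDD[Row],
      schemas: RDD[StructType],
      f: DataFrame => DataFrame): DataFrame =
    f(ctx.createDataFrame(rows, schemas.first()))
}
```

Hypothetical usage, assuming an existing `sqlContext: SQLContext` and a `DStream[String]` named `lines`. Because each query only sees the current batch's table, the `COUNT(*)` here is computed per batch, which is the batch-internal limitation mentioned above.

```scala
import org.apache.spark.sql.types.{StringType, StructField}

val schema = StructType(Seq(StructField("word", StringType)))
val words = new SchemaDStream(sqlContext,
  lines.flatMap(_.split(" ")).map(w => Row(w)),
  lines.transform((rdd: RDD[String]) => rdd.sparkContext.parallelize(Seq(schema), 1)))
val counts = words.sql("words", "SELECT word, COUNT(*) FROM words GROUP BY word")
counts.rows.print()
```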

I am not able to share the code at the moment, but will within the next few months. It is not
very advanced code, though, and should be easy to replicate. Also, I have no idea about the
performance of transformWith...

