Also check this out:
https://github.com/databricks/reference-apps/blob/master/logs_analyzer/chapter1/scala/src/main/scala/com/databricks/apps/logs/chapter1/LogAnalyzerStreamingSQL.scala

From the Databricks reference apps: https://github.com/databricks/reference-apps

From: Ewan Leith
Date: Tuesday, September 29, 2015 at 5:09 PM
To: Daniel Haviv, user
Subject: RE: Converting a DStream to schemaRDD

Something like:

dstream.foreachRDD { rdd =>
  val df = sqlContext.read.json(rdd)
  df.select(…)
}

https://spark.apache.org/docs/latest/streaming-programming-guide.html#output-operations-on-dstreams

might be the place to start; it'll convert each batch of the DStream into an RDD and let you work with it as if it were a standard RDD.

Ewan
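Putting the snippet above together into a fuller sketch: names like jsonStream and the temp table "events" are illustrative, not from the thread, and this assumes Spark 1.x APIs, where SQLContext.read.json accepts an RDD[String]. It needs a running StreamingContext, so treat it as a pattern rather than a drop-in program.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.dstream.DStream

// Hypothetical helper: turn each micro-batch of JSON strings into a
// DataFrame and query it with Spark SQL. Assumes Spark 1.4+.
def processJsonStream(jsonStream: DStream[String], sqlContext: SQLContext): Unit = {
  jsonStream.foreachRDD { rdd: RDD[String] =>
    if (!rdd.isEmpty()) {                    // skip empty micro-batches
      val df = sqlContext.read.json(rdd)     // schema inferred per batch
      df.registerTempTable("events")         // expose the batch to SQL
      sqlContext.sql("SELECT * FROM events").show()
    }
  }
}
```

Note the schema is re-inferred for every batch; if the JSON structure is known up front, passing an explicit schema avoids that extra pass over the data.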

               

 

From: Daniel Haviv [mailto:daniel.haviv@veracity-group.com]
Sent: 29 September 2015 15:03
To: user <user@spark.apache.org>
Subject: Converting a DStream to schemaRDD

Hi,

I have a DStream which is a stream of RDD[String].

How can I pass a DStream to sqlContext.jsonRDD and work with it as a DataFrame?

Thank you.

Daniel