spark-user mailing list archives

From Adrian Tanase <atan...@adobe.com>
Subject Re: Converting a DStream to schemaRDD
Date Tue, 29 Sep 2015 14:13:11 GMT
Also check this out:
https://github.com/databricks/reference-apps/blob/master/logs_analyzer/chapter1/scala/src/main/scala/com/databricks/apps/logs/chapter1/LogAnalyzerStreamingSQL.scala

From the Databricks reference apps: https://github.com/databricks/reference-apps
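The gist of the pattern in that file (a sketch only, using the Spark 1.x APIs; the simplified ApacheAccessLog case class and parseLogLine parser below are illustrative stand-ins for the app's full versions): map each batch to case-class objects, build a DataFrame from them, register a temp table, and query it with SQL.

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.dstream.DStream

// Illustrative stand-in for the reference app's log model and parser.
case class ApacheAccessLog(ipAddress: String, endpoint: String, contentSize: Long)

def parseLogLine(line: String): ApacheAccessLog = {
  val fields = line.split(" ")
  ApacheAccessLog(fields(0), fields(1), fields(2).toLong)
}

def analyze(logLines: DStream[String], sqlContext: SQLContext): Unit = {
  logLines.foreachRDD { rdd =>
    val accessLogs: RDD[ApacheAccessLog] = rdd.map(parseLogLine)
    val df = sqlContext.createDataFrame(accessLogs)  // schema inferred from the case class
    df.registerTempTable("logs")
    sqlContext.sql("SELECT endpoint, SUM(contentSize) FROM logs GROUP BY endpoint").show()
  }
}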

From: Ewan Leith
Date: Tuesday, September 29, 2015 at 5:09 PM
To: Daniel Haviv, user
Subject: RE: Converting a DStream to schemaRDD

Something like:

dstream.foreachRDD { rdd =>
  val df = sqlContext.read.json(rdd)  // rdd is this batch's RDD[String] of JSON documents
  df.select(…)
}

https://spark.apache.org/docs/latest/streaming-programming-guide.html#output-operations-on-dstreams


Might be the place to start: foreachRDD gives you each batch of the DStream as an RDD, which
you can then work with as a standard RDD (or, as above, read into a DataFrame).
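
For reference, a self-contained version of that snippet (a sketch in the Spark 1.x style; the socket source on localhost:9999 and the "user"/"event" column names are assumptions, substitute your own input and schema):

import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

object JsonDStreamToDF {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("JsonDStreamToDF").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))
    val sqlContext = new SQLContext(ssc.sparkContext)

    // DStream[String], one JSON document per line
    val jsonLines = ssc.socketTextStream("localhost", 9999)

    jsonLines.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {                 // read.json can't infer a schema from an empty batch
        val df = sqlContext.read.json(rdd)  // schema inferred per batch
        df.select("user", "event").show()   // assumed column names
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

Note that read.json infers the schema separately for every batch, so batches with different fields will produce DataFrames with different schemas; if you know the schema up front, passing it explicitly via sqlContext.read.schema(...) avoids both the inference pass and that inconsistency.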

Ewan


From: Daniel Haviv <daniel.haviv@veracity-group.com>
Sent: 29 September 2015 15:03
To: user <user@spark.apache.org>
Subject: Converting a DStream to schemaRDD

Hi,
I have a DStream which is a stream of RDD[String].

How can I pass a DStream to sqlContext.jsonRDD and work with it as a DataFrame?

Thank you.
Daniel
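
For completeness: the sqlContext.jsonRDD in the original question is the pre-1.4 spelling of the same operation, deprecated in favor of sqlContext.read.json, so the snippets above answer it directly. A minimal equivalent, assuming the same dstream and sqlContext as Ewan's snippet:

dstream.foreachRDD { rdd =>
  val df = sqlContext.jsonRDD(rdd)  // deprecated since 1.4; equivalent to sqlContext.read.json(rdd)
  df.printSchema()
}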
