spark-user mailing list archives

From Adrian Tanase <atan...@adobe.com>
Subject Re: SQLContext within foreachRDD
Date Mon, 12 Oct 2015 10:07:16 GMT
Not really, unless you’re doing something wrong (e.g. calling collect or similar, which pulls the data back to the driver).

In the foreachRDD body you’re typically converting the RDD to a DataFrame and registering
it as a temp table. All the subsequent queries are executed in parallel on the workers.

I haven’t built production apps with this pattern, but I have successfully built a prototype
where I execute dynamic SQL on top of a 15-minute window (obtained with .window on the DStream)
- and it works as expected.
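
Roughly, the pattern looks like the sketch below. This is not the prototype code, just a minimal illustration assuming Spark 1.5-era APIs (SQLContext.getOrCreate, registerTempTable). The Hit case class, the socket source on localhost:9999, the "host,bytes" line format, and the hits table name are all made up for the example.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

// Hypothetical record type, defined at top level so toDF() can derive a schema.
case class Hit(host: String, bytes: Long)

object WindowedSqlSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WindowedSqlSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Assumed input: text lines of the form "host,bytes" on a local socket.
    // Lines with a non-numeric bytes field would fail the task; real code should guard that.
    val hits = ssc.socketTextStream("localhost", 9999)
      .map(_.split(","))
      .filter(_.length == 2)
      .map(fields => Hit(fields(0), fields(1).trim.toLong))

    // 15-minute window, re-evaluated on every 10-second batch.
    hits.window(Minutes(15), Seconds(10)).foreachRDD { rdd =>
      // This closure body runs on the driver, but it only builds the query plan.
      val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
      import sqlContext.implicits._

      // Converting to a DataFrame and registering it does not move any data.
      rdd.toDF().registerTempTable("hits")

      // The aggregation itself is executed on the executors.
      val totals = sqlContext.sql(
        "SELECT host, SUM(bytes) AS total FROM hits GROUP BY host ORDER BY total DESC")

      // show() brings back only the first 20 rows; a collect() on the raw
      // window is the kind of call that would ship everything to the driver.
      totals.show()
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The SQL string could just as well be supplied at runtime, which is what I mean by dynamic SQL; only the result rows you explicitly ask for end up on the driver.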

See this for a code example: https://github.com/databricks/reference-apps/blob/master/logs_analyzer/chapter1/scala/src/main/scala/com/databricks/apps/logs/chapter1/LogAnalyzerStreamingSQL.scala

-adrian

From: Daniel Haviv
Date: Monday, October 12, 2015 at 12:52 PM
To: user
Subject: SQLContext within foreachRDD

Hi,
Since things that run inside foreachRDD run at the driver, does that mean that if we use SQLContext
inside foreachRDD, the data is sent back to the driver and only then is the query executed,
or is it executed at the executors?


Thank you.
Daniel

