spark-user mailing list archives

From Jesper Lundgren <koudel...@gmail.com>
Subject Spark Streaming: foreachRDD network output
Date Fri, 26 Sep 2014 05:35:04 GMT
Hello all,

I have some questions regarding the foreachRDD output function in Spark
Streaming.

The programming guide (
http://spark.apache.org/docs/1.1.0/streaming-programming-guide.html)
describes how to output data using network connections on the worker nodes.

Are there any working examples of how to do this properly? (Most of the
guide describes what not to do rather than what to do.)

Any suggestions on the best way to write tests for such code, for example
to make sure that connection objects are used properly?

How should I handle network or other problems on a worker node? Can I throw
an exception to force Spark to retry that data on another node? As an
example, consider a program that writes data to an SQL database using
foreachRDD. One worker node might have trouble connecting to the database,
so another node would have to finish the output operation.
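
To make the scenario concrete, here is a rough sketch of the pattern in
question. It is an illustration only, not working Spark code: it uses
Python's sqlite3 module as a stand-in for the SQL database, and the names
write_partition, connect and the events table are all hypothetical. The
connection is opened inside the per-partition function and any failure is
re-raised rather than swallowed; whether and where the data is then retried
is exactly the open question.

```python
import sqlite3


def write_partition(records, connect):
    """Write one partition's records over a single connection.

    `connect` is a hypothetical zero-argument factory. It is called on the
    worker, so the connection object is never created on the driver.
    """
    conn = connect()  # may raise if this worker cannot reach the database
    try:
        conn.executemany(
            "INSERT INTO events (value) VALUES (?)",
            [(r,) for r in records],
        )
        conn.commit()
    except Exception:
        conn.rollback()  # discard the partial write for this partition
        raise            # propagate the failure instead of hiding it
    finally:
        conn.close()


# Assumed shape of the call in a streaming job (not run here):
#   dstream.foreachRDD(lambda rdd: rdd.foreachPartition(
#       lambda part: write_partition(part, connect)))
```

Passing the connection factory as an argument is one possible answer to
the testing question: a test can hand in a factory that returns a local
database, or one that raises, without a cluster.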

Thanks!

-- Jesper Lundgren
