spark-user mailing list archives

From Sean Owen <>
Subject Re: correct use of DStream foreachRDD
Date Fri, 28 Aug 2015 14:59:35 GMT
Yes, for example "val sensorRDD =" is a
line of code executed on the driver; it's part of the function you
supplied to foreachRDD. However that line defines an operation on an
RDD, and the map function you supplied (parseSensor) will ultimately
be carried out on the cluster.
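
A minimal sketch of where each piece runs. The Sensor case class, its
comma-separated input format, the local[2] master, and the 5-second batch
interval are assumptions made for illustration; they are not taken from the
original code. Only textFileStream, foreachRDD, and the map over parseSensor
come from the thread.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical stand-in for the Sensor class used in the quoted code.
case class Sensor(id: String, value: Double)

object Sensor {
  // Parses a line such as "pump1,42.5"; this CSV layout is an assumption.
  def parseSensor(line: String): Sensor = {
    val fields = line.split(",")
    Sensor(fields(0), fields(1).toDouble)
  }
}

object ForeachRDDPlacement {
  def main(args: Array[String]): Unit = {
    // local[2] is assumed here so the sketch can run without a cluster.
    val conf = new SparkConf().setAppName("foreachRDD-placement").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))
    val lines = ssc.textFileStream("/stream")

    lines.foreachRDD { rdd =>
      // Driver: this body runs once per batch in the driver process.
      // The next line only records a transformation; nothing is parsed yet.
      val sensorRDD = rdd.map(Sensor.parseSensor)

      // Executors: this action launches a job. parseSensor and the
      // function passed to foreachPartition run inside tasks.
      sensorRDD.foreachPartition { sensors =>
        sensors.foreach(s => println(s"task saw $s"))
      }

      // Driver: count() is another action (it re-runs the map), and its
      // result is returned to the driver, where this println executes.
      println(s"driver: batch had ${sensorRDD.count()} records")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

With a local master the driver and the tasks share one JVM, so both kinds of
println appear on the same console; on a real cluster the "task saw" lines
would go to the executor logs instead.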

If you mean, is the bulk of the work (the Sensor.* methods) happening
on the cluster? Yes.

Ewan's version looks cleaner, though it will ultimately be equivalent
and doesn't cause operations to happen in a different place.

(PS I don't think you need "new PairRDDFunctions"; the implicits it
defines should be automatically available.
"" should be sufficient. In slightly
older versions of Spark you have to import SparkContext._ to get these
implicits.)

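A small sketch of the point about implicits, using reduceByKey rather than
saveAsHadoopDataset so that it runs without HBase; both methods are defined
on PairRDDFunctions. The object name, the sumByKey helpers, the sample data,
and the local[2] master are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.PairRDDFunctions
// Before Spark 1.3 the pair-RDD implicits had to be imported explicitly:
// import org.apache.spark.SparkContext._

object PairImplicits {
  // Implicit form: an RDD of pairs picks up PairRDDFunctions automatically.
  def sumByKey(sc: SparkContext, data: Seq[(String, Double)]): Map[String, Double] =
    sc.parallelize(data).reduceByKey(_ + _).collect().toMap

  // Explicit form, as in the quoted code: same result, more noise.
  def sumByKeyExplicit(sc: SparkContext, data: Seq[(String, Double)]): Map[String, Double] =
    new PairRDDFunctions(sc.parallelize(data)).reduceByKey(_ + _).collect().toMap

  def main(args: Array[String]): Unit = {
    // local[2] is assumed here so the sketch can run without a cluster.
    val sc = new SparkContext(
      new SparkConf().setAppName("pair-implicits").setMaster("local[2]"))
    val readings = Seq(("pump1", 1.0), ("pump2", 2.5), ("pump1", 3.0))
    println(sumByKey(sc, readings) == sumByKeyExplicit(sc, readings))
    sc.stop()
  }
}
```
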
On Fri, Aug 28, 2015 at 3:29 PM, Carol McDonald <> wrote:
> I would like to make sure that I am using the DStream foreachRDD operation
> correctly. I would like to read from a DStream, transform the input, and write
> to HBase. The code below works, but I became confused when I read "Note
> that the function func is executed in the driver process"?
>     val lines = ssc.textFileStream("/stream")
>     lines.foreachRDD { rdd =>
>       // parse the line of data into sensor object
>       val sensorRDD =
>       // convert sensor data to put object and write to HBase table column family data
>       new PairRDDFunctions(sensorRDD.
>                   map(Sensor.convertToPut)).
>                   saveAsHadoopDataset(jobConfig)
>     }
