spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ewan Leith <>
Subject RE: correct use of DStream foreachRDD
Date Fri, 28 Aug 2015 14:43:51 GMT
I think what you’ll want is to carry out the .map functions before the foreachRDD, something

    val lines = ssc.textFileStream("/stream").map(Sensor.parseSensor).map(Sensor.convertToPut)

    lines.foreachRDD { rdd =>
      // parse the line of data into sensor object


Will perform the bulk of the work in the distributed processing, before the results are returned
to the driver for writing to HBase.


From: Carol McDonald []
Sent: 28 August 2015 15:30
To: user <>
Subject: correct use of DStream foreachRDD

I would like to make sure  that I am using the DStream  foreachRDD operation correctly. I
would like to read from a DStream transform the input and write to HBase.  The code below
works , but I became confused when I read "Note that the function func is executed in the
driver process" ?

    val lines = ssc.textFileStream("/stream")

    lines.foreachRDD { rdd =>
      // parse the line of data into sensor object
      val sensorRDD =

      // convert sensor data to put object and write to HBase table column family data
      new PairRDDFunctions(sensorRDD.

View raw message