spark-user mailing list archives

From Mike Thomsen <mikerthom...@gmail.com>
Subject Combining reading from Kafka and HDFS w/ Spark Streaming
Date Thu, 02 Mar 2017 01:19:44 GMT
(Sorry if this is a duplicate. I got a strange error message when I first
tried to send it earlier)

I want to pull HDFS paths from Kafka and build text streams based on those
paths. I currently have:

val lines = KafkaUtils.createStream(/* params here */).map(_._2)
val buffer  = new ArrayBuffer[String]()

lines.foreachRDD(rdd => {
  if (!rdd.partitions.isEmpty) {
    rdd.collect().foreach(line => { buffer += line })
  }
})

buffer.foreach(path => {
  streamingContext.textFileStream(path).foreachRDD(rdd => {
    println(s"${path} => ${rdd.count()}")
  })
})

streamingContext.start()
streamingContext.awaitTermination()

It's not actually counting any of the files in the paths, and I know the
paths are valid.

Can someone tell me if this is possible and if so, give me a pointer on how
to fix this?

Thanks,

Mike
