I'll start with the Kafka implementation.
Thanks for all the help.
It is my understanding that there is no way to make FlumeInputDStream work in a cluster environment with the current release. My suggestion would be to switch to Kafka if you can, although I have not used KafkaInputDStream myself. There is a big difference between the Kafka and Flume input DStreams: a KafkaInputDStream is a consumer (client), whereas a FlumeInputDStream needs to listen on a specific address:port so that other Flume agents can send messages to it. This may also give Kafka a performance advantage.
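
To make the listener-versus-client difference concrete, here is a minimal sketch using plain Python sockets. It is not Spark, Flume, or Kafka code, and every name in it (listen_for_events, connect_out, demo) is made up for illustration. The listener plays the role of a receiver that must be reachable at a known address:port, which is the position FlumeInputDStream is in. The other side only opens an outbound connection, so it can run on any host, which is roughly the position of a consumer-style client.

```python
import socket
import threading


def listen_for_events(host="127.0.0.1", port=0):
    """Listener role: bind to an address:port and wait for senders.

    port=0 asks the OS for a free port, which keeps the demo portable.
    A real receiver would need a fixed, well-known port instead.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    return server


def accept_one(server, received):
    """Accept a single connection and read it until the peer closes."""
    conn, _ = server.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(1024)
            if not data:
                break
            chunks.append(data)
    received.append(b"".join(chunks).decode("utf-8"))


def connect_out(address, message):
    """Client role: open an outbound connection to a known address."""
    with socket.create_connection(address, timeout=5) as client:
        client.sendall(message.encode("utf-8"))


def demo():
    server = listen_for_events()
    # The sender has to know exactly where the listener lives.
    address = server.getsockname()
    received = []
    worker = threading.Thread(target=accept_one, args=(server, received))
    worker.start()
    connect_out(address, "event-1")
    worker.join(timeout=5)
    server.close()
    return received


if __name__ == "__main__":
    print(demo())
```

The sender in this sketch has to be told the listener's exact address, while the connecting side needs no address of its own. That asymmetry is the one described above.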