spark-user mailing list archives

From Tathagata Das <tathagata.das1...@gmail.com>
Subject Re: Spark and Flume integration - do I understand this correctly?
Date Tue, 29 Jul 2014 20:52:28 GMT
Hari, can you help?

TD

On Tue, Jul 29, 2014 at 12:13 PM, dapooley <dapooley@gmail.com> wrote:
> Hi,
>
> I am trying to integrate Spark with a Flume avro sink and avro source. The
> sink is on one machine (the application server), and the source is on another.
> Log events are being sent from the application server to the avro source
> machine (a file-roll sink on the avro source machine writes the events out to
> verify receipt).
>
> The aim is to get Spark to also receive the same events that the avro source
> is receiving. The steps, I believe, are:
>
> 1. Install/start the Spark master (on the avro source machine).
> 2. Write the Spark application and deploy it (on the avro source machine).
> 3. Add the Spark application as a worker to the master.
> 4. Configure the Spark application to listen on the same port as the avro
> source.
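The application in step 2 could be sketched roughly as follows, assuming the push-based `FlumeUtils` receiver from the `spark-streaming-flume` artifact (the object name `FlumeEventCount` and the batch interval are illustrative; the host/port come from the configuration below):

```scala
// Hedged sketch: a minimal Spark Streaming application receiving Flume
// events via the push-based FlumeUtils receiver (spark-streaming-flume).
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object FlumeEventCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FlumeEventCount")
    val ssc = new StreamingContext(conf, Seconds(2))

    // In the push model, Spark's receiver itself binds this host:port
    // and accepts Avro-serialized Flume events.
    val stream = FlumeUtils.createStream(ssc, "0.0.0.0", 41414)

    // Print the body of each received event.
    stream.map(e => new String(e.event.getBody.array())).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that in this push-based model the receiver binds the port itself, so it cannot share a port with an avro source agent that is already listening there.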
>
> The test setup uses two Ubuntu VMs on a Windows host.
>
> Flume configuration:
>
> ######################### application ##############################
> ## Tail application log file
> # /var/lib/apache-flume-1.5.0-bin/bin/flume-ng agent -n cps -c conf -f
> conf/flume-conf.properties
> # http://flume.apache.org/FlumeUserGuide.html#exec-source
> source_agent.sources = tomcat
> source_agent.sources.tomcat.type = exec
> source_agent.sources.tomcat.command = tail -F
> /var/lib/tomcat/logs/application.log
> source_agent.sources.tomcat.batchSize = 1
> source_agent.sources.tomcat.channels = memoryChannel
>
> # http://flume.apache.org/FlumeUserGuide.html#memory-channel
> source_agent.channels = memoryChannel
> source_agent.channels.memoryChannel.type = memory
> source_agent.channels.memoryChannel.capacity = 100
>
> ## Send to Flume Collector on Analytics Node
> # http://flume.apache.org/FlumeUserGuide.html#avro-sink
> source_agent.sinks = avro_sink
> source_agent.sinks.avro_sink.type = avro
> source_agent.sinks.avro_sink.channel = memoryChannel
> source_agent.sinks.avro_sink.hostname = 10.0.2.2
> source_agent.sinks.avro_sink.port = 41414
>
>
> ######################## avro source ##############################
> ## Receive Flume events for Spark streaming
>
> # http://flume.apache.org/FlumeUserGuide.html#memory-channel
> agent1.channels = memoryChannel
> agent1.channels.memoryChannel.type = memory
> agent1.channels.memoryChannel.capacity = 100
>
> ## Flume Collector on Analytics Node
> # http://flume.apache.org/FlumeUserGuide.html#avro-source
> agent1.sources = avroSource
> agent1.sources.avroSource.type = avro
> agent1.sources.avroSource.channels = memoryChannel
> agent1.sources.avroSource.bind = 0.0.0.0
> agent1.sources.avroSource.port = 41414
>
> #Sinks
> agent1.sinks = localout
>
> #http://flume.apache.org/FlumeUserGuide.html#file-roll-sink
> agent1.sinks.localout.type = file_roll
> agent1.sinks.localout.sink.directory = /home/vagrant/flume/logs
> agent1.sinks.localout.sink.rollInterval = 0
> agent1.sinks.localout.channel = memoryChannel
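For completeness, the receiving agent above would presumably be started the same way as the source agent (the config path is an assumption mirroring the flume-ng invocation in the comment near the top of the first config):

```shell
# Assumed launch command for the receiving (avro source) agent, mirroring
# the flume-ng invocation shown for the source agent above. The agent name
# must match the "agent1" prefix used in the properties file.
/var/lib/apache-flume-1.5.0-bin/bin/flume-ng agent -n agent1 -c conf \
  -f conf/flume-conf.properties
```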
>
> Thank you in advance for any assistance.
>
>
>
