kafka-users mailing list archives

From Eric Hauser <ewhau...@gmail.com>
Subject Re: Aggregating tomcat, log4j, other logs in realtime
Date Thu, 29 Sep 2011 18:25:56 GMT

I've used both Flume and Kafka, and I can provide some info for comparison:

- The current Flume release 0.9.4 has some pretty nasty bugs in it
(most have been fixed in trunk).
- Flume is more complex to maintain operations-wise (IMO) than Kafka,
since you have to set up masters and collectors (you don't necessarily
need collectors if you aren't writing to HDFS)
- Flume has a well defined pattern for doing what you want:

- If you need multiple Kafka partitions for the logs, you will want to
partition by host so that messages from the same host arrive in order
- You can use the same piped technique as Flume to publish to Kafka,
but you'll have to write a little code to publish and subscribe to the topic
- Kafka does not provide any of the file rolling, compression, etc.
that Flume provides
- If you ever want to do anything more interesting with those log
files than just send them to one location, publishing them to Kafka
would allow you to add additional consumers later.  Flume has a
concept of fanout sinks, but I don't care for the way it works.
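To make the partition-by-host point concrete, here is a rough sketch of the idea (hypothetical illustration, not actual Kafka code): key each message by hostname and hash that key to pick a partition, so every message from one host lands on the same partition and stays in order there.

```python
import hashlib

def partition_for_host(host: str, num_partitions: int) -> int:
    """Map a hostname to a partition deterministically, so all messages
    from one host go to the same partition and keep their order.
    Uses md5 instead of Python's built-in hash() so the assignment is
    stable across processes and restarts."""
    digest = hashlib.md5(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Same host always maps to the same partition:
p1 = partition_for_host("web1.example.com", 4)
p2 = partition_for_host("web1.example.com", 4)
assert p1 == p2
```

Ordering is then guaranteed per host, not globally, which is usually all you need for log aggregation.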
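The piped setup can be as simple as `tail -F catalina.out | python publish.py`. The script side might look something like this sketch, where `send` is a stand-in for whatever producer call your Kafka client exposes (it is not a real Kafka API, just a placeholder):

```python
import socket

def frame_log_line(line, host=""):
    """Turn one raw log line into a (key, value) pair: the key is the
    host (so partitioning is per host), the value is the line itself."""
    host = host or socket.gethostname()
    return host, line.rstrip("\n")

def publish_stream(stream, send, host=""):
    """Read log lines from `stream` (e.g. sys.stdin when fed by tail -F)
    and hand each framed message to `send(topic, key, value)` -- in real
    use, `send` would wrap your Kafka producer."""
    for line in stream:
        key, value = frame_log_line(line, host)
        send("logs", key, value)
```

The nice part is that the tail/pipe half is identical to what you would do with Flume; only the small publishing script changes.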
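To see why adding consumers later is cheap, here is a toy model of the log (purely illustrative, nothing to do with Kafka's actual implementation): the log is append-only and each consumer tracks its own read offset, so a new consumer can show up at any time, read everything from the start, and never disturb the existing ones.

```python
class MiniLog:
    """Toy append-only log: messages are never removed by reads, and
    each consumer keeps an independent offset into the log."""
    def __init__(self):
        self.messages = []
        self.offsets = {}  # consumer name -> next offset to read

    def append(self, msg):
        self.messages.append(msg)

    def poll(self, consumer):
        """Return everything this consumer hasn't seen yet and advance
        only that consumer's offset."""
        start = self.offsets.get(consumer, 0)
        batch = self.messages[start:]
        self.offsets[consumer] = len(self.messages)
        return batch

log = MiniLog()
log.append("line 1")
log.append("line 2")
log.poll("hdfs-sink")        # hdfs-sink reads both lines
log.append("line 3")
log.poll("alerting")         # a consumer added later still sees all three
```

Contrast this with a pipe to one destination: once the bytes are delivered, a second reader added later has nothing to read.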

On Thu, Sep 29, 2011 at 1:48 PM, Jun Rao <junrao@gmail.com> wrote:
> Jeremy,
> Yes, Kafka will be a good fit for that.
> Thanks,
> Jun
> On Thu, Sep 29, 2011 at 10:12 AM, Jeremy Hanna
> <jeremy.hanna1234@gmail.com> wrote:
>> We have a number of web servers in ec2 and periodically we just blow them
>> away and create new ones.  That makes keeping logs problematic.  We're
>> looking for a way to stream the logs from those various sources directly to
>> a central log server - either just a single server or hdfs or something like
>> that.
>> My question is whether Kafka is a good fit for that, or should I be looking
>> more along the lines of Flume or Scribe?
>> Many thanks.
>> Jeremy
