kafka-users mailing list archives

From Anurag <anurag.pha...@gmail.com>
Subject Re: Aggregating tomcat, log4j, other logs in realtime
Date Thu, 29 Sep 2011 18:38:03 GMT
Eric/Jun,
Can you shed some light on how to handle Apache log rotation? AFAIK,
even if we write custom code to tail a file, the file handle is lost
on rotation, which might result in some loss of data.
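
To make the rotation concern concrete, here is a minimal sketch of a tailer
that survives rotation, assuming a POSIX filesystem where a renamed file keeps
its inode. The function name follow and its parameters are made up for
illustration and are not part of Kafka or Flume. It reopens the path when the
inode changes, rereads from the start if the file is truncated in place, and
drains the old handle once more before switching. Lines the writer appends to
the old file after that final drain would still be lost.

```python
import os
import time


def follow(path, poll_interval=0.5, max_idle_polls=None):
    """Yield lines appended to `path`, reopening it after rotation.

    Hypothetical helper: stops after `max_idle_polls` consecutive polls with
    no new data (None means run forever).
    """
    f = open(path, "rb")
    next_f = None  # handle on the new file, opened once rotation is detected
    buf = b""
    idle = 0
    try:
        while True:
            chunk = f.read(65536)
            if chunk:
                idle = 0
                buf += chunk
                # Everything before the last newline is complete lines;
                # the remainder stays buffered until its newline arrives.
                *lines, buf = buf.split(b"\n")
                for line in lines:
                    yield line.decode("utf-8", errors="replace")
                continue
            if next_f is not None:
                # Old file is drained after rotation: switch to the new one.
                if buf:
                    yield buf.decode("utf-8", errors="replace")
                    buf = b""
                f.close()
                f, next_f = next_f, None
                continue
            try:
                st = os.stat(path)
            except FileNotFoundError:
                st = None  # mid-rotation: the new file does not exist yet
            if st is not None:
                cur = os.fstat(f.fileno())
                if (st.st_dev, st.st_ino) != (cur.st_dev, cur.st_ino):
                    # Renamed away: open the new file, then loop once more
                    # to read anything still pending in the old handle.
                    next_f = open(path, "rb")
                    continue
                if st.st_size < f.tell():
                    # Truncated in place (copytruncate style): start over.
                    f.seek(0)
                    continue
            idle += 1
            if max_idle_polls is not None and idle >= max_idle_polls:
                return
            time.sleep(poll_interval)
    finally:
        f.close()
        if next_f is not None:
            next_f.close()
```

Each yielded line could then be handed to whatever producer code publishes to
Kafka. The alternative is the piped approach Eric mentions below, which avoids
tailing a file at all.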


On Thu, Sep 29, 2011 at 11:35 AM, Jeremy Hanna
<jeremy.hanna1234@gmail.com> wrote:
> Thanks a lot for the comparison, Eric.  Really good to hear a perspective from a user of both.
>
> On Sep 29, 2011, at 1:25 PM, Eric Hauser wrote:
>
>> Jeremy,
>>
>> I've used both Flume and Kafka, and I can provide some info for comparison:
>>
>> Flume
>> - The current Flume release 0.9.4 has some pretty nasty bugs in it
>> (most have been fixed in trunk).
>> - Flume is more complex to maintain operations-wise (IMO) than Kafka
>> since you have to set up masters and collectors (you don't necessarily
>> need collectors if you aren't writing to HDFS)
>> - Flume has a well defined pattern for doing what you want:
>> http://www.cloudera.com/blog/2010/09/using-flume-to-collect-apache-2-web-server-logs/
>>
>> Kafka
>> - If you need multiple Kafka partitions for the logs, you will want to
>> partition by host so the messages arrive in order for the same host
>> - You can use the same piped technique as Flume to publish to Kafka,
>> but you'll have to write a little code to publish and subscribe to the
>> stream
>> - Kafka does not provide any of the file rolling, compression, etc.
>> that Flume provides
>> - If you ever want to do anything more interesting with those log
>> files than just send them to one location, publishing them to Kafka
>> would allow you to add additional consumers later.  Flume has a
>> concept of fanout sinks, but I don't care for the way it works.
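
The "partition by host" point above can be sketched roughly as follows. This
is only an illustration under stated assumptions: partition_for_host and
publish_lines are hypothetical names, the CRC32 hash is just one stable choice
and not Kafka's own partitioner, and send(partition, payload) stands in for
whatever producer call you actually use.

```python
import zlib


def partition_for_host(host, num_partitions):
    """Map a hostname to a stable partition index in [0, num_partitions)."""
    # CRC32 is deterministic across processes and restarts, unlike Python's
    # built-in hash() for strings, so a host always maps to the same partition.
    return zlib.crc32(host.encode("utf-8")) % num_partitions


def publish_lines(lines, host, num_partitions, send):
    """Send every log line from `host` to that host's single partition.

    `send(partition, payload)` is a hypothetical stand-in for a real
    producer call. Returns the number of lines sent.
    """
    partition = partition_for_host(host, num_partitions)
    count = 0
    for line in lines:
        send(partition, line.rstrip("\n"))
        count += 1
    return count
```

For the piped technique, lines could be sys.stdin and host could be
socket.gethostname(). Because all lines from one host go to one partition,
a consumer of that partition sees them in the order they were sent.
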
>>
>>
>>
>> On Thu, Sep 29, 2011 at 1:48 PM, Jun Rao <junrao@gmail.com> wrote:
>>> Jeremy,
>>>
>>> Yes, Kafka will be a good fit for that.
>>>
>>> Thanks,
>>>
>>> Jun
>>>
>>> On Thu, Sep 29, 2011 at 10:12 AM, Jeremy Hanna
>>> <jeremy.hanna1234@gmail.com> wrote:
>>>
>>>> We have a number of web servers in ec2 and periodically we just blow them
>>>> away and create new ones.  That makes keeping logs problematic.  We're
>>>> looking for a way to stream the logs from those various sources directly to
>>>> a central log server - either just a single server or hdfs or something like
>>>> that.
>>>>
>>>> My question is whether kafka is a good fit for that or should I be looking
>>>> more along the lines of flume or scribe?
>>>>
>>>> Many thanks.
>>>>
>>>> Jeremy
>>>
>
>
