kafka-users mailing list archives

From Eric Hauser <ewhau...@gmail.com>
Subject Re: Aggregating tomcat, log4j, other logs in realtime
Date Thu, 29 Sep 2011 19:38:58 GMT
Jun,

I was referring to the logic that would be necessary for the consumer
of the topic to rotate the log files on the centralized log server.
With Flume you would handle this via configuration:

collectorSink("file://var/logs/flume/webdata/%Y-%m-%d/%H00/", "web-")

With Kafka, you would probably just use log4j (or something similar)
in your consumer to handle this.
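
For illustration, here is a minimal sketch of what that rotation logic
could look like if you wrote it by hand in the consumer instead of
delegating to log4j. It only mirrors the date/hour directory layout of
the Flume sink above; the class name, the "web-log" file name and the
base directory are assumptions made for the example, and no Kafka API
is shown.

```python
import os
from datetime import datetime, timezone


class HourlyRollingWriter:
    """Append lines to <base_dir>/YYYY-MM-DD/HH00/<prefix>log, rolling by hour.

    Hypothetical helper: the name and file layout are illustrative only.
    """

    def __init__(self, base_dir, prefix="web-"):
        self.base_dir = base_dir
        self.prefix = prefix
        self._path = None  # path of the file that is currently open
        self._file = None

    def _path_for(self, ts):
        # Same bucket layout as the %Y-%m-%d/%H00/ pattern in the Flume sink.
        return os.path.join(
            self.base_dir,
            ts.strftime("%Y-%m-%d"),
            ts.strftime("%H00"),
            self.prefix + "log",
        )

    def write(self, line, ts=None):
        """Write one log line, opening a new file when the hour bucket changes."""
        ts = ts or datetime.now(timezone.utc)
        path = self._path_for(ts)
        if path != self._path:
            self.close()
            os.makedirs(os.path.dirname(path), exist_ok=True)
            self._file = open(path, "a", encoding="utf-8")
            self._path = path
        self._file.write(line.rstrip("\n") + "\n")

    def close(self):
        if self._file is not None:
            self._file.close()
            self._file = None
            self._path = None
```

The consumer loop would then call write() once per message it reads
from the topic and close() on shutdown. Bucketing by the time of
consumption (the default) is the simplest choice; passing the original
event time as ts is an option if the messages carry one.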

On Thu, Sep 29, 2011 at 3:20 PM, Jun Rao <junrao@gmail.com> wrote:
> Eric,
>
> Thanks for the analysis. A couple of comments:
>
> Kafka recently added the end-to-end compression feature and we will be
> releasing it soon. Please see
> https://issues.apache.org/jira/browse/KAFKA-79 for details.
>
> About the file rolling support, are you referring to the Kafka log? Kafka
> logs are rolled based on a preconfigured size.
>
> Thanks,
>
> Jun
>
> On Thu, Sep 29, 2011 at 11:25 AM, Eric Hauser <ewhauser@gmail.com> wrote:
>
>> Jeremy,
>>
>> I've used both Flume and Kafka, and I can provide some info for comparison:
>>
>> Flume
>> - The current Flume release 0.9.4 has some pretty nasty bugs in it
>> (most have been fixed in trunk).
>> - Flume is more complex to maintain operations-wise (IMO) than Kafka
>> since you have to set up masters and collectors (you don't necessarily
>> need collectors if you aren't writing to HDFS)
>> - Flume has a well defined pattern for doing what you want:
>>
>> http://www.cloudera.com/blog/2010/09/using-flume-to-collect-apache-2-web-server-logs/
>>
>> Kafka
>> - If you need multiple Kafka partitions for the logs, you will want to
>> partition by host so the messages arrive in order for the same host
>> - You can use the same piped technique as Flume to publish to Kafka,
>> but you'll have to write a little code to publish and subscribe to the
>> stream
>> - Kafka does not provide any of the file rolling, compression, etc.
>> that Flume provides
>> - If you ever want to do anything more interesting with those log
>> files than just send them to one location, publishing them to Kafka
>> would allow you to add additional consumers later.  Flume has a
>> concept of fanout sinks, but I don't care for the way it works.
>>
>>
>>
>> On Thu, Sep 29, 2011 at 1:48 PM, Jun Rao <junrao@gmail.com> wrote:
>> > Jeremy,
>> >
>> > Yes, Kafka will be a good fit for that.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > On Thu, Sep 29, 2011 at 10:12 AM, Jeremy Hanna
>> > <jeremy.hanna1234@gmail.com> wrote:
>> >
>> >> We have a number of web servers in ec2 and periodically we just blow them
>> >> away and create new ones.  That makes keeping logs problematic.  We're
>> >> looking for a way to stream the logs from those various sources directly to
>> >> a central log server - either just a single server or hdfs or something like
>> >> that.
>> >>
>> >> My question is whether kafka is a good fit for that or should I be looking
>> >> more along the lines of flume or scribe?
>> >>
>> >> Many thanks.
>> >>
>> >> Jeremy
>> >
>>
>
