kafka-users mailing list archives

From Evan Chan ...@ooyala.com>
Subject Re: Aggregating tomcat, log4j, other logs in realtime
Date Thu, 29 Sep 2011 18:43:44 GMT
One more point to add to this thread: it's really hard to do partitioning in
Flume.
If you need partitioning but don't want to deal with a set of central
brokers, and don't need persistence, you can check out the new Storm project
(github.com/nathanmarz).

-Evan


On Thu, Sep 29, 2011 at 11:38 AM, Anurag <anurag.phadke@gmail.com> wrote:

> Eric/Jun,
> Can you throw some light on how to handle apache log rotation? afaik,
> even if we write custom code to tail a file, the file handle is lost
> on rotation and might result in some loss of data.
>
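On the rotation question above, here is a minimal sketch of one common approach, assuming rotation either renames the old file and creates a new one at the same path, or truncates the file in place. The class name `RotationAwareTail` and its methods are hypothetical, not part of Kafka or Flume. The idea is to drain the open handle first (so lines written just before rotation are not lost), then compare the inode of the path with the inode of the open handle and reopen if they differ.

```python
import os


class RotationAwareTail:
    """Polls a log file for new lines and survives rotation (hypothetical helper)."""

    def __init__(self, path):
        self.path = path
        self.handle = None
        self.inode = None

    def _open(self):
        # Binary mode keeps tell()/seek() offsets simple and exact.
        self.handle = open(self.path, "rb")
        self.inode = os.fstat(self.handle.fileno()).st_ino

    def _drain(self):
        """Read all complete lines currently available from the open handle."""
        out = []
        while True:
            pos = self.handle.tell()
            line = self.handle.readline()
            if not line:
                break
            if not line.endswith(b"\n"):
                # Partial line: rewind and pick it up on a later poll.
                self.handle.seek(pos)
                break
            out.append(line.decode("utf-8", "replace").rstrip("\n"))
        return out

    def poll(self):
        """Return complete lines appended since the last call."""
        lines = []
        if self.handle is None:
            try:
                self._open()
            except FileNotFoundError:
                return lines
        # Finish the old file before checking whether the path was rotated.
        lines.extend(self._drain())
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return lines  # rotated away; the new file does not exist yet
        rotated = st.st_ino != self.inode
        truncated = not rotated and st.st_size < self.handle.tell()
        if rotated or truncated:
            self.handle.close()
            self._open()
            lines.extend(self._drain())
        return lines
```

A caller would invoke `poll()` in a loop with a short sleep and publish each returned line. There is still a small window for loss if the file is rotated twice between polls, or truncated right after a write, so this reduces the problem rather than eliminating it.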
>
> On Thu, Sep 29, 2011 at 11:35 AM, Jeremy Hanna
> <jeremy.hanna1234@gmail.com> wrote:
> > Thanks a lot for the comparison Eric.  Really good to hear a perspective
> > from a user of both.
> >
> > On Sep 29, 2011, at 1:25 PM, Eric Hauser wrote:
> >
> >> Jeremy,
> >>
> >> I've used both Flume and Kafka, and I can provide some info for
> >> comparison:
> >>
> >> Flume
> >> - The current Flume release 0.9.4 has some pretty nasty bugs in it
> >> (most have been fixed in trunk).
> >> - Flume is more complex to maintain operations-wise (IMO) than Kafka
> >> since you have to set up masters and collectors (you don't necessarily
> >> need collectors if you aren't writing to HDFS)
> >> - Flume has a well defined pattern for doing what you want:
> >>   http://www.cloudera.com/blog/2010/09/using-flume-to-collect-apache-2-web-server-logs/
> >>
> >> Kafka
> >> - If you need multiple Kafka partitions for the logs, you will want to
> >> partition by host so the messages arrive in order for the same host
> >> - You can use the same piped technique as Flume to publish to Kafka,
> >> but you'll have to write a little code to publish and subscribe to the
> >> stream
> >> - Kafka does not provide any of the file rolling, compression, etc.
> >> that Flume provides
> >> - If you ever want to do anything more interesting with those log
> >> files than just send them to one location, publishing them to Kafka
> >> would allow you to add additional consumers later.  Flume has a
> >> concept of fanout sinks, but I don't care for the way it works.
> >>
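The partition-by-host point above can be sketched as follows. This is not Kafka's producer API: `partition_for_host`, `route`, and the host names are hypothetical, and CRC32 is just one choice of stable hash. The sketch only shows why keying on the host keeps each host's lines in order: every message from a given host lands in the same partition, in the order it was sent.

```python
import zlib
from collections import defaultdict


def partition_for_host(host, num_partitions):
    """Map a host name to a partition index using a stable hash (CRC32)."""
    return zlib.crc32(host.encode("utf-8")) % num_partitions


def route(messages, num_partitions):
    """Group (host, line) pairs by partition, preserving arrival order."""
    partitions = defaultdict(list)
    for host, line in messages:
        partitions[partition_for_host(host, num_partitions)].append((host, line))
    return partitions


if __name__ == "__main__":
    # Hypothetical log lines from two web servers, interleaved on arrival.
    incoming = [
        ("web-01", "GET /a"),
        ("web-02", "GET /b"),
        ("web-01", "GET /c"),
    ]
    for index, batch in sorted(route(incoming, 4).items()):
        print(index, batch)
```

Python's built-in `hash()` is avoided here because string hashing is randomized per process by default, so two producers could disagree on the partition for the same host.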
> >>
> >>
> >> On Thu, Sep 29, 2011 at 1:48 PM, Jun Rao <junrao@gmail.com> wrote:
> >>> Jeremy,
> >>>
> >>> Yes, Kafka will be a good fit for that.
> >>>
> >>> Thanks,
> >>>
> >>> Jun
> >>>
> >>> On Thu, Sep 29, 2011 at 10:12 AM, Jeremy Hanna
> >>> <jeremy.hanna1234@gmail.com> wrote:
> >>>
> >>>> We have a number of web servers in ec2 and periodically we just blow them
> >>>> away and create new ones.  That makes keeping logs problematic.  We're
> >>>> looking for a way to stream the logs from those various sources directly to
> >>>> a central log server - either just a single server or hdfs or something like
> >>>> that.
> >>>>
> >>>> My question is whether kafka is a good fit for that or should I be looking
> >>>> more along the lines of flume or scribe?
> >>>>
> >>>> Many thanks.
> >>>>
> >>>> Jeremy
> >>>
> >
> >
>



-- 
*Evan Chan*
Senior Software Engineer |
ev@ooyala.com | (650) 996-4600
www.ooyala.com | blog <http://www.ooyala.com/blog> |
@ooyala<http://www.twitter.com/ooyala>
