From Marc Labbe <mrla...@gmail.com>
Subject Re: Arguments for Kafka over RabbitMQ ?
Date Fri, 07 Jun 2013 13:30:36 GMT
Nice of you to reply Alexis and clarify things! FWIW, I personally like
Rabbit very much and I am pushing to use it for other purposes at my
company. Its flexibility, ease of use and even documentation are really
top-notch compared with the other options.

I might have explained some of my points a bit too rapidly and your
clarifications are much better than what I could do. I think the message
duplication is where I possibly made a false assumption but it wasn't a
decisive factor in our case. It might have been a setup issue also, not
sure. In any case, I don't think this is the place to discuss that.

On the other hand, as a user, I hope Kafka will not try to become a general
purpose messaging system, because its narrower focus is the reason I opted
to use it.
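
For anyone skimming the thread, here is a rough Python sketch of the
distinction being discussed: a retained log that consumers read by offset,
versus a broker that buffers per consumer and drops entries once they are
consumed. The names RetainedLog, LogConsumer and FanoutBroker are made up
for illustration; this is not the API of Kafka or RabbitMQ, and it ignores
persistence, acks, partitions and retention entirely.

```python
from collections import deque


class RetainedLog:
    """Log-style (hypothetical): one append-only copy shared by all readers."""

    def __init__(self):
        self.messages = []

    def append(self, message):
        self.messages.append(message)
        return len(self.messages) - 1  # offset assigned to the new message

    def read(self, offset, max_messages=10):
        return self.messages[offset:offset + max_messages]


class LogConsumer:
    """Each consumer keeps its own position; the log keeps no consumer state."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self, max_messages=10):
        batch = self.log.read(self.offset, max_messages)
        self.offset += len(batch)
        return batch


class FanoutBroker:
    """Queue-style (hypothetical): the broker holds one buffer per consumer."""

    def __init__(self):
        self.queues = {}

    def bind(self, name):
        self.queues[name] = deque()

    def publish(self, message):
        # Each bound queue gets an entry referencing the same message object;
        # the payload itself is not copied here.
        for queue in self.queues.values():
            queue.append(message)

    def consume(self, name):
        queue = self.queues[name]
        return queue.popleft() if queue else None


log = RetainedLog()
for i in range(5):
    log.append(f"event-{i}")

fast = LogConsumer(log)
slow = LogConsumer(log)
fast.poll()        # reads everything written so far
slow.poll(2)       # reads at its own pace from the same copy
late = LogConsumer(log)
late.poll()        # a consumer created after the writes still sees them

broker = FanoutBroker()
broker.bind("early")
broker.publish("m1")
broker.bind("late")    # bound after m1 was published, so it never sees m1
broker.publish("m2")
```

In the log case the only per-consumer state is an integer held by the
consumer. In the queue case the broker tracks what each consumer still has
to receive, which is the per-queue metadata mentioned in the message below.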

On Fri, Jun 7, 2013 at 8:54 AM, Alexis Richardson <alexis@rabbitmq.com> wrote:

> Hi
> Alexis from Rabbit here.  I hope I am not intruding!
> It would be super helpful if people with questions, observations or
> moans posted them to the rabbitmq list too :-)
> A few comments:
> * Along with ZeroMQ, I consider Kafka to be one of the interesting and
> useful messaging projects out there.  In a world of cruft, Kafka is
> cool!
> * This is because both projects come at messaging from a specific
> point of view that is *different* from Rabbit.  OTOH, many other
> projects exist that replicate Rabbit features for fun, or NIH, or due
> to misunderstanding the semantics (yes, our docs could be better)
> * It is striking how few people describe those differences.  In a
> nutshell they are as follows:
> *** Kafka writes all incoming data to disk immediately, and then
> figures out who sees what.  So it is much more like a database than
> Rabbit, in that new consumers can appear well after the disk write and
> still subscribe to past messages.  Rabbit, by contrast, tries to
> deliver to consumers and buffers otherwise.  Persistence is optional
> but robust and a feature of the buffer ("queue") not the upstream
> machinery.  Rabbit is able to cache-on-arrival via a plugin, but this
> is a design overlay and not particularly optimal.
> *** Kafka is a client server system with end to end semantics.  It
> defines order to include processing order, and keeps state on the
> client to do this.  Group management is via a 3rd party service
> (Zookeeper? I forget which).  Rabbit is a server-only protocol based
> system which maintains order on the server and through completely
> language neutral protocol semantics.  This makes Rabbit perhaps more
> natural as a 'messaging service' eg for integration and other
> inter-app data transfer.
> *** Rabbit is a general purpose messaging system with extras like
> federation.  It speaks many protocols, and has core features like HA,
> transactions, management, etc.  Everything can be switched on or off.
> Getting all this to work while keeping the install light and fast is
> quite fiddly.  Kafka by contrast comes from a specific set of use
> cases, which are interesting certainly.  I am not sure if Kafka wants
> to be a general purpose messaging system, but it will become a bit
> more like Rabbit if that is the goal.
> *** Both approaches have costs.  In the case of Rabbit the cost is
> that more metadata is stored on the broker.  Kafka can get performance
> gains by storing less such data.  But we are talking about some N
> thousands of MPS versus some M thousands.  At those speeds the clients
> are usually the bottleneck anyway.
> * Let me also clarify some things:
> *** Rabbit does NOT store multiple copies of the same message across
> queues, unless they are very small (<60b, iirc).  A message delivered
> to >1 queue on 1 machine is stored once.  Metadata about that message
> may be stored more than once, but, at scale, the big cost is the
> payload.
> *** Rabbit's vanilla install does store some index data in memory when
> messages flow to disk.  You can change this by using a plugin, but
> this is a secret-menu undocumented feature.  Very very few people need
> any such thing.
> *** A Rabbit queue is lightweight.  It's just an ordered consumption
> buffer that can persist and ack.  Don't assume things about Rabbit
> queues based on what you know about IBM MQ, JMS, and so forth.  Queues
> in Rabbit and Kafka are not the same.
> *** Rabbit does not use mnesia for message storage.  It has its own
> DB, optimised for messaging.  You can use other DBs but this is
> Complicated.
> *** Rabbit does all kinds of batching and bulk processing, and can
> batch end to end.  If you see claims about batching, buffering, etc.,
> find out ALL the details before drawing conclusions.
> I hope this is helpful.
> Keen to get feedback / questions / corrections.
> alexis
> On Fri, Jun 7, 2013 at 2:09 AM, Marc Labbe <mrlabbe@gmail.com> wrote:
> > We also went through the same decision making and our arguments for Kafka
> > were along the same lines as those Jonathan mentioned. The fact that we
> > have heterogeneous consumers is really a deciding factor. Our requirements
> > were to avoid losing messages at all costs while having multiple consumers
> > reading the same data at a different pace. On one side, we have a few
> > consumers being fed with data coming in from most, if not all, topics. On
> > the other side, we have a good bunch of consumers reading only from a
> > single topic. The big guys can take their time to read while the smaller
> > ones are mostly for near real-time events so they need to keep up with the
> > pace of incoming messages.
> >
> > RabbitMQ stores data on disk only if you tell it to while Kafka persists
> > by design. From the beginning, we decided we would try to use the queues
> > the same way, pub/sub with a routing key (an exchange in RabbitMQ) or topic,
> > persisted to disk and replicated.
> >
> > One of our scenarios was to see how the system would cope with the largest
> > consumer down for a while, therefore forcing the brokers to keep the data
> > for a long period. In the case of RabbitMQ, this consumer has its own
> > queue and data grows on disk, which is not really a problem if you plan
> > accordingly. But, since it has to keep track of all messages read, the
> > Mnesia database used by RabbitMQ as the messages index also grows pretty
> > big. At that point, the amount of RAM necessary becomes very large to
> > keep the level of performance we need. In our tests, we found that this
> > has an adverse effect on ALL the brokers, thus affecting all consumers. You
> > can always say that you'll monitor the consumers to make sure it won't
> > happen. That's a good thing if you can. I wasn't ready to make that bet.
> >
> > Another point is the fact that, since we wanted to use pub/sub with an
> > exchange in RabbitMQ, we would have ended up with a lot of data duplication
> > because if a message is read by multiple consumers, it will get
> > duplicated in the queue of each of those consumers. Kafka wins on that side
> > too since every consumer reads from the same source.
> >
> > The downsides of Kafka were the language issues (we are using mostly
> > Python and C#). 0.8 is very new and few drivers are available at this
> > point. Also, we will have to try getting as close as possible to a
> > once-and-only-once guarantee. These are two things where RabbitMQ would
> > have given us less work out of the box as opposed to Kafka. RabbitMQ also
> > provides a bunch of tools that make it rather attractive too.
> >
> > In the end, looking at throughput is a pretty nifty thing but being sure
> > that I'll be able to manage the beast as it grows will allow me to get to
> > sleep way more easily.
> >
> >
> > On Thu, Jun 6, 2013 at 3:28 PM, Jonathan Hodges <hodgesz@gmail.com> wrote:
> >
> >> We just went through a similar exercise with RabbitMQ at our company
> >> with streaming activity data from our various web properties.  Our use
> >> case requires consumption of this stream by many heterogeneous consumers
> >> including batch (Hadoop) and real-time (Storm).  We pointed out that
> >> Kafka acts as a configurable rolling window of time on the activity
> >> stream.  The window default is 7 days, which allows clients of different
> >> latencies, like Hadoop and Storm, to read from the same stream.
> >>
> >> We pointed out that the Kafka brokers don't need to maintain consumer
> >> state in the stream and only have to maintain one copy of the stream to
> >> support N consumers.  Rabbit brokers on the other hand have to maintain
> >> the state of each consumer as well as create a copy of the stream for each
> >> consumer.  In our scenario we have 10-20 consumers and with the scale and
> >> throughput of the activity stream we were able to show Rabbit quickly
> >> becomes the bottleneck under load.
> >>
> >>
> >>
> >> On Thu, Jun 6, 2013 at 12:40 PM, Dragos Manolescu <
> >> Dragos.Manolescu@servicenow.com> wrote:
> >>
> >> > Hi --
> >> >
> >> > I am preparing to make a case for using Kafka instead of Rabbit MQ as a
> >> > broker-based messaging provider. The context is similar to that of the
> >> > Kafka papers and user stories: the producers publish monitoring data and
> >> > logs, and a suite of subscribers consume this data (some store it, others
> >> > perform computations on the event stream). The requirements are typical
> >> > of this context: low-latency, high-throughput, ability to deal with
> >> > bursts and operate in/across multiple data centers, etc.
> >> >
> >> > I am familiar with the performance comparison between Kafka, Rabbit MQ
> >> > and Active MQ from the NetDB 2011 paper
> >> > <http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf>.
> >> > However in the two years that passed since then the number of
> >> > production Kafka installations increased, and people are using it in
> >> > different ways than those imagined by Kafka's designers. In light of
> >> > these experiences one can use more data points and color when
> >> > contrasting to Rabbit MQ (which by the way also evolved since 2011).
> >> > (And FWIW I know I am not the first one to walk this path; see for
> >> > example last year's OSCON session on the State of MQ
> >> > <http://lanyrd.com/2012/oscon/swrcz/>.)
> >> >
> >> > I would appreciate it if you could share measurements, results, or even
> >> > anecdotal evidence along these lines. How have you avoided the "let's
> >> > use Rabbit MQ because everybody else does it" route when solving
> >> > problems for which Kafka is a better fit?
> >> >
> >> > Thanks,
> >> >
> >> > -Dragos
> >> >
> >>

