kafka-users mailing list archives

From Khurrum Nasim <khurrumnas...@gmail.com>
Subject Re: Comparing Pulsar and Kafka: unified queuing and streaming
Date Tue, 05 Dec 2017 17:57:17 GMT
Hi Marina,


On Tue, Dec 5, 2017 at 6:58 AM, Marina Popova <ppine7sub@protonmail.com>
wrote:

> Hi,
> I don't think it would be such a great idea to start modifying the very
> foundation of Kafka's design to accommodate more and more extra use cases.
> Kafka became so widely adopted and popular because its creator made a
> brilliant decision to make it a "dumb broker - smart consumer" type of
> system, where there are no-to-minimal dependencies between Kafka brokers and
> Consumers. This is what makes Kafka blazingly fast and truly scalable - able
> to handle thousands of Consumers with no impact on performance.
>

I am not sure I agree with this. I think that, from an end-user perspective,
what users expect is an ultra-simple streaming/messaging system: applications
send messages, the messaging system stores and dispatches them, and consumers
consume the messages and tell the system that they have consumed them. IMO
whether management is centralized or decentralized doesn't really matter
here, as long as Kafka is able to do these things without impacting
performance.

Sometimes people assume that smarter brokers (like traditional messaging
brokers) cannot offer high throughput and scalability, because they do
"too many things". But I took a look at the Pulsar documentation and their
presentation, and a few of the metrics are very impressive:

https://image.slidesharecdn.com/apachepulsar-171113225233/95/bdam-multitenant-and-georeplication-messaging-with-apache-pulsar-by-matteo-merli-sijie-guo-from-streamlio-2-638.jpg?cb=1510613990

- 1.8 million messages/second per topic partition
- 99pct producing latency less than 5ms with stronger durability
- support millions of topics
- it also supports at-least-once and effectively-once producing

Those metrics sound appealing to me if Pulsar supports both streaming and
queuing. I am wondering if anyone in the community has tried performance
testing or benchmarking Pulsar against Kafka. I would love to see such
results, which could help people understand the pros and cons of both
systems.


- KN



>
> One unfortunate consequence of becoming so popular - is that more and more
> people are trying to fit Kafka into their architectures not because it
> really fits, but because everybody else is doing so :) And this leads to
> many requests to add richer and richer functionality to Kafka - like
> transactional messages, more complex acks, centralized consumer
> management, etc.
>
> If you really need those features - there are other systems that are
> designed for that.
>
> I truly worry that if all those changes are added to Core Kafka - it will
> become just another "do it all" enterprise-level monster that will be able
> to do it all but at a price of mediocre performance and ten-fold increased
> complexity (and, thus, management and possibility of bugs). Sure, there has
> to be innovation and new features added - but maybe those that require
> major changes to Kafka's core principles should go into separate
> frameworks, plug-ins (like Connectors) or something along those lines,
> rather than packing it all into the Core Kafka.
>
> Just my 2 cents :)
>
> Marina
>
> > -------- Original Message --------
> > Subject: Re: Comparing Pulsar and Kafka: unified queuing and streaming
> > Local Time: December 4, 2017 2:56 PM
> > UTC Time: December 4, 2017 7:56 PM
> > From: jason@confluent.io
> > To: dev@kafka.apache.org
> > Kafka Users <users@kafka.apache.org>
> >
> > Hi Khurrum,
> >
> > Thanks for sharing the article. I think one interesting aspect of Pulsar
> > that stands out to me is its notion of a subscription and how it impacts
> > message retention. In Kafka, consumers are more loosely coupled and
> > retention is enforced independently of consumption. There are some
> > scenarios I can imagine where the tighter coupling might be beneficial.
> > For example, in Kafka Streams, we often use intermediate topics to store
> > the data in one stage of the topology's computation. These topics are
> > exclusively owned by the application and once the messages have been
> > successfully received by the next stage, we do not need to retain them
> > further. But since consumption is independent of retention, we either
> > have to choose a large retention time and deal with some temporary
> > storage waste or we use a low retention time and possibly lose some
> > messages during an outage.
> >
> > We have solved this problem to some extent in Kafka by introducing an API
> > to delete the records in a partition up to a certain offset, but this
> > effectively puts the burden of this use case on clients. It would be
> > interesting to consider whether we could do something like Pulsar in the
> > Kafka broker. For example, we have a consumer group coordinator which is
> > able to track the progress of the group through its committed offsets. It
> > might be possible to extend it to automatically delete records in a topic
> > after offsets are committed if the topic is known to be exclusively owned
> > by the consumer group. We already have the DeleteRecords API that we
> > would need, so maybe this is "just" a matter of some additional topic
> > metadata. I'd be interested to hear whether this kind of use case is
> > common among our users.
> >
> > -Jason
> >
> > On Sun, Dec 3, 2017 at 10:29 PM, Khurrum Nasim khurrumnasimm@gmail.com
> > wrote:
> >
> >> Dear Kafka Community,
> >> I happened to read this blog post comparing the messaging models of
> >> Apache Pulsar and Apache Kafka. It sounds interesting. Apache Pulsar
> >> claims to unify streaming (kafka) and queuing (rabbitmq) in one unified
> >> API. Pulsar also seems to support the Kafka API. Has anyone taken a look
> >> at Pulsar? What does the community think about this? Pulsar is also an
> >> Apache project. Is there any collaboration that could happen between
> >> these two projects?
> >> https://streaml.io/blog/pulsar-streaming-queuing/
> >> BTW, I am a Kafka user, loving Kafka a lot. Just trying to see what other
> >> people think about this.
> >>
> >> - KN
>
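
For readers following the thread, here is a minimal sketch of the idea in
Jason's quoted message: a group coordinator that deletes records up to the
committed offset when a topic is exclusively owned by one consumer group.
It is a toy in-memory model, not the Kafka API. The class names
(PartitionLog, GroupCoordinator), the exclusive_owner setting, and the group
names are all hypothetical and stand in for the "additional topic metadata"
he mentions.

```python
class PartitionLog:
    """Toy in-memory stand-in for one topic partition (not the Kafka API)."""

    def __init__(self):
        self.log_start_offset = 0  # offset of the oldest retained record
        self.records = []          # records[i] has offset log_start_offset + i

    @property
    def log_end_offset(self):
        # Offset that the next appended record will receive.
        return self.log_start_offset + len(self.records)

    def append(self, value):
        offset = self.log_end_offset
        self.records.append(value)
        return offset

    def delete_records_before(self, offset):
        """Drop every retained record whose offset is lower than `offset`."""
        # Clamp so that stale or out-of-range requests are harmless.
        offset = max(self.log_start_offset, min(offset, self.log_end_offset))
        del self.records[: offset - self.log_start_offset]
        self.log_start_offset = offset


class GroupCoordinator:
    """Tracks committed offsets; trims the log only for the exclusive owner."""

    def __init__(self, log, exclusive_owner=None):
        self.log = log
        self.exclusive_owner = exclusive_owner  # hypothetical topic metadata
        self.committed = {}

    def commit(self, group, offset):
        self.committed[group] = offset
        # Retention follows consumption only when the topic has one owner.
        if group == self.exclusive_owner:
            self.log.delete_records_before(offset)


log = PartitionLog()
for value in ["a", "b", "c", "d", "e"]:
    log.append(value)

coordinator = GroupCoordinator(log, exclusive_owner="streams-app")
coordinator.commit("some-other-group", 4)  # not the owner: nothing is deleted
coordinator.commit("streams-app", 3)       # owner: offsets 0..2 are deleted
print(log.log_start_offset, log.records)
```

With no exclusive owner configured, commits never trim the log, which
matches Kafka's current behaviour of enforcing retention independently of
consumption.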
