kafka-users mailing list archives

From Jason Gustafson <ja...@confluent.io>
Subject Re: Comparing Pulsar and Kafka: unified queuing and streaming
Date Tue, 05 Dec 2017 18:59:24 GMT
> I believe a lot of users are using the Kafka high-level consumers, which
> is effectively an **unordered** messaging/streaming pattern. People using
> high-level consumers don't actually need any ordering guarantees. In this
> sense, a *shared* subscription in Apache Pulsar seems to be better than
> Kafka's current consumer group model, as the consumption rate is not
> limited by the number of partitions and can actually grow beyond it. We
> do see a lot of operational pain points in production coming from
> consumer lag, which is very commonly seen during partition rebalancing in
> a consumer group. Selective acking seems to provide finer granularity on
> acknowledgment, which can be good for avoiding consumer lag and avoiding
> reprocessing messages during a partition rebalance.


Yeah, I'm not sure about this. I'd be interested to understand the design
of this feature a little better. In practice, when ordering is unimportant,
adding partitions doesn't seem like too big of a deal. Also, I'm aware of
active efforts to make rebalancing less of a pain point for our users ;)
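For readers following along, the constraint under discussion is that in Kafka's consumer group model each partition is assigned to at most one consumer in the group, so parallelism is capped at the partition count, whereas a Pulsar shared subscription dispatches messages to any number of consumers. A minimal sketch in plain Python (a simplified round-robin assignor, not Kafka's actual assignment code) of why extra consumers sit idle:

```python
# Sketch: Kafka-style consumer group assignment. Each partition goes to
# exactly one consumer, so parallelism is capped at the partition count.
def assign_partitions(partitions, consumers):
    """Round-robin partitions over consumers (simplified assignor)."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = [f"topic-{p}" for p in range(4)]
consumers = [f"consumer-{c}" for c in range(6)]  # more consumers than partitions
assignment = assign_partitions(partitions, consumers)

# Consumers beyond the partition count receive no partitions at all.
idle = [c for c, ps in assignment.items() if not ps]
print(idle)  # consumer-4 and consumer-5 are idle
```

With a shared subscription there is no such per-partition ownership, which is why the consumption rate can grow past the partition count; the trade-off is that per-key ordering is no longer guaranteed.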

The last question, from a user's perspective: since both Kafka and Pulsar are
> distributed pub/sub messaging systems and both are at the ASF, is there
> any possibility for these two projects to collaborate? E.g. Kafka adopts
> Pulsar's messaging model, and Pulsar can use Kafka Streams and Kafka
> Connect. I believe a lot of people on the mailing list have the same or a
> similar question. From an end-user perspective, if such collaboration can
> happen, that is going to be great for users and also the ASF. I would
> like to hear any thoughts from Kafka committers and PMC members.


I see this a little differently. Although there is some overlap between the
projects, they have quite different underlying philosophies (as Marina
alluded to) and I hope this will take them on different trajectories over
time. That would ultimately benefit users more than having two competing
projects solving all the same use cases. We don't need to try to cram
Pulsar features into Kafka if it's not a good fit and vice versa. At the
same time, where capabilities do overlap, we can try to learn from their
experience and they can learn from ours. The example of message retention
seemed like one of these instances since there are legitimate use cases and
Pulsar's approach has some benefits.
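To make the retention example concrete: the earlier message in this thread suggests letting the group coordinator delete records up to the group's committed offset when a topic is known to be exclusively owned by one consumer group, rather than relying on time-based retention. A toy model in plain Python (not Kafka code; the class names and the `exclusively_owned` flag are hypothetical) of that truncate-on-commit behavior, which Kafka today exposes only manually via the DeleteRecords admin API:

```python
# Toy model of retention tied to consumption: once the owning consumer
# group commits an offset, records below it are deleted.
class Partition:
    def __init__(self):
        self.log = []            # records, indexed by offset - start_offset
        self.start_offset = 0    # log start offset; rises as records are deleted

    def append(self, record):
        self.log.append(record)

    def delete_up_to(self, offset):
        """Discard records below `offset`, as a DeleteRecords call would."""
        end = self.start_offset + len(self.log)
        drop = max(0, min(offset, end) - self.start_offset)
        self.log = self.log[drop:]
        self.start_offset += drop

class Coordinator:
    """On offset commit, truncates the partition if the topic is
    exclusively owned by the committing group."""
    def __init__(self, partition, exclusively_owned=True):
        self.partition = partition
        self.exclusively_owned = exclusively_owned

    def commit(self, offset):
        if self.exclusively_owned:
            self.partition.delete_up_to(offset)

p = Partition()
for i in range(10):
    p.append(f"record-{i}")

coord = Coordinator(p)
coord.commit(7)          # group has consumed offsets 0..6
print(p.start_offset)    # 7: records 0..6 are gone
print(len(p.log))        # 3 records remain
```

The appeal is that storage for exclusively-owned intermediate topics would shrink automatically with consumption, removing the need to pick between a wastefully long retention time and a risky short one.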


-Jason



On Tue, Dec 5, 2017 at 9:57 AM, Khurrum Nasim <khurrumnasimm@gmail.com>
wrote:

> Hi Marina,
>
>
> On Tue, Dec 5, 2017 at 6:58 AM, Marina Popova <ppine7sub@protonmail.com>
> wrote:
>
> > Hi,
> > I don't think it would be such a great idea to start modifying the very
> > foundation of Kafka's design to accommodate more and more extra use
> > cases. Kafka became so widely adopted and popular because its creators
> > made the brilliant decision to make it a "dumb broker - smart consumer"
> > type of system, with minimal to no dependencies between Kafka brokers
> > and Consumers. This is what makes Kafka blazingly fast and truly
> > scalable - able to handle thousands of Consumers with no impact on
> > performance.
> >
>
> I am not sure I agree with this. I think from an end-user perspective,
> what users expect is an ultra-simple streaming/messaging system:
> applications send messages, messaging systems store and dispatch them,
> consumers consume the messages and tell the systems that they have
> consumed them. IMO whether management is centralized or decentralized
> doesn't really matter here if Kafka is able to do things without
> impacting performance.
>
> Sometimes people assume that smarter brokers (like traditional messaging
> brokers) cannot offer high throughput and scalability because they do
> "too many things". But I took a look at Pulsar's documentation and their
> presentation, and a few metrics are very impressive:
>
> https://image.slidesharecdn.com/apachepulsar-171113225233/95/bdam-multitenant-and-georeplication-messaging-with-apache-pulsar-by-matteo-merli-sijie-guo-from-streamlio-2-638.jpg?cb=1510613990
>
> - 1.8 million messages/second per topic partition
> - 99th percentile publish latency of less than 5 ms with stronger durability
> - support for millions of topics
> - support for both at-least-once and effectively-once producing
>
> Those metrics sound appealing to me if Pulsar supports both streaming and
> queuing. I am wondering if anyone in the community has tried to do
> performance testing or a benchmark between Pulsar and Kafka. I would love
> to see such results, as they could help people understand both systems'
> pros and cons.
>
>
> - KN
>
>
>
> >
> > One unfortunate consequence of becoming so popular is that more and
> > more people are trying to fit Kafka into their architectures not because
> > it really fits, but because everybody else is doing so :) And this
> > causes many requests for richer and richer functionality to be added to
> > Kafka - like transactional messages, more complex acks, centralized
> > consumer management, etc.
> >
> > If you really need those features - there are other systems that are
> > designed for them.
> >
> > I truly worry that if all those changes are added to core Kafka, it
> > will become just another "do it all" enterprise-level monster that will
> > be able to do it all, but at the price of mediocre performance and
> > ten-fold increased complexity (and, thus, management burden and
> > possibility of bugs). Sure, there has to be innovation and new features
> > added - but maybe those that require major changes to Kafka's core
> > principles should go into separate frameworks, plug-ins (like
> > Connectors), or something along those lines, rather than packing it all
> > into core Kafka.
> >
> > Just my 2 cents :)
> >
> > Marina
> >
> > Sent with [ProtonMail](https://protonmail.com) Secure Email.
> >
> > > -------- Original Message --------
> > > Subject: Re: Comparing Pulsar and Kafka: unified queuing and streaming
> > > Local Time: December 4, 2017 2:56 PM
> > > UTC Time: December 4, 2017 7:56 PM
> > > From: jason@confluent.io
> > > To: dev@kafka.apache.org
> > > Kafka Users <users@kafka.apache.org>
> > >
> > > Hi Khurrum,
> > >
> > > Thanks for sharing the article. I think one interesting aspect of
> Pulsar
> > > that stands out to me is its notion of a subscription and how it
> impacts
> > > message retention. In Kafka, consumers are more loosely coupled and
> > > retention is enforced independently of consumption. There are some
> > > scenarios I can imagine where the tighter coupling might be beneficial.
> > For
> > > example, in Kafka Streams, we often use intermediate topics to store
> the
> > > data in one stage of the topology's computation. These topics are
> > > exclusively owned by the application and once the messages have been
> > > successfully received by the next stage, we do not need to retain them
> > > further. But since consumption is independent of retention, we either
> > have
> > > to choose a large retention time and deal with some temporary storage
> > waste
> > > or we use a low retention time and possibly lose some messages during
> an
> > > outage.
> > >
> > > We have solved this problem to some extent in Kafka by introducing an
> API
> > > to delete the records in a partition up to a certain offset, but this
> > > effectively puts the burden of this use case on clients. It would be
> > > interesting to consider whether we could do something like Pulsar in
> the
> > > Kafka broker. For example, we have a consumer group coordinator which
> is
> > > able to track the progress of the group through its committed offsets.
> It
> > > might be possible to extend it to automatically delete records in a
> topic
> > > after offsets are committed if the topic is known to be exclusively
> owned
> > > by the consumer group. We already have the DeleteRecords API that we
> > > need, so maybe this is "just" a matter of some additional topic
> > > metadata. I'd be
> > > interested to hear whether this kind of use case is common among our
> > users.
> > >
> > > -Jason
> > >
> > > On Sun, Dec 3, 2017 at 10:29 PM, Khurrum Nasim khurrumnasimm@gmail.com
> > > wrote:
> > >
> > >> Dear Kafka Community,
> > >> I happened to read this blog post comparing the messaging models of
> > >> Apache Pulsar and Apache Kafka. It sounds interesting. Apache Pulsar
> > >> claims to unify streaming (Kafka) and queuing (RabbitMQ) in one
> > >> unified API. Pulsar also seems to support the Kafka API. Has anyone
> > >> taken a look at Pulsar? What does the community think about this?
> > >> Pulsar is also an Apache project. Is there any collaboration that
> > >> can happen between these two projects?
> > >> https://streaml.io/blog/pulsar-streaming-queuing/
> > >> BTW, I am a Kafka user, loving Kafka a lot. Just trying to see what
> > >> other people think about this.
> > >>
> > >> - KN
> >
>
