kafka-users mailing list archives

From Khurrum Nasim <khurrumnas...@gmail.com>
Subject Re: Comparing Pulsar and Kafka: unified queuing and streaming
Date Wed, 06 Dec 2017 03:50:34 GMT
Jason,

Comments inline.

On Tue, Dec 5, 2017 at 10:59 AM, Jason Gustafson <jason@confluent.io> wrote:

> > I believe a lot of users are using the Kafka high-level consumers, which
> > is effectively an **unordered** messaging/streaming pattern. People using
> > high-level consumers don't actually need any ordering guarantees. In this
> > sense, a *shared* subscription in Apache Pulsar seems to be better than
> > Kafka's current consumer group model, as the consumption rate is not
> > limited by the number of partitions and can actually grow beyond it. We
> > do see a lot of operational pain points in production coming from
> > consumer lag, which I think is very commonly seen during partition
> > rebalancing in a consumer group. Selective acking seems to provide a
> > finer granularity of acknowledgment, which can actually be good for
> > avoiding consumer lag and avoiding reprocessing messages during a
> > partition rebalance.
>
>
> Yeah, I'm not sure about this. I'd be interested to understand the design
> of this feature a little better. In practice, when ordering is unimportant,
> adding partitions seems not too big of a deal.


I think it depends. You can probably address the problem by adding more
partitions if the topic is used exclusively by one consumer group.
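
To make the partition limit concrete, here is a rough sketch. It is a toy
model of my own, not the real Kafka assignor or Pulsar dispatcher, and the
helper names assign_partitions and shared_dispatch are made up for
illustration. It assumes a consumer group gives each partition exactly one
owner, while a shared subscription spreads messages round-robin:

```python
# Toy model only: assign_partitions and shared_dispatch are hypothetical
# helpers, not Kafka or Pulsar APIs.

def assign_partitions(num_partitions, consumers):
    """Consumer-group style: every partition has exactly one owner."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


def shared_dispatch(messages, consumers):
    """Shared-subscription style: messages are spread round-robin."""
    delivered = {c: [] for c in consumers}
    for i, msg in enumerate(messages):
        delivered[consumers[i % len(consumers)]].append(msg)
    return delivered


consumers = ["c0", "c1", "c2", "c3", "c4", "c5"]

# 4 partitions, 6 consumers: only 4 consumers can own a partition.
group = assign_partitions(4, consumers)
busy = [c for c, parts in group.items() if parts]
print("consumer group, busy consumers:", busy)

# Same 6 consumers on a shared subscription: all of them receive messages.
shared = shared_dispatch(list(range(12)), consumers)
print("shared subscription, messages per consumer:",
      {c: len(msgs) for c, msgs in shared.items()})
```

In this model, two of the six consumers sit idle in the consumer group until
partitions are added, whereas every consumer gets work on the shared
subscription.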

However, there are still pain points:

- In a shared organization, a topic might be shared between multiple teams.
It is sometimes hard to increase the number of partitions for a topic,
especially if some team wants to consume messages in order.
- Even if you can easily increase the number of partitions, that doesn't
address the consumer lag issue, because without selective acking, some of
the already *acknowledged* or *processed* messages will be redelivered after
partitions are bounced to other consumers.
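
Here is a small sketch of the redelivery point. It is a simplified model
under my own assumptions, not actual client code: offsets start at 0, a
consumer hands messages to parallel workers so they finish out of order, and
an offset commit can only cover the contiguous prefix of processed messages.
All function names are hypothetical:

```python
# Simplified model: these helpers are illustrative, not Kafka or Pulsar APIs.

def committed_position(offsets, processed):
    """First offset not yet processed; only offsets before it can be committed."""
    for offset in offsets:
        if offset not in processed:
            return offset
    return offsets[-1] + 1


def redelivered_with_offset_commit(offsets, processed):
    """After a rebalance the new owner resumes from the committed position."""
    start = committed_position(offsets, processed)
    return [o for o in offsets if o >= start]


def redelivered_with_selective_ack(offsets, acked):
    """With per-message acks, only unacknowledged messages come back."""
    return [o for o in offsets if o not in acked]


offsets = list(range(10))
processed = {0, 1, 2, 4, 5, 7}  # 3 and 6 are still in flight

by_offset = redelivered_with_offset_commit(offsets, processed)
by_ack = redelivered_with_selective_ack(offsets, processed)
duplicates = [o for o in by_offset if o in processed]

print("offset commit redelivers:", by_offset)
print("  of which already processed:", duplicates)
print("selective ack redelivers:", by_ack)
```

In this model the offset-commit case redelivers 4, 5 and 7 even though they
were already processed, while selective acking redelivers only the messages
that were never acknowledged.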



> Also, I'm aware of active
> efforts to make rebalancing less of a pain point for our users ;)
>

Can you point me to the KIPs for these efforts? I would love to keep an eye
on them.


>
> > The last question, from a user's perspective: since both Kafka and
> > Pulsar are distributed pub/sub messaging systems and both of them are at
> > the ASF, is there any possibility for these two projects to collaborate,
> > e.g. Kafka adopts Pulsar's messaging model, and Pulsar can use Kafka
> > Streams and Kafka Connect? I believe a lot of people on the mailing list
> > might have the same or a similar question. From an end-user perspective,
> > if such collaboration can happen, that is going to be great for users and
> > also the ASF. I would like to hear any thoughts from Kafka committers and
> > PMC members.
>
>
> I see this a little differently. Although there is some overlap between the
> projects, they have quite different underlying philosophies (as Marina
> alluded to) and I hope this will take them on different trajectories over
> time. That would ultimately benefit users more than having two competing
> projects solving all the same use cases. We don't need to try to cram
> Pulsar features into Kafka if it's not a good fit and vice versa. At the
> same time, where capabilities do overlap, we can try to learn from their
> experience and they can learn from ours. The example of message retention
> seemed like one of these instances since there are legitimate use cases and
> Pulsar's approach has some benefits.
>

Sure, makes sense to me.

BTW, have you guys taken a look at Pulsar's Kafka API? I am wondering what
you think about it.

- KN


>
>
> -Jason
>
>
>
> On Tue, Dec 5, 2017 at 9:57 AM, Khurrum Nasim <khurrumnasimm@gmail.com>
> wrote:
>
> > Hi Marina,
> >
> >
> > On Tue, Dec 5, 2017 at 6:58 AM, Marina Popova <ppine7sub@protonmail.com>
> > wrote:
> >
> > > Hi,
> > > I don't think it would be such a great idea to start modifying the very
> > > foundation of Kafka's design to accommodate more and more extra use
> > > cases. Kafka became so widely adopted and popular because its creator
> > > made a brilliant decision to make it a "dumb broker - smart consumer"
> > > type of system, where there are no or minimal dependencies between
> > > Kafka brokers and consumers. This is what makes Kafka blazingly fast
> > > and truly scalable - able to handle thousands of consumers with no
> > > impact on performance.
> > >
> >
> > I am not sure I agree with this. I think that, from an end-user
> > perspective, what users expect is an ultra-simple streaming/messaging
> > system: applications send messages, the messaging system stores and
> > dispatches them, and consumers consume the messages and tell the system
> > that they have consumed them. IMO whether management is centralized or
> > decentralized doesn't really matter here if Kafka is able to do things
> > without impacting performance.
> >
> > Sometimes people assume that smarter brokers (like traditional messaging
> > brokers) cannot offer high throughput and scalability because they do
> > "too many things". But I took a look at the Pulsar documentation and
> > their presentation, and a few of the metrics are very impressive:
> >
> > https://image.slidesharecdn.com/apachepulsar-171113225233/95/bdam-multitenant-and-georeplication-messaging-with-apache-pulsar-by-matteo-merli-sijie-guo-from-streamlio-2-638.jpg?cb=1510613990
> >
> > - 1.8 million messages/second per topic partition
> > - 99pct producing latency less than 5ms with stronger durability
> > - support millions of topics
> > - it also supports at-least-once and effectively-once producing
> >
> > Those metrics sound appealing to me if Pulsar supports both streaming and
> > queuing. I am wondering if anyone in the community has tried to do
> > performance testing or a benchmark comparing Pulsar and Kafka. I would
> > love to see such results, which could help people understand both systems
> > and their pros and cons.
> >
> >
> > - KN
> >
> >
> >
> > >
> > > One unfortunate consequence of becoming so popular is that more and
> > > more people are trying to fit Kafka into their architectures not
> > > because it really fits, but because everybody else is doing so :) And
> > > this causes many requests for richer and richer functionality to be
> > > added to Kafka - like transactional messages, more complex acks,
> > > centralized consumer management, etc.
> > >
> > > If you really need those features - there are other systems that are
> > > designed for that.
> > >
> > > I truly worry that if all those changes are added to Core Kafka, it
> > > will become just another "do it all" enterprise-level monster that will
> > > be able to do it all but at the price of mediocre performance and
> > > ten-fold increased complexity (and, thus, management overhead and
> > > possibility of bugs). Sure, there has to be innovation and new features
> > > added - but maybe those that require major changes to Kafka's core
> > > principles should go into separate frameworks, plug-ins (like
> > > Connectors) or something along those lines, rather than packing it all
> > > into Core Kafka.
> > >
> > > Just my 2 cents :)
> > >
> > > Marina
> > >
> > > > -------- Original Message --------
> > > > Subject: Re: Comparing Pulsar and Kafka: unified queuing and
> streaming
> > > > Local Time: December 4, 2017 2:56 PM
> > > > UTC Time: December 4, 2017 7:56 PM
> > > > From: jason@confluent.io
> > > > To: dev@kafka.apache.org
> > > > Kafka Users <users@kafka.apache.org>
> > > >
> > > > Hi Khurrum,
> > > >
> > > > Thanks for sharing the article. I think one interesting aspect of
> > > > Pulsar that stands out to me is its notion of a subscription and how
> > > > it impacts message retention. In Kafka, consumers are more loosely
> > > > coupled and retention is enforced independently of consumption. There
> > > > are some scenarios I can imagine where the tighter coupling might be
> > > > beneficial. For example, in Kafka Streams, we often use intermediate
> > > > topics to store the data in one stage of the topology's computation.
> > > > These topics are exclusively owned by the application and once the
> > > > messages have been successfully received by the next stage, we do not
> > > > need to retain them further. But since consumption is independent of
> > > > retention, we either have to choose a large retention time and deal
> > > > with some temporary storage waste or we use a low retention time and
> > > > possibly lose some messages during an outage.
> > > >
> > > > We have solved this problem to some extent in Kafka by introducing an
> > > > API to delete the records in a partition up to a certain offset, but
> > > > this effectively puts the burden of this use case on clients. It would
> > > > be interesting to consider whether we could do something like Pulsar
> > > > in the Kafka broker. For example, we have a consumer group coordinator
> > > > which is able to track the progress of the group through its committed
> > > > offsets. It might be possible to extend it to automatically delete
> > > > records in a topic after offsets are committed if the topic is known
> > > > to be exclusively owned by the consumer group. We already have the
> > > > DeleteRecords API that we need, so maybe this is "just" a matter of
> > > > some additional topic metadata. I'd be interested to hear whether this
> > > > kind of use case is common among our users.
> > > >
> > > > -Jason
> > > >
> > > > On Sun, Dec 3, 2017 at 10:29 PM, Khurrum Nasim khurrumnasimm@gmail.com
> > > > wrote:
> > > >
> > > >> Dear Kafka Community,
> > > >> I happened to read this blog post comparing the messaging models of
> > > >> Apache Pulsar and Apache Kafka. It sounds interesting. Apache Pulsar
> > > >> claims to unify streaming (Kafka) and queuing (RabbitMQ) in one
> > > >> unified API. Pulsar also seems to support the Kafka API. Has anyone
> > > >> taken a look at Pulsar? What does the community think about this?
> > > >> Pulsar is also an Apache project. Can there be any collaboration
> > > >> between these two projects?
> > > >> https://streaml.io/blog/pulsar-streaming-queuing/
> > > >> BTW, I am a Kafka user, loving Kafka a lot. Just trying to see what
> > > >> other people think about this.
> > > >>
> > > >> - KN
> > >
> >
>
