kafka-users mailing list archives

From Khurrum Nasim <khurrumnas...@gmail.com>
Subject Re: Comparing Pulsar and Kafka: unified queuing and streaming
Date Thu, 07 Dec 2017 07:36:10 GMT
Andrew,

Thank you! Is there an estimate of when I can try out Kafka Connect with
Pulsar?

Can you also point me to where I can find the Kafka-to-Pulsar source and sink?

- KN

On Wed, Dec 6, 2017 at 2:48 AM, Andrew Stevenson <andrew@landoop.com> wrote:

> In terms of building out the Apache Pulsar ecosystem, Landoop is working
> on porting our Kafka Connect connectors to Pulsar's framework.
> We already have a Kafka-to-Pulsar source and sink.
>
>
> On 05/12/2017, 19:59, "Jason Gustafson" <jason@confluent.io> wrote:
>
>     > I believe a lot of users are using the Kafka high-level consumers,
>     > which is effectively an **unordered** messaging/streaming pattern.
>     > People using high-level consumers don't actually need any ordering
>     > guarantees. In this sense, a *shared* subscription in Apache Pulsar
>     > seems to be better than Kafka's current consumer group model, as it
>     > allows the consumption rate to grow beyond the number of partitions
>     > rather than being limited by it. We do see a lot of operational pain
>     > points in production coming from consumer lag, which I think is very
>     > commonly seen during partition rebalancing in a consumer group.
>     > Selective acking seems to provide finer granularity on
>     > acknowledgment, which can actually be good for avoiding consumer lag
>     > and avoiding reprocessing of messages during a partition rebalance.
>
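
The two claims in the paragraph above can be illustrated with a toy Python model. This is not Kafka or Pulsar client code, and every function name here is made up for illustration. It assumes that a consumer group gives each partition to exactly one consumer, that a shared subscription hands messages round-robin to every connected consumer, and that a cumulative commit can only cover a contiguous prefix of processed messages.

```python
def consumer_group_assign(num_partitions, consumers):
    """Partition-exclusive assignment: each partition has exactly one owner."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment  # consumers with an empty list sit idle


def shared_dispatch(messages, consumers):
    """Shared-subscription style dispatch: round-robin, ignoring partitions."""
    received = {c: [] for c in consumers}
    for i, msg in enumerate(messages):
        received[consumers[i % len(consumers)]].append(msg)
    return received


def redelivered_after_failure(processed, total, selective):
    """Message ids (0..total-1) that are delivered again after a failure.

    selective=False: only a contiguous prefix can be committed, so everything
    from the first unprocessed id onward comes back.
    selective=True: each id is acked on its own, so only unacked ids come back.
    """
    if selective:
        return [m for m in range(total) if m not in processed]
    committed = 0
    while committed in processed:
        committed += 1
    return list(range(committed, total))


consumers = ["c0", "c1", "c2", "c3", "c4"]

# 3 partitions, 5 consumers: only 3 consumers get work in the group model.
group = consumer_group_assign(3, consumers)
print(sum(1 for parts in group.values() if parts))  # 3

# Shared dispatch keeps all 5 consumers busy.
shared = shared_dispatch(list(range(10)), consumers)
print(sum(1 for msgs in shared.values() if msgs))   # 5

# Ids 0, 1, 3, 4 were processed out of 6 before the failure.
print(redelivered_after_failure({0, 1, 3, 4}, 6, selective=False))  # [2, 3, 4, 5]
print(redelivered_after_failure({0, 1, 3, 4}, 6, selective=True))   # [2, 5]
```

The model ignores ordering, batching, and rebalance protocol details; it only shows where the partition count caps parallelism and where a prefix-only commit forces reprocessing.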
>
>     Yeah, I'm not sure about this. I'd be interested to understand the
>     design of this feature a little better. In practice, when ordering is
>     unimportant, adding partitions seems not too big of a deal. Also, I'm
>     aware of active efforts to make rebalancing less of a pain point for our
>     users ;)
>
>     > The last question, from a user's perspective: since both Kafka and
>     > Pulsar are distributed pub/sub messaging systems and both of them are
>     > at the ASF, is there any possibility for these two projects to
>     > collaborate, e.g. Kafka adopts Pulsar's messaging model, and Pulsar
>     > can use Kafka Streams and Kafka Connect? I believe a lot of people on
>     > the mailing list might have the same or a similar question. From an
>     > end-user perspective, if such collaboration can happen, that is going
>     > to be great for users and also the ASF. I would like to hear any
>     > thoughts from Kafka committers and PMC members.
>
>
>     I see this a little differently. Although there is some overlap between
>     the projects, they have quite different underlying philosophies (as
>     Marina alluded to) and I hope this will take them on different
>     trajectories over time. That would ultimately benefit users more than
>     having two competing projects solving all the same use cases. We don't
>     need to try to cram Pulsar features into Kafka if it's not a good fit
>     and vice versa. At the same time, where capabilities do overlap, we can
>     try to learn from their experience and they can learn from ours. The
>     example of message retention seemed like one of these instances since
>     there are legitimate use cases and Pulsar's approach has some benefits.
>
>
>     -Jason
>
>
>
>     On Tue, Dec 5, 2017 at 9:57 AM, Khurrum Nasim <khurrumnasimm@gmail.com>
>     wrote:
>
>     > Hi Marina,
>     >
>     >
>     > On Tue, Dec 5, 2017 at 6:58 AM, Marina Popova <ppine7sub@protonmail.com>
>     > wrote:
>     >
>     > > Hi,
>     > > I don't think it would be such a great idea to start modifying the
>     > > very foundation of Kafka's design to accommodate more and more extra
>     > > use cases. Kafka became so widely adopted and popular because its
>     > > creator made a brilliant decision to make it a "dumb broker - smart
>     > > consumer" type of system, where there are no to minimal dependencies
>     > > between Kafka brokers and Consumers. This is what makes Kafka
>     > > blazingly fast and truly scalable - able to handle thousands of
>     > > Consumers with no impact on performance.
>     > >
>     >
>     > I am not sure I agree with this. I think from an end-user perspective,
>     > what users expect is an ultra-simple streaming/messaging system:
>     > applications send messages, the messaging system stores and dispatches
>     > them, and consumers consume the messages and tell the system that they
>     > have consumed them. IMO whether management is centralized or
>     > decentralized doesn't really matter here if Kafka is able to do things
>     > without impacting performance.
>     >
>     > Sometimes people assume that smarter brokers (like traditional
>     > messaging brokers) cannot offer high throughput and scalability,
>     > because they do "too many things". But I took a look at the Pulsar
>     > documentation and their presentation. A few of the metrics are very
>     > impressive:
>     >
>     > https://image.slidesharecdn.com/apachepulsar-171113225233/95/bdam-multitenant-and-georeplication-messaging-with-apache-pulsar-by-matteo-merli-sijie-guo-from-streamlio-2-638.jpg?cb=1510613990
>     >
>     > - 1.8 million messages/second per topic partition
>     > - 99pct producing latency less than 5ms with stronger durability
>     > - support millions of topics
>     > - it also supports at-least-once and effectively-once producing
>     >
>     > Those metrics sound appealing to me if Pulsar supports both streaming
>     > and queuing. I am wondering if anyone in the community has tried
>     > performance testing or a benchmark between Pulsar and Kafka. I would
>     > love to see such results, which can help people understand both
>     > systems, pros and cons.
>     >
>     >
>     > - KN
>     >
>     >
>     >
>     > >
>     > > One unfortunate consequence of becoming so popular is that more and
>     > > more people are trying to fit Kafka into their architectures not
>     > > because it really fits, but because everybody else is doing so :) And
>     > > this causes many requests for richer and richer functionality to be
>     > > added to Kafka - like transactional messages, more complex acks,
>     > > centralized consumer management, etc.
>     > >
>     > > If you really need those features - there are other systems that are
>     > > designed for that.
>     > >
>     > > I truly worry that if all those changes are added to Core Kafka, it
>     > > will become just another "do it all" enterprise-level monster that
>     > > will be able to do it all but at a price of mediocre performance and
>     > > ten-fold increased complexity (and, thus, management and possibility
>     > > of bugs). Sure, there has to be innovation and new features added -
>     > > but maybe those that require major changes to Kafka's core principles
>     > > should go into separate frameworks, plug-ins (like Connectors) or
>     > > something along that line, rather than packing it all into Core Kafka.
>     > >
>     > > Just my 2 cents :)
>     > >
>     > > Marina
>     > >
>     > > Sent with [ProtonMail](https://protonmail.com) Secure Email.
>     > >
>     > > > -------- Original Message --------
>     > > > Subject: Re: Comparing Pulsar and Kafka: unified queuing and streaming
>     > > > Local Time: December 4, 2017 2:56 PM
>     > > > UTC Time: December 4, 2017 7:56 PM
>     > > > From: jason@confluent.io
>     > > > To: dev@kafka.apache.org
>     > > > Kafka Users <users@kafka.apache.org>
>     > > >
>     > > > Hi Khurrum,
>     > > >
>     > > > Thanks for sharing the article. I think one interesting aspect of
>     > > > Pulsar that stands out to me is its notion of a subscription and how
>     > > > it impacts message retention. In Kafka, consumers are more loosely
>     > > > coupled and retention is enforced independently of consumption.
>     > > > There are some scenarios I can imagine where the tighter coupling
>     > > > might be beneficial. For example, in Kafka Streams, we often use
>     > > > intermediate topics to store the data in one stage of the topology's
>     > > > computation. These topics are exclusively owned by the application
>     > > > and once the messages have been successfully received by the next
>     > > > stage, we do not need to retain them further. But since consumption
>     > > > is independent of retention, we either have to choose a large
>     > > > retention time and deal with some temporary storage waste or we use
>     > > > a low retention time and possibly lose some messages during an
>     > > > outage.
>     > > >
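
The trade-off described above can be shown with a small simulation. This is a toy model in Python, not Kafka code; the function name, the one-message-per-tick producer, and the "consumer catches up instantly when it is up" behaviour are all simplifying assumptions made for illustration.

```python
def simulate_retention(retention, outage_start, outage_end, horizon):
    """Time-based retention that ignores consumption.

    One message is produced per tick. The broker drops any message that is
    `retention` ticks old or older, whether or not it was consumed. The
    consumer is down for ticks outage_start <= t < outage_end.

    Returns (messages_lost, peak_count_of_consumed_messages_still_stored).
    """
    log = []            # timestamps of messages still on the broker
    next_to_read = 0    # timestamp of the next message the consumer needs
    lost = 0
    peak_wasted = 0
    for now in range(horizon):
        log.append(now)
        log = [t for t in log if now - t < retention]   # expire old messages
        consumer_up = not (outage_start <= now < outage_end)
        if consumer_up:
            if log[0] > next_to_read:
                # Messages expired before the consumer could read them.
                lost += log[0] - next_to_read
            next_to_read = now + 1                       # fully caught up
        wasted = sum(1 for t in log if t < next_to_read)
        peak_wasted = max(peak_wasted, wasted)
    return lost, peak_wasted


# Consumer is down for ticks 5..14 out of 20.
print(simulate_retention(3, 5, 15, 20))    # (8, 3): short retention loses data
print(simulate_retention(20, 5, 15, 20))   # (0, 20): long retention keeps everything
```

With a retention of 3 ticks the broker stores little but loses the 8 messages that expired during the outage; with a retention of 20 ticks nothing is lost, but every consumed message is still held at the end of the run.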
>     > > > We have solved this problem to some extent in Kafka by introducing
>     > > > an API to delete the records in a partition up to a certain offset,
>     > > > but this effectively puts the burden of this use case on clients. It
>     > > > would be interesting to consider whether we could do something like
>     > > > Pulsar in the Kafka broker. For example, we have a consumer group
>     > > > coordinator which is able to track the progress of the group through
>     > > > its committed offsets. It might be possible to extend it to
>     > > > automatically delete records in a topic after offsets are committed
>     > > > if the topic is known to be exclusively owned by the consumer group.
>     > > > We already have the DeleteRecords API for that need, so maybe this
>     > > > is "just" a matter of some additional topic metadata. I'd be
>     > > > interested to hear whether this kind of use case is common among our
>     > > > users.
>     > > >
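
A rough sketch of the idea above, as a toy in-memory model in Python. It does not use any Kafka client or broker code; `PartitionLog`, `GroupCoordinator`, and the `exclusive_owner` setting are hypothetical names standing in for the "additional topic metadata" mentioned in the message.

```python
class PartitionLog:
    """Toy append-only log that supports deleting records before an offset."""

    def __init__(self):
        self.records = {}       # offset -> value
        self.log_start = 0      # lowest offset that is still readable
        self.next_offset = 0

    def append(self, value):
        offset = self.next_offset
        self.records[offset] = value
        self.next_offset += 1
        return offset

    def delete_before(self, offset):
        """Drop every record with an offset lower than `offset`."""
        offset = min(offset, self.next_offset)
        for o in range(self.log_start, offset):
            del self.records[o]
        self.log_start = max(self.log_start, offset)


class GroupCoordinator:
    """Toy coordinator: tracks committed offsets per group.

    If the log is marked as exclusively owned by one group (hypothetical
    metadata), a commit from that group also trims the log, so the client
    does not have to issue the delete itself.
    """

    def __init__(self, log, exclusive_owner=None):
        self.log = log
        self.exclusive_owner = exclusive_owner
        self.committed = {}

    def commit(self, group, offset):
        self.committed[group] = offset
        if group == self.exclusive_owner:
            self.log.delete_before(offset)


log = PartitionLog()
for value in ["a", "b", "c", "d", "e"]:
    log.append(value)

coordinator = GroupCoordinator(log, exclusive_owner="streams-app")
coordinator.commit("streams-app", 3)   # owner commits: offsets 0..2 are trimmed
print(sorted(log.records))             # [3, 4]

coordinator.commit("other-group", 5)   # not the owner: nothing is trimmed
print(sorted(log.records))             # [3, 4]
```

The sketch leaves out everything a real broker would need to consider, such as replication, multiple partitions, and what happens when ownership metadata changes.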
>     > > > -Jason
>     > > >
>     > > > On Sun, Dec 3, 2017 at 10:29 PM, Khurrum Nasim
>     > > > khurrumnasimm@gmail.com wrote:
>     > > >
>     > > >> Dear Kafka Community,
>     > > >> I happened to read this blog post comparing the messaging model
>     > > >> between Apache Pulsar and Apache Kafka. It sounds interesting.
>     > > >> Apache Pulsar claims to unify streaming (Kafka) and queuing
>     > > >> (RabbitMQ) in one unified API. Pulsar also seems to support the
>     > > >> Kafka API. Has anyone taken a look at Pulsar? What does the
>     > > >> community think about this? Pulsar is also an Apache project. Is
>     > > >> there any collaboration that can happen between these two projects?
>     > > >> https://streaml.io/blog/pulsar-streaming-queuing/
>     > > >> BTW, I am a Kafka user, loving Kafka a lot. Just trying to see what
>     > > >> other people think about this.
>     > > >>
>     > > >> - KN
>     > >
>     >
>
>
>
>
