kafka-users mailing list archives

From Andrew Stevenson <asteven...@outlook.com>
Subject Re: Comparing Pulsar and Kafka: unified queuing and streaming
Date Thu, 07 Dec 2017 17:52:11 GMT
Hi Khurrum,

It is ready now:
https://github.com/Landoop/stream-reactor

Regards

Andrew


From: Khurrum Nasim
Sent: Thursday, 7 December, 08:36
Subject: Re: Comparing Pulsar and Kafka: unified queuing and streaming
To: dev@kafka.apache.org
Cc: users@kafka.apache.org


Andrew,

Thank you! Is there any estimate of when I can try out Kafka Connect with
Pulsar? Can you also point me to where I can find the Kafka-to-Pulsar
source and sink?

- KN

On Wed, Dec 6, 2017 at 2:48 AM, Andrew Stevenson wrote:

> In terms of building out the Apache Pulsar ecosystem, Landoop is working
> on porting our Kafka Connect connectors to Pulsar's framework. We already
> have a Kafka-to-Pulsar source and sink.
>
> On 05/12/2017, 19:59, "Jason Gustafson" wrote:
>
> > I believe a lot of users are using the Kafka high-level consumers, which
> > is effectively an **unordered** messaging/streaming pattern. People
> > using high-level consumers don't actually need any ordering guarantees.
> > In this sense, a *shared* subscription in Apache Pulsar seems to be
> > better than Kafka's current consumer group model, as it allows the
> > consumption rate to grow beyond the number of partitions rather than
> > being limited by it. We do see a lot of operational pain points in
> > production coming from consumer lag, which I think is very commonly seen
> > during partition rebalancing in a consumer group. Selective acking seems
> > to provide a finer granularity of acknowledgment, which can actually be
> > good for avoiding consumer lag and avoiding reprocessing messages during
> > a partition rebalance.
>
> Yeah, I'm not sure about this. I'd be interested to understand the design
> of this feature a little better. In practice, when ordering is
> unimportant, adding partitions seems not too big of a deal. Also, I'm
> aware of active efforts to make rebalancing less of a pain point for our
> users ;)
>
> > The last question: from a user's perspective, since both Kafka and
> > Pulsar are distributed pub/sub messaging systems and both of them are at
> > the ASF, is there any possibility for these two projects to collaborate,
> > e.g. Kafka adopts Pulsar's messaging model, and Pulsar can use Kafka
> > Streams and Kafka Connect? I believe a lot of people on the mailing list
> > might have the same or a similar question. From an end-user perspective,
> > if such collaboration can happen, that is going to be great for users
> > and also the ASF. I would like to hear any thoughts from Kafka
> > committers and PMC members.
>
> I see this a little differently. Although there is some overlap between
> the projects, they have quite different underlying philosophies (as Marina
> alluded to) and I hope this will take them on different trajectories over
> time. That would ultimately benefit users more than having two competing
> projects solving all the same use cases. We don't need to try to cram
> Pulsar features into Kafka if it's not a good fit and vice versa. At the
> same time, where capabilities do overlap, we can try to learn from their
> experience and they can learn from ours. The example of message retention
> seemed like one of these instances since there are legitimate use cases
> and Pulsar's approach has some benefits.
>
> -Jason
>
> > On Tue, Dec 5, 2017 at 9:57 AM, Khurrum Nasim wrote:
> >
> > Hi Marina,
> >
> > On Tue, Dec 5, 2017 at 6:58 AM, Marina Popova <ppine7sub@protonmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I don't think it would be such a great idea to start modifying the
> > > very foundation of Kafka's design to accommodate more and more extra
> > > use cases. Kafka became so widely adopted and popular because its
> > > creator made a brilliant decision to make it a "dumb broker - smart
> > > consumer" type of system, where there are no to minimal dependencies
> > > between Kafka brokers and Consumers. This is what makes Kafka
> > > blazingly fast and truly scalable - able to handle thousands of
> > > Consumers with no impact on performance.
> >
> > I am not sure I agree with this. I think from an end-user perspective,
> > what users expect is an ultra-simple streaming/messaging system:
> > applications send messages, the messaging system stores and dispatches
> > them, and consumers consume the messages and tell the system that they
> > have consumed them. IMO whether management is centralized or
> > decentralized doesn't really matter here if Kafka is able to do things
> > without impacting performance.
> >
> > Sometimes people assume that smarter brokers (like traditional messaging
> > brokers) cannot offer high throughput and scalability, because they do
> > "too many things". But I took a look at the Pulsar documentation and
> > their presentation. There are a few very impressive metrics:
> >
> > https://image.slidesharecdn.com/apachepulsar-171113225233/95/bdam-multitenant-and-georeplication-messaging-with-apache-pulsar-by-matteo-merli-sijie-guo-from-streamlio-2-638.jpg?cb=1510613990
> >
> > - 1.8 million messages/second per topic partition
> > - 99pct producing latency less than 5ms with stronger durability
> > - support for millions of topics
> > - it also supports at-least-once and effectively-once producing
> >
> > Those metrics sound appealing to me if Pulsar supports both streaming
> > and queuing. I am wondering if anyone in the community has tried to do
> > performance testing or a benchmark between Pulsar and Kafka. I would
> > love to see such results that can help people understand both systems,
> > pros and cons.
> >
> > - KN
> >
> > > One unfortunate consequence of becoming so popular is that more and
> > > more people are trying to fit Kafka into their architectures not
> > > because it really fits, but because everybody else is doing so :) And
> > > this causes many requests to add richer and richer functionality to
> > > Kafka - like transactional messages, more complex acks, centralized
> > > consumer management, etc.
> > >
> > > If you really need those features, there are other systems that are
> > > designed for that.
> > >
> > > I truly worry that if all those changes are added to Core Kafka, it
> > > will become just another "do it all" enterprise-level monster that
> > > will be able to do it all but at the price of mediocre performance and
> > > ten-fold increased complexity (and, thus, management and possibility
> > > of bugs). Sure, there has to be innovation and new features added -
> > > but maybe those that require major changes to Kafka's core principles
> > > should go into separate frameworks, plug-ins (like Connectors) or
> > > something along those lines, rather than packing it all into Core
> > > Kafka.
> > >
> > > Just my 2 cents :)
> > >
> > > Marina
> > >
> > > Sent with [ProtonMail](https://protonmail.com) Secure Email.
> > >
> > > > -------- Original Message --------
> > > > Subject: Re: Comparing Pulsar and Kafka: unified queuing and streaming
> > > > Local Time: December 4, 2017 2:56 PM
> > > > UTC Time: December 4, 2017 7:56 PM
> > > > From: jason@confluent.io
> > > > To: dev@kafka.apache.org
> > > > Kafka Users
> > > >
> > > > Hi Khurrum,
> > > >
> > > > Thanks for sharing the article. I think one interesting aspect of
> > > > Pulsar that stands out to me is its notion of a subscription and how
> > > > it impacts message retention. In Kafka, consumers are more loosely
> > > > coupled and retention is enforced independently of consumption.
> > > > There are some scenarios I can imagine where the tighter coupling
> > > > might be beneficial. For example, in Kafka Streams, we often use
> > > > intermediate topics to store the data in one stage of the topology's
> > > > computation. These topics are exclusively owned by the application
> > > > and once the messages have been successfully received by the next
> > > > stage, we do not need to retain them further. But since consumption
> > > > is independent of retention, we either have to choose a large
> > > > retention time and deal with some temporary storage waste or we use
> > > > a low retention time and possibly lose some messages during an
> > > > outage.
> > > >
> > > > We have solved this problem to some extent in Kafka by introducing
> > > > an API to delete the records in a partition up to a certain offset,
> > > > but this effectively puts the burden of this use case on clients. It
> > > > would be interesting to consider whether we could do something like
> > > > Pulsar in the Kafka broker. For example, we have a consumer group
> > > > coordinator which is able to track the progress of the group through
> > > > its committed offsets. It might be possible to extend it to
> > > > automatically delete records in a topic after offsets are committed
> > > > if the topic is known to be exclusively owned by the consumer group.
> > > > We already have the DeleteRecords API that we need, so maybe this is
> > > > "just" a matter of some additional topic metadata. I'd be interested
> > > > to hear whether this kind of use case is common among our users.
> > > >
> > > > -Jason
> > > >
> > > > On Sun, Dec 3, 2017 at 10:29 PM, Khurrum Nasim
> > > > <khurrumnasimm@gmail.com> wrote:
> > > >
> > > > > Dear Kafka Community,
> > > > >
> > > > > I happened to read this blog post comparing the messaging model
> > > > > between Apache Pulsar and Apache Kafka. It sounds interesting.
> > > > > Apache Pulsar claims to unify streaming (Kafka) and queuing
> > > > > (RabbitMQ) in one unified API. Pulsar also seems to support the
> > > > > Kafka API. Has anyone taken a look at Pulsar? What does the
> > > > > community think about this? Pulsar is also an Apache project. Is
> > > > > there any collaboration that can happen between these two
> > > > > projects?
> > > > >
> > > > > https://streaml.io/blog/pulsar-streaming-queuing/
> > > > >
> > > > > BTW, I am a Kafka user, loving Kafka a lot. Just trying to see
> > > > > what other people think about this.
> > > > >
> > > > > - KN
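
For readers following the retention discussion in Jason's December 4 message, here is a minimal sketch of the idea he floats: a group coordinator that deletes records up to the committed offset when a topic is exclusively owned by one consumer group. It is a toy in-memory model only. `PartitionLog`, `GroupCoordinator`, `delete_records_before` and the `exclusively_owned` flag are hypothetical names made up for illustration; none of them is a Kafka or Pulsar API, and the thread does not say how such a feature would actually be implemented.

```python
class PartitionLog:
    """Toy in-memory stand-in for one topic partition (not the Kafka API)."""

    def __init__(self):
        self.log_start_offset = 0  # offset of the oldest retained record
        self.records = []          # records[i] has offset log_start_offset + i

    @property
    def log_end_offset(self):
        # Offset that the next appended record will receive.
        return self.log_start_offset + len(self.records)

    def append(self, value):
        offset = self.log_end_offset
        self.records.append(value)
        return offset

    def delete_records_before(self, offset):
        """Drop every record below `offset`, clamped to the retained range."""
        offset = max(self.log_start_offset, min(offset, self.log_end_offset))
        del self.records[: offset - self.log_start_offset]
        self.log_start_offset = offset


class GroupCoordinator:
    """Hypothetical coordinator that truncates an exclusively owned partition
    whenever the owning group commits an offset."""

    def __init__(self, log, exclusively_owned):
        self.log = log
        self.exclusively_owned = exclusively_owned  # assumed topic metadata
        self.committed_offset = 0

    def commit(self, offset):
        # A committed offset is the next offset the group will read,
        # so everything below it has already been processed.
        self.committed_offset = max(self.committed_offset, offset)
        if self.exclusively_owned:
            self.log.delete_records_before(self.committed_offset)


log = PartitionLog()
for i in range(5):
    log.append(f"record-{i}")

coordinator = GroupCoordinator(log, exclusively_owned=True)
coordinator.commit(3)  # the group has processed offsets 0, 1 and 2
print(log.log_start_offset, log.records)
```

With `exclusively_owned=False` the commit leaves the log untouched, which corresponds to the current behaviour Jason describes, where retention is enforced independently of consumption.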

