kafka-users mailing list archives

From John Roesler <j...@confluent.io>
Subject Re: Low level kafka consumer API to KafkaStreams App.
Date Mon, 17 Sep 2018 17:10:14 GMT
Hey Praveen,

I also suspect that you can get away with far fewer threads. Here's the
general starting point I recommend:

* start with just a little over 1 thread per hardware thread (accounting
for cores and hyperthreading). For example, on my machine, I have 4 cores
with 2 threads of execution each, so I would configure the application with
8 or maybe 9 threads. Much more than that introduces a *lot* of CPU/memory
overhead in exchange for not much gain (if any).
* choose a number of partitions that would allow you to scale up to a
reasonable number of machines, with respect to the numbers you get above.
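
As a rough sketch of the two bullets above (not a definitive sizing rule): the
helpers `recommendedThreads` and `partitionsFor` and the `maxMachines` value
are made up for illustration. The only real Kafka name is the Streams config
key `num.stream.threads`. The partition math assumes each stream thread should
be able to own at least one partition, since threads without a task sit idle.

```java
import java.util.Properties;

public class StreamThreadSizing {

    // Hypothetical helper: "a little over 1 thread per hardware thread".
    // Here that is interpreted as hardware threads + 1 (an assumption).
    static int recommendedThreads(int hardwareThreads) {
        return hardwareThreads + 1;
    }

    // Hypothetical helper: enough partitions that every thread on every
    // machine you might scale out to can own at least one partition.
    static int partitionsFor(int threadsPerMachine, int maxMachines) {
        return threadsPerMachine * maxMachines;
    }

    public static void main(String[] args) {
        // Logical processors visible to the JVM (cores x hyperthreads).
        int hardwareThreads = Runtime.getRuntime().availableProcessors();
        int threads = recommendedThreads(hardwareThreads);

        // "num.stream.threads" is the Kafka Streams thread-count setting.
        Properties props = new Properties();
        props.put("num.stream.threads", Integer.toString(threads));

        // Assumed upper bound on machines; pick your own.
        int maxMachines = 10;
        int partitions = partitionsFor(threads, maxMachines);

        System.out.println("num.stream.threads = " + threads);
        System.out.println("partitions to create = " + partitions);
    }
}
```

With the 4-core, 2-threads-per-core machine mentioned above, this gives 9
stream threads per machine.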

From there, take a close look at all your important machine metrics (cpu,
memory, disk, network) as well as processing metrics (task throughput (how
long your application code takes), end-to-end processing throughput (how
long the full processing lifecycle takes, including the broker roundtrips)).

If there's any resource not saturated, you can tweak various configurations
to try and saturate it. I would think that stuff like buffer size and batch
size would be more helpful with less overhead than number of threads.
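
For example, a minimal sketch of the kind of settings I mean, assuming a Kafka
Streams app where embedded client settings are passed with the `producer.` and
`consumer.` prefixes. The class and method names are hypothetical, and the
values are placeholders to experiment with, not recommendations:

```java
import java.util.Properties;

public class ThroughputTuning {

    // Hypothetical helper returning overrides to merge into the app config.
    // All values are illustrative starting points only.
    static Properties tuningOverrides() {
        Properties props = new Properties();
        // Larger producer batches: fewer, bigger requests to the broker.
        props.put("producer.batch.size", "65536");
        // Wait briefly so batches have a chance to fill up.
        props.put("producer.linger.ms", "20");
        // Total memory the producer may use to buffer unsent records.
        props.put("producer.buffer.memory", "67108864");
        // Ask the broker to return larger fetch responses to the consumer.
        props.put("consumer.fetch.min.bytes", "65536");
        return props;
    }

    public static void main(String[] args) {
        tuningOverrides().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Change one setting at a time and re-measure, so you can tell which change
actually moved the numbers.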

But keep a close eye on your throughput numbers each time you make a change,
to be sure you're not optimizing locally at the expense of global performance.

I hope this helps!
-John

On Thu, Sep 13, 2018 at 4:53 PM Svante Karlsson <svante.karlsson@csi.se>
wrote:

> You are doing something wrong if you need 10k threads to produce 800k
> messages per second. It feels like you are a factor of 1000 off. What size are
> your messages?
>
> On Thu, Sep 13, 2018, 21:04 Praveen <praveev.gk@gmail.com> wrote:
>
> > Hi there,
> >
> > I have a kafka application that uses kafka consumer low-level api to help
> > us process data from a single partition concurrently. Our use case is to
> > send out 800k messages per sec. We are able to do that with 4 boxes using
> > 10k threads and each request taking 50ms in a thread. (1000/50*10000*4)
> >
> > I understand that kafka in general uses partitions as its parallelism
> > model. It is my understanding that if I want the exact same behavior with
> > kafka streams, I'd need to create 40k partitions for this topic. Is that
> > right?
> >
> > What is the overhead on creating thousands of partitions? If we end up
> > wanting to send out millions of messages per second, is increasing the
> > partitions the only way?
> >
> > Best,
> > Praveen
> >
>
