samza-dev mailing list archives

From Chris Riccomini <criccom...@apache.org>
Subject Re: container concurrency and pipelining
Date Fri, 06 Feb 2015 16:34:29 GMT
Hey Jordan,

> I peaked out a single Samza container's consumer at around 2MB/s.

Could you post your configs, and version of Samza that you're running?

> Running a Kafka Consumer Perf test though on the same machine I can do
100's of MB/s.

How many threads were you running? Also, you're saying "consumer perf"
here. Consumer and producer exhibit very different throughput
characteristics. Can you describe (or post) the two tests that you did?

> It seems like most of the bottleneck exists in the Kafka async client.

Yes, this is what we've observed as well.

> A reasonable solution might be to just add partitions and increase
container count with the partition count.

This is usually the guidance that we give. If you have 8 cores, and want to
max out your machine, you should run 8 containers.

> Has there been any design discussions into allowing multiple cores on
a single container to allow better pipelining within the container?

The discussion pretty much is what you've just described. We never felt
that the increase in code complexity, configs, mental model was worth the
trade-off. My argument is that we should make the Kafka producer go faster
(see comments below), rather than increasing complexity in Samza to get
around it.

> I also know that Kafka has plans to rework their producer but I haven't
been able to find if this includes introducing a thread pool to allow
multiple async produces.

We have upgraded Samza to the new producer in SAMZA-227. The code changes
are on master now. You should definitely check that out.

The new Kafka producer works as follows: there is one "sender" thread. When
you send messages, the messages get queued up, and the sender thread takes
them off the queue, and sends them to Kafka. One trick with the new
producer is that they are using NIO, and allow for pipelining. This is
*specifically* to address the point you made about those that care more
about throughput than ordering guarantees. The config of interest to you is:

  max.in.flight.requests.per.connection

This defines how many parallel sends can be pipelined (over one socket, in
the sender thread) before the sender thread blocks. Samza forces this to 1
right now (because we wanted to guarantee ordering). It seems like a
reasonable request to allow users to override this with their own setting
if they want more parallelism. Could you open a JIRA for that?
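
To make the setting concrete, here is a minimal sketch of the properties a raw (non-Samza) new-producer test might set. It only builds a java.util.Properties object; the class and method names, the broker address, and the value 5 are illustrative assumptions, and a real producer would need further settings (serializers, for example) that are omitted here.

```java
import java.util.Properties;

public class ProducerConfigSketch {
    // Builds properties for a throughput-oriented raw Kafka producer test.
    // Hypothetical helper: the key names are Kafka producer configs, but the
    // values chosen by the caller are assumptions, not recommendations.
    static Properties throughputOriented(String bootstrapServers, int maxInFlight) {
        if (maxInFlight < 1) {
            throw new IllegalArgumentException("maxInFlight must be >= 1");
        }
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Values above 1 allow pipelined sends, at the cost of ordering
        // guarantees when a send is retried.
        props.setProperty("max.in.flight.requests.per.connection",
                Integer.toString(maxInFlight));
        // 1 is the default mentioned in this thread; "0" trades durability for speed.
        props.setProperty("acks", "1");
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" is a placeholder broker address.
        Properties props = throughputOriented("localhost:9092", 5);
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Passing 1 instead of 5 reproduces the ordering-preserving value that Samza currently forces.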

I should note, in smoke tests, with max-in-flight set to one in Samza, the
perf seemed roughly on par with Samza running the old Kafka producer. I
also spoke to Jay at the last Kafka meetup, and he mentioned that they
don't see much of a performance boost when running max-in-flight > 1. Jun
did some perf comparison between the old and new Kafka producer, and put
the information on some slides that he presented at the meetup. If you're
interested, you should ping them on the Kafka mailing list.

> Lastly, has anyone been able to get more MB/s out of a container than
what I have?

Thus far, I (personally) haven't spent much time on producer-side
optimization, so I don't have hard numbers on it. Our producer code is
pretty thin, so we're pretty much bound to what the Kafka producer can
do. If you're up for it, you might want to contribute something to:

  https://issues.apache.org/jira/browse/SAMZA-6

Here's what I'd recommend:

0. Write something reproducible and post it on SAMZA-6. For bonus points,
write an equivalent raw-Kafka-producer test (no Samza) so we can compare
them.
1. Check out master.
2. Modify master to allow you to configure max-in-flight > 1 (line 185 of
KafkaConfig.scala).
3. Try setting acks to 0 (it's 1 by default).

Try running your tests at every one of these steps, and see how it affects
performance. If you get to 3, and things are still slow, we can loop in
some Kafka-dev folks.
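
For the reproducible test in step 0, a small timing harness along these lines might be a starting point. It is only a sketch: ThroughputSketch and measureMBps are hypothetical names, and the sink is a stand-in for whatever is being measured (a raw Kafka producer send, or a Samza collector). The flush callback is there because an async producer returns from send before the data is on the wire, so the clock should stop only after pending sends have drained.

```java
import java.util.function.Consumer;

public class ThroughputSketch {
    // Sends messageCount payloads of messageBytes each to the sink, waits for
    // flush to finish, and returns the observed rate in MB/s.
    static double measureMBps(Consumer<byte[]> sink, Runnable flush,
                              int messageCount, int messageBytes) {
        byte[] payload = new byte[messageBytes];
        long start = System.nanoTime();
        for (int i = 0; i < messageCount; i++) {
            sink.accept(payload);
        }
        flush.run(); // for an async sink, block here until queued sends complete
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        double totalMB = (double) messageCount * messageBytes / (1024.0 * 1024.0);
        return totalMB / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) {
        // Stand-in sink that only counts bytes; replace with a real producer call.
        long[] bytesSeen = {0};
        double rate = measureMBps(p -> bytesSeen[0] += p.length, () -> { },
                1_000_000, 100);
        System.out.printf("%.1f MB/s over %d bytes%n", rate, bytesSeen[0]);
    }
}
```

Running the same harness against the raw producer and against Samza, once per step above, would give directly comparable numbers to post on SAMZA-6.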

Cheers,
Chris

On Fri, Feb 6, 2015 at 12:00 AM, Jordan Shaw <jordan@pubnub.com> wrote:

> Hi everyone,
> I've done some raw Disk, Kafka and Samza benchmarking. I peaked out a
> single Samza container's consumer at around 2MB/s. Running a Kafka Consumer
> Perf test though on the same machine I can do 100's of MB/s. It seems like
> most of the bottleneck exists in the Kafka async client. There appears to
> be only 1 thread in the Kafka client rather than a thread pool and due to
> the limitation that a container can't run on multiple cores this thread
> gets scheduled I assume on the same core as the consumer and process call.
>
> I know a lot of thought has been put into the design of maintaining parity
> between task instances and partitions and preventing unpredictable behavior
> from a threaded system. A reasonable solution might be to just add
> partitions and increase container count with the partition count. This is
> at the cost of increasing memory usage on the node managers necessarily due
> to the increased container count.
>
> Has there been any design discussions into allowing multiple cores on a
> single container to allow better pipelining within the container to get
> better throughput and also introducing a thread pool outside of Kafka's
> client to allow concurrent produces to Kafka within the same container? I
> understand there are ordering concerns with this concurrency and for those
> sensitive use cases the thread pool could be 1 but for use cases where
> ordering is less important and raw throughput is more of a concern they can
> achieve that by allowing concurrent async produces. I also know that Kafka
> has plans to rework their producer but I haven't been able to find if this
> includes introducing a thread pool to allow multiple async produces.
> Lastly, has anyone been able to get more MB/s out of a container than what
> I have? Thanks!
>
> --
> Jordan
>
