kafka-users mailing list archives

From Robert Metzger <rmetz...@apache.org>
Subject Re: Consumer that consumes only local partition?
Date Tue, 04 Aug 2015 18:57:47 GMT
Sorry for the very late reply ...

The performance issue was not caused by network latency. I had a job like
this:
FlinkKafkaConsumer --> someSimpleOperation --> FlinkKafkaProducer.

I thought that our FlinkKafkaConsumer was slow, but our FlinkKafkaProducer
was actually using Kafka's old producer API. Switching to the new producer
API greatly improved our write performance to Kafka. Flink was slowing down
the KafkaConsumer because the producer could not keep up.
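
For anyone making the same switch, here is a minimal sketch of the settings
the new Java producer expects. The broker addresses and the batch.size /
linger.ms values are illustrative assumptions, not the configuration used in
the job above; only the property names and serializer class come from the
Kafka client itself.

```java
import java.util.Properties;

public class NewProducerConfigSketch {

    // Builds a Properties object for the new Java producer
    // (org.apache.kafka.clients.producer.KafkaProducer).
    // The tuning values below are assumptions for illustration only.
    static Properties newProducerProps(String brokers) {
        Properties props = new Properties();
        // The new producer bootstraps from a broker list.
        props.put("bootstrap.servers", brokers);
        // Serializers are mandatory for the new producer.
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        // Batching matters a lot for tiny messages: records are collected per
        // partition and sent together instead of one request per record.
        props.put("batch.size", "16384"); // bytes per partition batch
        props.put("linger.ms", "5");      // assumed value: wait briefly to fill batches
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical broker addresses.
        Properties props = newProducerProps("broker1:9092,broker2:9092");
        for (String name : props.stringPropertyNames()) {
            System.out.println(name + " = " + props.getProperty(name));
        }
        // With kafka-clients on the classpath, these properties would be
        // passed to: new KafkaProducer<byte[], byte[]>(props)
    }
}
```

The sketch only needs the JDK to run; the actual producer additionally
requires the kafka-clients dependency.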

Since we are already talking about performance, let me ask you the
following question:
I am using Kafka and Flink on an HDP 2.2 cluster (with 40 machines). What
would you consider a good read/write performance for 8-byte messages on the
following setup?
- 40 brokers,
- topic with 120 partitions
- 120 reading threads (on 30 machines)
- 120 writing threads (on 30 machines)

I'm getting a write throughput of ~75k elements/core/second and a read
throughput of ~50k elements/core/second.
When I stop the writers, the read throughput goes up to 130k
elements/core/second.
I would expect a higher throughput than (8 * 75000) / 1024 = 585.9 KB/s per
partition ... or are the messages so small that the per-message overhead
dominates?
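
To make the arithmetic concrete, here is a rough sketch of the calculation.
The 26 bytes of per-message framing is an assumption (offset, size, CRC,
magic, attributes, key length and value length fields for an uncompressed
message without a key); the real figure depends on the Kafka version and on
batching, so treat it as a parameter rather than a measured value.

```java
public class ThroughputEstimate {

    // Payload-only throughput in KB/s (1 KB = 1024 bytes).
    static double payloadKBPerSec(long msgsPerSec, int payloadBytes) {
        return msgsPerSec * (double) payloadBytes / 1024.0;
    }

    // Throughput including an assumed fixed framing overhead per message.
    static double framedKBPerSec(long msgsPerSec, int payloadBytes, int overheadBytes) {
        return msgsPerSec * (double) (payloadBytes + overheadBytes) / 1024.0;
    }

    public static void main(String[] args) {
        long msgsPerSec = 75_000;   // write rate per core, from the numbers above
        int payloadBytes = 8;       // message size, from the numbers above
        int assumedOverhead = 26;   // ASSUMPTION: framing bytes per message
        int writers = 120;          // writing threads, from the setup above

        double payload = payloadKBPerSec(msgsPerSec, payloadBytes);
        double framed = framedKBPerSec(msgsPerSec, payloadBytes, assumedOverhead);

        System.out.printf("payload per writer:  %.1f KB/s%n", payload);
        System.out.printf("framed per writer:   %.1f KB/s%n", framed);
        System.out.printf("payload, all writers: %.1f MB/s%n",
                payload * writers / 1024.0);
    }
}
```

Under that assumption the framing is more than three times the size of the
8-byte payload, which is one reason byte throughput looks low for messages
this small.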

Which system out there would you recommend for getting reference
performance numbers? Samza, Spark, Storm?


On Wed, Jul 15, 2015 at 7:20 PM, Gwen Shapira <gshapira@cloudera.com> wrote:

> This is not something you can easily do with the consumer API
> (consumers have no notion of locality).
> I can imagine using Kafka's low-level API calls to get a list of
> partitions and the lead replica, figuring out which are local and
> using those - but that sounds painful.
>
> Are you 100% sure the performance issue is due to network latency? If
> not, you may want to start optimizing somewhere more productive :)
> Kafka brokers and clients both have Metrics that may help you track
> where the performance issues are coming from.
>
> Gwen
>
> On Wed, Jul 15, 2015 at 9:24 AM, Robert Metzger <rmetzger@apache.org>
> wrote:
> > Hi Shef,
> >
> > did you resolve this issue?
> > I'm facing some performance issues and I was wondering whether reading
> > locally would resolve them.
> >
> > On Mon, Jun 22, 2015 at 11:43 PM, Shef <shef31@yahoo.com> wrote:
> >
> >> Noob question here. I want to have a single consumer for each partition
> >> that consumes only the messages that have been written locally. In other
> >> words, I want the consumer to access the local disk and not pull
> >> anything across the network. Possible?
> >>
> >> How can I discover which partitions are local?
> >>
> >>
> >>
>
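
For reference, the "figure out which partitions are local" step from the
quoted suggestion could be sketched as below. The partition-to-leader map is
a hypothetical input: in practice it would have to be filled from Kafka's
topic metadata, which this sketch does not do, and the host names are made
up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalPartitionFilter {

    // Returns, in ascending order, the partitions whose lead replica runs on
    // localHost. leaderByPartition (partition id -> leader host) is a
    // hypothetical input that would come from a topic metadata lookup.
    static List<Integer> localPartitions(Map<Integer, String> leaderByPartition,
                                         String localHost) {
        List<Integer> local = new ArrayList<>();
        // TreeMap gives a deterministic, sorted iteration order.
        for (Map.Entry<Integer, String> e
                : new TreeMap<>(leaderByPartition).entrySet()) {
            if (e.getValue().equalsIgnoreCase(localHost)) {
                local.add(e.getKey());
            }
        }
        return local;
    }

    public static void main(String[] args) {
        // Made-up metadata: three partitions led by two brokers.
        Map<Integer, String> leaders = new HashMap<>();
        leaders.put(0, "node-a");
        leaders.put(1, "node-b");
        leaders.put(2, "node-a");

        System.out.println(localPartitions(leaders, "node-a"));
    }
}
```

Leadership can move between brokers, so a real consumer would need to refresh
this mapping rather than compute it once.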
