kafka-users mailing list archives

From Hawin Jiang <hawin.ji...@gmail.com>
Subject Re: Consumer that consumes only local partition?
Date Wed, 05 Aug 2015 00:43:34 GMT
Hi Robert,

Here is a Kafka benchmark for your reference. If you put Flink, Storm,
Samza, or Spark on top of Kafka, throughput will be lower than these raw
numbers:

821,557 records/sec (78.3 MB/sec)

https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
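As a quick sanity check of the quoted figure: the linked LinkedIn benchmark used 100-byte records, and the two numbers are consistent under that record size (a sketch; the record size is taken from the linked post, not from this email):

```python
# Sanity check on the quoted benchmark figure.
# The LinkedIn benchmark used 100-byte records; "MB" here is 1024 * 1024 bytes.
records_per_sec = 821_557
record_size_bytes = 100  # record size used in the linked benchmark

mb_per_sec = records_per_sec * record_size_bytes / (1024 * 1024)
print(f"{mb_per_sec:.1f} MB/sec")  # → 78.3 MB/sec
```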

Best regards
Hawin



On Tue, Aug 4, 2015 at 11:57 AM, Robert Metzger <rmetzger@apache.org> wrote:

> Sorry for the very late reply ...
>
> The performance issue was not caused by network latency. I had a job like
> this:
> FlinkKafkaConsumer --> someSimpleOperation --> FlinkKafkaProducer.
>
> I thought that our FlinkKafkaConsumer was slow, but actually our
> FlinkKafkaProducer was using Kafka's old producer API. Switching to
> Kafka's new producer API greatly improved our write performance. The
> slow producer was back-pressuring the consumer, which made the consumer
> look slow.
>
> Since we are already talking about performance, let me ask you the
> following question:
> I am using Kafka and Flink on an HDP 2.2 cluster (with 40 machines). What
> would you consider a good read/write performance for 8-byte messages on the
> following setup?
> - 40 brokers
> - a topic with 120 partitions
> - 120 reading threads (on 30 machines)
> - 120 writing threads (on 30 machines)
>
> I'm getting a write throughput of ~75k elements/core/second and a read
> throughput of ~50k elements/core/second.
> When I stop the writers, the read throughput goes up to ~130k.
> I would expect a higher throughput than (8 * 75,000) / 1024 = 585.9 KB/sec
> per partition ... or are the messages simply too small, so that per-message
> overhead dominates?
>
> Which system out there would you recommend for getting reference
> performance numbers? Samza, Spark, Storm?
>
>
> On Wed, Jul 15, 2015 at 7:20 PM, Gwen Shapira <gshapira@cloudera.com>
> wrote:
>
> > This is not something the consumer API lets you do easily
> > (consumers have no notion of locality).
> > I can imagine using Kafka's low-level API calls to get the list of
> > partitions and each one's leader replica, figuring out which are
> > local, and consuming only those - but that sounds painful.
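A rough sketch of the approach Gwen describes, assuming you have already fetched topic metadata into a partition-to-leader-hostname mapping (how you obtain that mapping depends on your client; the helper name and data shape here are illustrative, not a real Kafka client API):

```python
import socket

def local_partitions(partition_leaders, local_host=None):
    """Given a mapping of partition id -> leader broker hostname
    (obtained separately from Kafka's topic metadata), return the
    partitions whose leader runs on this machine."""
    if local_host is None:
        local_host = socket.gethostname()
    return sorted(p for p, leader in partition_leaders.items()
                  if leader == local_host)

# Example: metadata says partitions 0-3 are led by two brokers.
leaders = {0: "broker-a", 1: "broker-b", 2: "broker-a", 3: "broker-b"}
print(local_partitions(leaders, local_host="broker-a"))  # → [0, 2]
```

One caveat: this only captures leader locality. Follower replicas also hold the data on local disk, but at the time of this thread consumers could only fetch from the partition leader, so a "local" partition means one whose leader is co-located with the consumer.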
> >
> > Are you 100% sure the performance issue is due to network latency? If
> > not, you may want to start optimizing somewhere more productive :)
> > Kafka brokers and clients both have Metrics that may help you track
> > where the performance issues are coming from.
> >
> > Gwen
> >
> > On Wed, Jul 15, 2015 at 9:24 AM, Robert Metzger <rmetzger@apache.org>
> > wrote:
> > > Hi Shef,
> > >
> > > did you resolve this issue?
> > > I'm facing some performance issues and I was wondering whether reading
> > > locally would resolve them.
> > >
> > > On Mon, Jun 22, 2015 at 11:43 PM, Shef <shef31@yahoo.com> wrote:
> > >
> > >> Noob question here. I want to have a single consumer for each
> > >> partition that consumes only the messages that have been written
> > >> locally. In other words, I want the consumer to access the local
> > >> disk and not pull anything across the network. Possible?
> > >>
> > >> How can I discover which partitions are local?
> > >>
> > >>
> > >>
> >
>
