kafka-users mailing list archives

From Guozhang Wang <wangg...@gmail.com>
Subject Re: Replication bandwidth double what is expected
Date Tue, 02 Sep 2014 20:44:36 GMT
Hi Theo,

You can try setting replica.fetch.min.bytes to a larger value (the default
is 1) and increasing replica.fetch.wait.max.ms (the default is 500) to see
whether that helps. In general, with 4 fetchers and min.bytes set to 1, the
replicas effectively exchange many small packets over the wire.
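
As a rough sketch, the change in each broker's server.properties could look
like the fragment below. The two values marked "illustrative" are assumptions
chosen only to show the shape of the change, not tuned recommendations, so
measure with iftop before and after and adjust for your workload.

```properties
# Already set in the cluster described in this thread (default is 1).
num.replica.fetchers=4

# Illustrative value: have the leader hold each replica fetch until at least
# this many bytes are available (default is 1).
replica.fetch.min.bytes=65536

# Illustrative value: upper bound, in milliseconds, on how long a fetch waits
# for min.bytes to accumulate (default is 500).
replica.fetch.wait.max.ms=1000
```

The intent is fewer, larger fetch responses; the trade-off is that followers
may trail the leader by up to the wait time when traffic is low.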

Guozhang


On Mon, Sep 1, 2014 at 11:06 PM, Theo Hultberg <theo@iconara.net> wrote:

> Hi Guozhang,
>
> We're using the default on all of those, except num.replica.fetchers which
> is set to 4.
>
> T#
>
>
> On Mon, Sep 1, 2014 at 9:41 PM, Guozhang Wang <wangguoz@gmail.com> wrote:
>
> > Hello Theo,
> >
> > What are the values for your "replica.fetch.max.bytes",
> > "replica.fetch.min.bytes", "replica.fetch.wait.max.ms" and
> > "num.replica.fetchers" configs?
> >
> > Guozhang
> >
> >
> > On Mon, Sep 1, 2014 at 2:52 AM, Theo Hultberg <theo@iconara.net> wrote:
> >
> > > Hi,
> > >
> > > > We're evaluating Kafka, and have a problem with it using more bandwidth
> > > > than we can explain. From what we can tell the replication uses at least
> > > > twice the bandwidth it should.
> > >
> > > > We have four producer nodes and three broker nodes. We have enabled 3x
> > > > replication, so each node will get a copy of all data in this setup. The
> > > > producers have Snappy compression enabled and send batches of 200 messages.
> > > > The messages are around 1 KiB each. The cluster runs using mostly default
> > > > configuration, and the Kafka version is 0.8.1.1.
> > >
> > > > When we run iftop on the broker nodes we see that each Kafka node receives
> > > > around 6-7 Mbit from each producer node (or around 25-30 Mbit in total),
> > > > but then sends around 50 Mbit to each other Kafka node (or 100 Mbit in
> > > > total). This is twice what we expected to see, and it seems to saturate the
> > > > bandwidth on our m1.xlarge machines. In other words, we expected the
> > > > incoming 25 Mbit to be amplified to 50 Mbit, not 100.
> > >
> > > > One thing that could explain it, and that we don't really know how to
> > > > verify, is that the inter-node communication is not compressed. We aren't
> > > > sure about what compression ratio we get on the incoming data, but 50%
> > > > sounds reasonable. Could this explain what we're seeing? Is there a
> > > > configuration property to enable compression on the replication traffic
> > > > that we've missed?
> > >
> > > yours
> > > Theo
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang
