kafka-users mailing list archives

From: Jun Rao <jun...@gmail.com>
Subject: Re: Weird behaviour of Kafka consumer/mirrormaker at different DCs
Date: Thu, 15 May 2014 03:32:42 GMT

To amortize the long latency across DCs, you may need to tune the socket
buffer size to get higher throughput. See
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27846330
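
As a rough illustration of why the buffer size matters: a single TCP
connection cannot move much more than one buffer's worth of data per round
trip, so its ceiling is roughly buffer size / RTT. The sketch below computes
that ceiling and the buffer needed to fill a link. The 200 ms round-trip time
and the 64 KB starting buffer are assumptions for illustration, not
measurements from this thread, and the class and method names are
hypothetical. The property name socket.receive.buffer.bytes is the 0.8
high-level consumer setting as I understand it; please check it against the
consumer configuration docs for your version.

```java
import java.util.Properties;

final class SocketBufferSizing {
    /** Bandwidth-delay product: bytes in flight needed to keep the link full. */
    static long bufferBytesToFill(double linkBitsPerSec, double rttSeconds) {
        return Math.round(linkBitsPerSec * rttSeconds / 8.0);
    }

    /** Approximate per-connection throughput ceiling, in bits per second. */
    static double ceilingBitsPerSec(long bufferBytes, double rttSeconds) {
        return bufferBytes * 8.0 / rttSeconds;
    }

    /** Consumer property override carrying the chosen buffer size. */
    static Properties consumerOverrides(long bufferBytes) {
        Properties p = new Properties();
        // ASSUMPTION: property name as used by the 0.8 high-level consumer.
        p.setProperty("socket.receive.buffer.bytes", Long.toString(bufferBytes));
        return p;
    }

    public static void main(String[] args) {
        double rtt = 0.2;            // ASSUMPTION: 200 ms Asia <-> Europe RTT
        long small = 64 * 1024;      // ASSUMPTION: 64 KB starting buffer
        System.out.printf("Ceiling with 64 KB buffer: %.1f Mbps%n",
                ceilingBitsPerSec(small, rtt) / 1e6);
        long needed = bufferBytesToFill(1e9, rtt); // target: 1 Gbps
        System.out.println("Buffer to fill 1 Gbps: " + needed + " bytes");
        System.out.println(consumerOverrides(needed));
    }
}
```

The operating system usually caps socket buffer sizes as well, so a larger
value in the consumer config may have no effect until the kernel limits on
both ends are raised too.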

Thanks,

Jun


On Fri, May 9, 2014 at 4:06 AM, Sithiyavanich, Manawat (Agoda) <
Manawat.Sithiyavanich@agoda.com> wrote:

> Hi,
>
> I have set up two Kafka clusters in different DCs, one in Asia and one in
> Europe. Both clusters run on EC2 m1.xlarge machines.
> Our goal is to replicate data from the Asia cluster to the Europe cluster
> at a "high and stable" speed.
>
> I tried the MirrorMaker that ships with Kafka, running it in the Europe
> cluster, and here are the odd results I have seen so far.
> Let N = the number of partitions of a topic in the Asia cluster.
>
> 1) Using the MirrorMaker that ships with Kafka, with --num.streams = N.
>
> It runs very slowly, at around 30 Mbps, whereas we expected around 1 Gbps,
> the rate we see in the Asia cluster when producers write to the Kafka
> brokers.
>
> 2) I wrote my own consumer using the high-level API, running in Europe and
> consuming directly from the Asia cluster, and ended up with about the same
> rate, around 30 to 40 Mbps.
>
> I designed the consumer this way:
>
> - 1 consumer connector (1 kafka.javaapi.consumer.ConsumerConnector)
> - N threads to handle the N streams from that connector
>
> 3) I wrote another consumer using the high-level API and got around
> 400 Mbps, which is much higher than with the two approaches above.
>
> Here is the new design:
>
> - N consumer connectors (N kafka.javaapi.consumer.ConsumerConnector
>   instances), all with the same group ID
> - N threads, one per connector
> - each connector handles 1 stream
>
> Also, the larger N is, the higher the speed I get from solution 3),
> while solutions 1) and 2) stay at around 30 Mbps.
>
> I don't see the technical difference between solution 3) and the others.
> Why is the consumption speed of solution 3) so much higher?
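
One possible explanation for the gap between solutions 2) and 3), offered as
a hypothesis rather than something confirmed in this thread: in the 0.8
high-level consumer, as far as I understand it, each ConsumerConnector runs
its own fetchers and opens its own connection to each broker it reads from,
while the number of streams only controls how already-fetched messages are
handed to threads. If so, solution 2) shares a handful of connections among
all N threads, whereas solution 3) opens N times as many, and each connection
has its own latency-bound ceiling. The sketch below models that. The broker
count (3), the 200 ms RTT and the 64 KB buffer are assumed values, and the
class and method names are hypothetical.

```java
final class FetchConnectionModel {
    /** Fetch connections opened if every connector talks to every broker. */
    static int connections(int connectors, int brokers) {
        return connectors * brokers;
    }

    /** Sum of the per-connection ceilings (buffer / RTT), in bits per second. */
    static double aggregateCeilingBps(int connections, long bufferBytes,
                                      double rttSeconds) {
        return connections * bufferBytes * 8.0 / rttSeconds;
    }

    public static void main(String[] args) {
        int brokers = 3;          // ASSUMPTION: not stated in the thread
        double rtt = 0.2;         // ASSUMPTION: 200 ms round trip
        long buffer = 64 * 1024;  // ASSUMPTION: 64 KB socket buffer
        int n = 60;               // N from the original message

        // Solution 2): one connector, N threads reading from its streams.
        int shared = connections(1, brokers);
        // Solution 3): N connectors, each with one stream.
        int separate = connections(n, brokers);

        System.out.printf("1 connector:   %d connections, ceiling %.1f Mbps%n",
                shared, aggregateCeilingBps(shared, buffer, rtt) / 1e6);
        System.out.printf("%d connectors: %d connections, ceiling %.1f Mbps%n",
                n, separate, aggregateCeilingBps(separate, buffer, rtt) / 1e6);
    }
}
```

The absolute numbers depend entirely on the assumed inputs; the point is only
that the ceiling scales with the number of connections, which would match the
observation that solution 3) gets faster as N grows.
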
>
> Right now N = 60, and the messages in the topic in the Asia cluster have
> an average size of 50 KB.
>
> Regards
> Manawat
