kafka-users mailing list archives

From Peter Bukowinski <pmb...@gmail.com>
Subject Re: Kafka Mirror Maker place of execution
Date Tue, 12 Mar 2019 17:24:02 GMT
Hi Franz,

The MirrorMaker instances are colocated with the brokers, yes. These are beefy, dedicated
hosts that are handling the loads admirably.

The core cluster receives about 400k msg/sec, 1GB/sec across 20 topics at peak times. CPU
usage occasionally crosses 50% during peak times. If I find that the hardware is getting overloaded,
or that our consumer load on the core cluster increases significantly, I will move MM to a
separate set of hosts. Right now, it’s quite cost-effective as is. :)
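
As a rough sanity check on the figures above, here is a minimal sketch of the implied average message size and per-broker share. It assumes "1GB/sec" means 10^9 bytes per second, that load is spread evenly over the 100 core brokers mentioned in the quoted message below, and it ignores replication traffic; the function and constant names are made up for illustration.

```python
# Peak figures quoted in this thread.
PEAK_MSGS_PER_SEC = 400_000
PEAK_BYTES_PER_SEC = 1_000_000_000  # assumption: decimal gigabyte
CORE_BROKERS = 100                  # core cluster size from the quoted message


def average_message_bytes(bytes_per_sec: int, msgs_per_sec: int) -> float:
    """Average payload size implied by aggregate throughput."""
    return bytes_per_sec / msgs_per_sec


def per_broker_share(total: int, brokers: int) -> float:
    """Even split of an aggregate rate across brokers (assumes no skew)."""
    return total / brokers


avg_bytes = average_message_bytes(PEAK_BYTES_PER_SEC, PEAK_MSGS_PER_SEC)
msgs_per_broker = per_broker_share(PEAK_MSGS_PER_SEC, CORE_BROKERS)
bytes_per_broker = per_broker_share(PEAK_BYTES_PER_SEC, CORE_BROKERS)

print(f"average message size: {avg_bytes:.0f} bytes")
print(f"per broker: {msgs_per_broker:.0f} msg/sec, {bytes_per_broker / 1e6:.0f} MB/sec")
```

Under those assumptions each broker takes in roughly 10 MB/sec of produced data at peak, before replication.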

—
Peter


> On Mar 12, 2019, at 10:02 AM, Franz van Betteraey <fvbetteraey@web.de> wrote:
> 
> Hi Peter,
> 
> These are remarkable numbers, but to be honest I do not understand where you run the Mirror Maker processes.
> Do you run them near the remote clusters or near the target (core?) datacenter cluster?
> 
> As I understand it, you run 30 MirrorMaker instances (one for each remote cluster) on each of the 100 Kafka nodes of your core datacenter cluster.
> So you run the Mirror Maker on the same machines as the Kafka nodes and do not use dedicated machines for the Mirror Maker process?
> 
> 
> Best regards,
>  Franz
>  
> 
> Sent: Tuesday, 12 March 2019 at 16:24
> From: "Peter Bukowinski" <pmbuko@gmail.com>
> To: users@kafka.apache.org
> Subject: Re: Kafka Mirror Maker place of execution
> I have a setup with about 30 remote Kafka clusters and one cluster in a core datacenter where I aggregate data from all the remote clusters. The remote clusters have 30 nodes each with moderate specs. The core cluster has 100 nodes with lots of CPU, RAM, and SSD storage per node.
> 
> I run MirrorMaker directly on the core brokers. Each broker runs one MirrorMaker instance per edge cluster, sharing the same group.id. Since I’m running 100 instances per edge cluster, the number of threads I use = (total partition count of topics I am mirroring) / 100. In practice, each MM instance runs with about 25 threads, so each broker runs 25*30=750 threads of MirrorMaker.
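
The thread-count rule in the paragraph above can be written out as a small sketch. The figure of 2,500 mirrored partitions per edge cluster is not stated in the thread; it is only what "about 25 threads" across 100 instances would imply, and the function names are hypothetical.

```python
import math


def threads_per_instance(total_partitions: int, instances: int) -> int:
    """Threads each MirrorMaker instance needs so that, across all instances
    in the same consumer group, there is one thread per mirrored partition."""
    return math.ceil(total_partitions / instances)


def threads_per_broker(per_instance: int, edge_clusters: int) -> int:
    """Each broker runs one MirrorMaker instance per edge cluster."""
    return per_instance * edge_clusters


INSTANCES_PER_EDGE_CLUSTER = 100  # one instance on each core broker
EDGE_CLUSTERS = 30
ASSUMED_PARTITIONS = 2_500        # assumption implied by ~25 threads per instance

per_instance = threads_per_instance(ASSUMED_PARTITIONS, INSTANCES_PER_EDGE_CLUSTER)
per_broker = threads_per_broker(per_instance, EDGE_CLUSTERS)

print(per_instance)  # 25
print(per_broker)    # 750
```

Rounding up means a partition count that does not divide evenly leaves a few threads idle rather than leaving partitions without a consumer thread.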
> 
> I’ve been running this setup for many months and it’s proved to be stable with very low consumer lag.
> 
> --
> Peter Bukowinski
> 
>> On Mar 12, 2019, at 6:42 AM, Ryanne Dolan <ryannedolan@gmail.com> wrote:
>> 
>> Franz, you can run MM on or near either source or target cluster, but it's
>> more efficient near the target because this minimizes producer latency. If
>> latency is high, producers will block waiting on ACKs for in-flight records,
>> which reduces throughput.
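
To illustrate why producer latency caps throughput, here is a deliberately simplified model: a producer connection can have at most a fixed number of unacknowledged requests outstanding, so its throughput cannot exceed the in-flight bytes divided by the round-trip time. The values of 5 in-flight requests and 16384-byte batches mirror the Kafka producer defaults for `max.in.flight.requests.per.connection` and `batch.size`; treating every request as exactly one full batch, and the two round-trip times, are assumptions for illustration only.

```python
def max_throughput_bytes_per_sec(in_flight_requests: int,
                                 bytes_per_request: int,
                                 round_trip_ms: int) -> float:
    """Upper bound on one connection's throughput when every request
    must be acknowledged before its in-flight slot is freed."""
    in_flight_bytes = in_flight_requests * bytes_per_request
    return in_flight_bytes * 1000 / round_trip_ms


IN_FLIGHT = 5        # producer default for max.in.flight.requests.per.connection
BATCH_BYTES = 16384  # producer default for batch.size; one batch per request assumed

# Hypothetical round-trip times: same datacenter vs. across a WAN link.
near_target = max_throughput_bytes_per_sec(IN_FLIGHT, BATCH_BYTES, round_trip_ms=1)
far_from_target = max_throughput_bytes_per_sec(IN_FLIGHT, BATCH_BYTES, round_trip_ms=100)

print(f"near target: {near_target / 1e6:.1f} MB/sec per connection")
print(f"far from target: {far_from_target / 1e6:.2f} MB/sec per connection")
```

In this model the bound scales inversely with round-trip time, which is the effect described above; real producers also batch multiple partitions per request and compress data, so actual numbers will differ.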
>> 
>> I recommend running MM near the target cluster but not necessarily on the
>> same machines, because often Kafka nodes are relatively expensive, with SSD
>> arrays and huge IO bandwidth etc, which isn't necessary for MM.
>> 
>> Ryanne
>> 
>> On Tue, Mar 12, 2019, 8:13 AM Franz van Betteraey <fvbetteraey@web.de>
>> wrote:
>> 
>>> Hi all,
>>> 
>>> there are best practices out there which recommend running the Mirror Maker
>>> on the target cluster.
>>> 
>>> https://community.hortonworks.com/articles/79891/kafka-mirror-maker-best-practices.html
>>> 
>>> I wonder why this recommendation exists because ultimately all data must
>>> cross the border between the clusters, regardless of whether they are
>>> consumed at the target or produced at the source. A reason I can imagine is
>>> that the Mirror Maker supports multiple consumers but only one producer -
>>> so consuming data over the path with the greater latency might be sped up by
>>> the use of multiple consumers.
>>> 
>>> If performance through multithreading is the point, would it be useful
>>> to use several producers (one per consumer) to replicate the data (with a
>>> custom replication process)? Does anyone know why the Mirror Maker shares
>>> a single producer among all consumers?
>>> 
>>> My use case is the replication of data from several source clusters (~10) to
>>> a single target cluster. I would prefer to run the replication process on
>>> the source clusters to avoid too many replication processes (one per
>>> source) on the target cluster.
>>> 
>>> Hints and suggestions on this topic are very welcome.
>>> 
>>> Best regards
>>> Franz
>>> 
>>> If you would like to earn some SO recommendation points feel free to
>>> answer this question on SO ;-)
>>> https://stackoverflow.com/q/55122268/367285

