kafka-users mailing list archives

From Jörg Wagner <joerg.wagn...@1und1.de>
Subject Re: Amount of partitions
Date Mon, 07 Sep 2015 08:32:43 GMT
Sorry, the messages are keyed.

On 07.09.2015 10:08, Jörg Wagner wrote:
> Thank you very much for both replies.
> @Tao
> Thanks, I am aware of and have read that article. I am asking because
> my experience is completely different :/. Every time we go beyond 400
> partitions, the cluster really starts breaking apart.
> @Todd
> Thank you, very informative.
> Our details:
> 3 brokers: 192 GB RAM, 27 disks for log.dirs, 9 topics and an estimated
> 50k requests / second on 3 of the topics; the others are negligible.
> Ordering is not required, messages are not keyed
> The 3 main topics are one per DC (3 DCs) and being mirrored to the 
> others.
> The issue arises when we use over 400 partitions, which I think we 
> require due to performance and mirroring. Partitions get out of sync 
> and the log is being spammed by replicator messages. At the core, we 
> start having massive stability issues.
> Additionally, the mirrormaker only gets 2k Messages per *minute* 
> through with a stable setup of 81 partitions (for the 3 main topics).
> Has anyone experienced this and can give more insight? We have been 
> doing testing for weeks, compared configuration and setups, without 
> finding the main cause.
> Can this be a Kernel (version/configuration) or Java(7) issue?
> Cheers
> Jörg
> On 04.09.2015 20:24, Todd Palino wrote:
>> Jun's post is a good start, but I find it's easier to talk in terms of more
>> concrete reasons and guidance for having fewer or more partitions per topic.
>>
>> Start with the number of brokers in the cluster. This is a good baseline
>> for the minimum number of partitions in a topic, as it will ensure balance
>> over the cluster. Of course, if you have lots of topics, you can
>> potentially skip past this as you'll end up with balanced load in the
>> aggregate, but I think it's a good practice regardless. As with all other
>> advice here, there are always exceptions. If you really, really, really
>> need to ensure ordering of messages, you might be stuck with a single
>> partition for some use cases.
>>
>> In general, you should pick more partitions if a) the topic is very busy,
>> or b) you have more consumers. Looking at the second case first, you always
>> want to have at least as many partitions in a topic as you have individual
>> consumers in a consumer group. So if you have 16 consumers in a single
>> group, you will want the topic they consume to have at least 16 partitions.
>> In fact, you may also want to always have a multiple of the number of
>> consumers so that you have even distribution. How many consumers you have
>> in a group is going to be driven more by what you do with the messages once
>> they are consumed, so here you'll be looking from the bottom of your stack
>> up, until you get to Kafka.
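
The consumer-side rule above can be sketched in a few lines of Python. This is only an illustration, not anything from Todd's tooling: the function names are made up here, and it assumes partitions are spread over the consumers of a group as evenly as possible.

```python
from typing import List


def suggest_partition_count(brokers: int, consumers: int) -> int:
    """Smallest multiple of `consumers` that is at least `brokers`.

    Hypothetical helper: it combines the two rules of thumb above
    (at least one partition per broker, and a multiple of the number
    of consumers in the group).
    """
    if brokers < 1 or consumers < 1:
        raise ValueError("brokers and consumers must be positive")
    multiples_needed = -(-brokers // consumers)  # ceiling division
    return consumers * multiples_needed


def partitions_per_consumer(partitions: int, consumers: int) -> List[int]:
    """How many partitions each consumer gets under an even spread (assumed)."""
    base, extra = divmod(partitions, consumers)
    return [base + 1 if i < extra else base for i in range(consumers)]


if __name__ == "__main__":
    print(suggest_partition_count(brokers=3, consumers=16))   # 16
    print(partitions_per_consumer(20, 16))  # four consumers get 2, twelve get 1
    print(partitions_per_consumer(32, 16))  # every consumer gets 2
```

With 20 partitions and 16 consumers, four consumers carry twice the load of the rest, which is the unevenness a multiple of the consumer count avoids.
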
>>
>> How busy the topic is means looking from the top down, through the producer,
>> to Kafka. It's also a little more difficult to provide guidance on. We have
>> a policy of expanding partitions for a topic whenever the size of the
>> partition on disk (full retention over 4 days) is larger than 50 GB. We
>> find that this gives us a few benefits. One is that it takes a reasonable
>> amount of time when we need to move a partition from one broker to another.
>> Another is that when we have partitions that are larger than this, the rate
>> tends to cause problems with consumers. For example, we see mirror maker
>> perform much better, and have fewer spiky lag problems, when we stay under
>> this limit. We're even considering revising the limit down a little, as
>> we've had some reports from other wildcard consumers that they've had
>> problems keeping up with topics that have partitions larger than
>> about 30 GB.
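
To make the size-based rule concrete, here is a rough Python sketch. It assumes a steady produce rate spread evenly over all partitions, counts 1 GB as 10^9 bytes, and looks at a single replica. The function name and the 10 MB/s example rate are hypothetical; only the 50 GB and 30 GB limits and the 4-day retention come from the message above.

```python
import math

SECONDS_PER_DAY = 86_400


def partitions_for_size(bytes_per_second: float,
                        retention_days: float = 4.0,
                        max_partition_gb: float = 50.0) -> int:
    """Partitions needed so that no partition exceeds `max_partition_gb`.

    Assumes a constant produce rate and an even spread over partitions.
    """
    retained_gb = bytes_per_second * retention_days * SECONDS_PER_DAY / 1e9
    return max(1, math.ceil(retained_gb / max_partition_gb))


if __name__ == "__main__":
    rate = 10e6  # hypothetical topic producing 10 MB/s
    print(partitions_for_size(rate))                         # 70
    print(partitions_for_size(rate, max_partition_gb=30.0))  # 116
```

A topic at 10 MB/s retains about 3456 GB over 4 days, so lowering the per-partition limit from 50 GB to 30 GB raises the count from 70 to 116 partitions.
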
>>
>> The last thing to look at is whether or not you are producing keyed
>> messages to the topic. If you're working with unkeyed messages, there is no
>> problem. You can usually add partitions whenever you want to down the road
>> with little coordination with producers and consumers. If you are producing
>> keyed messages, there is a good chance you do not want to change the
>> distribution of keys to partitions at various points in the future when you
>> need to size up. This means that when you first create the topic, you
>> probably want to create it with enough partitions to deal with growth over
>> time, both on the produce and consume side, even if that is too many
>> partitions right now by other measures. For example, we have one client who
>> requested 720 partitions for a particular set of topics. The reasoning was
>> that they are producing keyed messages, they wanted to account for growth,
>> and they wanted even distribution of the partitions to consumers as they
>> grow. 720 happens to have a lot of factors, so it was a good number for
>> them to pick.
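
The "lot of factors" point is easy to check. The sketch below (helper name is made up for illustration) lists the divisors of a partition count; every divisor is a consumer-group size at which each consumer would get exactly the same number of partitions, assuming an even assignment.

```python
import math
from typing import List


def divisors(n: int) -> List[int]:
    """All positive divisors of n, in ascending order."""
    small = [d for d in range(1, math.isqrt(n) + 1) if n % d == 0]
    large = [n // d for d in reversed(small) if d * d != n]
    return small + large


if __name__ == "__main__":
    print(len(divisors(720)))  # 30 group sizes that divide 720 evenly
    print(len(divisors(512)))  # 10, since 512 is a power of two
    print(divisors(720)[:12])  # [1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16]
```

So a group consuming a 720-partition topic can grow through 30 different sizes (including 16, 20, 24, 30, 36, ...) without any consumer getting more partitions than another.
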
>>
>> As a note, we have up to 5000 partitions per broker right now on current
>> hardware, and we're moving to new hardware (more disk, 256 GB of memory,
>> 10gig interfaces) where we're going to have up to 12,000. Our default
>> partition count for most clusters is 8, and we've got topics up to 512
>> partitions in some places just taking into account the produce rate alone
>> (not counting those 720-partition topics that aren't that busy). Many of
>> our brokers run with over 10k open file handles for regular files alone,
>> and over 50k open when you include network.
>> -Todd
>> On Fri, Sep 4, 2015 at 8:11 AM, tao xiao <xiaotao183@gmail.com> wrote:
>>> Here is a good doc that describes how to choose the right number of
>>> partitions:
>>> http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/

>>> On Fri, Sep 4, 2015 at 10:08 PM, Jörg Wagner <joerg.wagner1@1und1.de>
>>> wrote:
>>>> Hello!
>>>> Regarding the recommended amount of partitions I am a bit confused.
>>>> Basically I got the impression that it's better to have lots of
>>>> partitions (see information from linkedin etc). On the other hand, a
>>>> lot of performance benchmarks floating around show only a few
>>>> partitions are being used.
>>>> Especially when considering the difference between HDDs and SSDs, and
>>>> also the number of disks, what is the way to go?
>>>> In my case, I seem to get the best stability and performance with
>>>> few partitions *per hdd*, and only one io thread per disk.
>>>> What are your experiences and recommendations?
>>>> Cheers
>>>> Jörg
>>> -- 
>>> Regards,
>>> Tao

Kind regards

Jörg Wagner

Mobile & Services

1&1 Internet AG | Sapporobogen 6-8 | 80637 München | Germany
Phone: +49 89 14339 324
E-Mail: joerg.wagner1@1und1.de | Web: www.1und1.de

Headquarters: Montabaur, District Court (Amtsgericht) Montabaur, HRB 6484

Management Board: Ralph Dommermuth, Frank Einhellinger, Robert Hoffmann, Andreas Hofmann, Markus Huhn,
Hans-Henning Kettler, Uwe Lamnek, Jan Oetjen, Christian Würst
Chairman of the Supervisory Board: Michael Scheeren

Member of United Internet
