kafka-users mailing list archives

From Daniel Compton <d...@danielcompton.net>
Subject How does number of partitions affect sequential disk IO
Date Tue, 24 Jun 2014 07:58:37 GMT
I’ve been reading the Kafka docs, and one thing I’m having trouble understanding is
how the number of partitions affects sequential disk IO. One of the reasons Kafka is so fast
is that it does mostly sequential IO, which benefits from the OS read-ahead and page cache.
However, if a broker is responsible for, say, 20 partitions, won’t the disk be seeking between
20 different locations for its writes and reads? I thought that letting the OS decide when to
flush to disk (rather than forcing an fsync) might make this less of an issue, but it still
seems like it could be a problem.

In our particular situation, we are going to have 6 brokers, 3 in each data center, with
MirrorMaker replication from the secondary DC to the primary DC. We aren’t likely to need to
add more nodes for a while, so would it be faster to have 1 partition per node rather than,
say, 3-4 per node, to minimise seek times on disk?

Are my assumptions correct, or is this not an issue in practice? There are some nice things
about having more partitions, like load rebalancing more evenly if we lose a broker, but we
don’t want to make things significantly slower to get this.
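
One rough way to test this assumption on a given disk is to append the same amount of data
round-robin across 1, 4, and 20 files and compare throughput. The sketch below is only an
approximation under stated assumptions: the helper name, file names, chunk size, and data
volume are all hypothetical, and it does not model Kafka's log segments, replication, or
consumer reads. It flushes once at the end, leaving write batching to the OS.

```python
import os
import tempfile
import time


def append_round_robin(directory, num_files, total_bytes, chunk_size=64 * 1024):
    """Append about total_bytes in chunk_size pieces, rotating across num_files files.

    Hypothetical helper (not part of Kafka). Returns (elapsed_seconds, bytes_written).
    """
    chunk = b"x" * chunk_size
    # File names are illustrative; each file stands in for one partition's log.
    paths = [os.path.join(directory, f"partition-{i}.log") for i in range(num_files)]
    files = [open(p, "ab") for p in paths]
    written = 0
    start = time.perf_counter()
    try:
        i = 0
        while written < total_bytes:
            files[i % num_files].write(chunk)
            written += chunk_size
            i += 1
        # Flush and fsync once at the end so the timing includes reaching the disk,
        # while leaving the batching of individual writes to the OS page cache.
        for f in files:
            f.flush()
            os.fsync(f.fileno())
    finally:
        for f in files:
            f.close()
    return time.perf_counter() - start, written


if __name__ == "__main__":
    # Small, assumed data volume; raise it well past RAM-cache effects for real tests.
    for n in (1, 4, 20):
        with tempfile.TemporaryDirectory() as d:
            elapsed, written = append_round_robin(d, n, 8 * 1024 * 1024)
            print(f"{n:>2} files: {written / elapsed / 1e6:.0f} MB/s")
```

If the numbers for 1 and 20 files come out close, the extra seeks are probably being absorbed
by the page cache and IO scheduler; a large gap would suggest the concern is real on that
hardware. Results will vary with the disk type, filesystem, and amount of data written.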

Thanks, Daniel.