From Elben Shira <elbensh...@gmail.com>
Subject ec2 performance
Date Fri, 09 Mar 2012 18:32:38 GMT
Hey guys,

We're trying to deploy Kafka 0.7 on EC2. According to a thread [1], the
poster was getting 20,000 messages/sec on both EBS and local disk, at a
message size of 1000. We have message sizes of 2K-6K, at a rate of 5,000
messages/sec and growing, so we ran some tests to see how Kafka handles
this. My setup is one m1.large server running ZooKeeper and the Kafka
server, and another m1.large server running the perf tests.

For the producer test, I ran:

bin/kafka-producer-perf-test.sh --async --batch-size 200 --brokerinfo
zk.connect=[REDACTED] --compression-codec 0 --message-size 3000 --messages
5000000 --topic elben-perf-test-2 --vary-message-size

And the results: https://gist.github.com/dc5e9cce497807d578d9

There are some weird results, like these two consecutive lines:

INFO thread 8: 495000 messages sent 14124.2938 nMsg/sec 19.9273 MBs/sec
INFO thread 8: 500000 messages sent 21459.2275 nMsg/sec 30.8321 MBs/sec

Any ideas what's happening here? Are the perf tests miscalculating the
running average? Either way, I think the correct conclusion is that it
produced 7496644565 bytes in 369 seconds, or roughly 20 MB/s.
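
As a sanity check on that estimate, here is a minimal sketch of the
arithmetic. It uses only the totals quoted above and assumes that
--messages 5000000 is the total across all producer threads (not verified
against the tool). It prints the average rate in both decimal MB/s and
binary MiB/s, plus the implied average message size.

```python
# Totals quoted in the message above.
TOTAL_BYTES = 7_496_644_565
ELAPSED_SECS = 369
# Assumption: --messages 5000000 is the total for the whole run.
TOTAL_MESSAGES = 5_000_000


def throughput(total_bytes, seconds, unit_bytes):
    """Average rate, in units of `unit_bytes` bytes, per second."""
    return total_bytes / seconds / unit_bytes


mb_per_sec = throughput(TOTAL_BYTES, ELAPSED_SECS, 1_000_000)      # decimal MB
mib_per_sec = throughput(TOTAL_BYTES, ELAPSED_SECS, 1024 * 1024)   # binary MiB
avg_msg_bytes = TOTAL_BYTES / TOTAL_MESSAGES

print(f"{mb_per_sec:.2f} MB/s, {mib_per_sec:.2f} MiB/s, "
      f"{avg_msg_bytes:.0f} bytes/message on average")
```

Under that assumption the average message comes out at about half of
--message-size 3000, which would fit --vary-message-size picking sizes
at random up to the limit.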

Running the producer with --compression-codec 1 (gzip), I get:

bin/producer-perf-test.sh --async --batch-size 200 --brokerinfo zk.connect=
kafka1.i.massrel.com --compression-codec 2 --message-size 3000 --messages
1000000 --topic elben-perf-test-3 --vary-message-size

INFO Total Num Messages: 1000000 bytes: 1500536347 in 126.447 secs
INFO Messages/sec: 7908.4518 (kafka.tools.ProducerPerformance$)
INFO MB/sec: 11.3172 (kafka.tools.ProducerPerformance$)
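
The three summary lines can be cross-checked against each other. The
sketch below parses them and recomputes both rates from the totals. The
1024 * 1024 divisor is an inference from these numbers, not something
confirmed from the tool's source, and parse_summary is a hypothetical
helper written for this check.

```python
import re

# Summary lines copied from the run above.
LOG = """\
INFO Total Num Messages: 1000000 bytes: 1500536347 in 126.447 secs
INFO Messages/sec: 7908.4518 (kafka.tools.ProducerPerformance$)
INFO MB/sec: 11.3172 (kafka.tools.ProducerPerformance$)
"""


def parse_summary(log):
    """Extract the totals and the reported rates from the summary lines."""
    totals = re.search(
        r"Total Num Messages: (\d+) bytes: (\d+) in ([\d.]+) secs", log)
    msg_rate = re.search(r"Messages/sec: ([\d.]+)", log)
    mb_rate = re.search(r"MB/sec: ([\d.]+)", log)
    return {
        "messages": int(totals.group(1)),
        "bytes": int(totals.group(2)),
        "secs": float(totals.group(3)),
        "reported_msgs_per_sec": float(msg_rate.group(1)),
        "reported_mb_per_sec": float(mb_rate.group(1)),
    }


s = parse_summary(LOG)
msgs_per_sec = s["messages"] / s["secs"]
# Assumption: the tool's "MB" is 1024 * 1024 bytes.
mib_per_sec = s["bytes"] / s["secs"] / (1024 * 1024)

print(f"recomputed {msgs_per_sec:.4f} msg/s, "
      f"reported {s['reported_msgs_per_sec']}")
print(f"recomputed {mib_per_sec:.4f} MiB/s, "
      f"reported {s['reported_mb_per_sec']}")
```

If the recomputed and reported values agree, the end-of-run summary is at
least internally consistent, whatever is going on with the per-thread
progress lines.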

For the consumer test, I ran:

bin/kafka-consumer-perf-test.sh --props config/consumer.properties --topic
elben-perf-test-2 --threads 10

With these results: https://gist.github.com/654093bd70571d21fb34

Again, there are weird things: why are the other threads consuming 0
MB/s while only thread 7 is doing 6.9 MB/s? Is anyone else getting similar
results? We need to consume at least 10 MB/s. I suppose it would be best to
use partitions and multiple consumers if we're seeing only 6 MB/s on a
dumb consumer with 10 threads.
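
A rough sizing sketch for that idea, assuming (without having measured
it) that aggregate throughput scales linearly with the number of
consumers, each reading its own partition. The consumers_needed helper
and its headroom parameter are hypothetical; the 10 MB/s target and the
6.9 MB/s per-consumer rate are the figures quoted above.

```python
import math


def consumers_needed(target_mb_per_sec, per_consumer_mb_per_sec, headroom=1.0):
    """Smallest consumer count whose combined rate meets the target.

    Assumes linear scaling; `headroom` > 1.0 leaves spare capacity.
    """
    if per_consumer_mb_per_sec <= 0:
        raise ValueError("per-consumer rate must be positive")
    return math.ceil(target_mb_per_sec * headroom / per_consumer_mb_per_sec)


# Figures from the message: 10 MB/s needed, 6.9 MB/s seen on one thread.
print(consumers_needed(10, 6.9))
# Same target with 50% headroom for growth.
print(consumers_needed(10, 6.9, headroom=1.5))
```

Real scaling is unlikely to be perfectly linear, so this is a lower bound
on the count rather than a prediction.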

Any suggestions or ideas? I've had lots of fun with Kafka and hope to be
able to use it!



