storm-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Re'em Bensimhon" <r...@forter.com>
Subject Python Shell Bolt Performance Isses
Date Sun, 03 Jan 2016 09:08:30 GMT
Hello everybody

We at Forter have been advocates of Storm for a while now, we use it in
production for a low latency decision-as-a-service engine. We use Python
multilang bolts a lot, and have noticed some performance issues with
passing large objects into them. Basically what I'm seeing in high load
envs is that it takes ~50-60ms to pass a 512kb tuples into shell-bolts (1
way). We're very latency driven here at Forter and we're at a point where
we need to do better than that. I played around with the code and did a few
benchmarks. I would like to share my experience and conclusions here: maybe
help someone looking at the same issue in the future or get some inputs and
ideas from your experience. I made the benchmark code available at
https://github.com/forter/storm-python-shellbolt-benchmark.

Granted, 512kb is big, but still: 50ms is a rate of only ~10mb/s. And
that's roughly the rate for smaller payloads too. That's extremely low
considering that piping to another process can easily handle data rates of
1GB/s+ in a true streaming protocol.

I wrote a Java app that simulates a storm bolt and transfers data into the
bolt at max rate over stdout. All tests were conducted on my machine,
consider them ballpark estimates.

It seems like the max rate I could squeeze out of the default Python bolt
implementation is ~55mb/s with a ~512k tuple input & output (while output
is directed to /dev/null, the object involved was always flat).

Reading huge buffers into a Python process with no line separations or
parsing, I can get a python process to read 2G/s (all tests were done with
the Java app, so same write/flush pattern on the writing end). When reading
standard input line by line in Python (just reading, still no parsing) I
get a throughput of 350mb/s. That seems like a big drop just for line
separation considering that every second line written is 512kb. And yet,
this is 6x faster than with the bolt logic included. Since
serialization&deserialization are mainly what the bolt does, it's clear the
serialization is bottlenecking the python process and preventing I/O from
being processed.

I tried the Node.js bolt implementation too, but got the same results
(55mb/s). That was surprising, I expected Node should at least be more
efficient at parsing JSONs than python. BTW, removing the JSONification at
the ends (emits) of both bolts and retrying showed NodeJS increase to
80mb/s and Python to 115MB/s.

Then I benchmarked a Msgpack implementation. I managed to get 400mb/s for
the same data with no output serialization (lack of time). ~4x faster than
python without output serialization. Since the size passed in bytes was in
the ballpark of the JSON representation the increase in performance can
only be attributed to msgpack efficiency over JSON.

My conclusion from all this is that programming languages that are
effectively single threaded are a bad choice for this kind of interprocess
communications. If Python/Node had a way of utilizing more CPU cores I
could have used it to decrease the latency by parallelizing the
read-deserialize/logic/serialize-write steps. That would serve the
hight-throughput use case very well, but if latency is what we're concerned
with this wouldn't be a huge improvement either. Serialization ops are the
bottleneck and our only hope is to make them more efficient.

Did anybody else encounter the same issue with multilang bolts? Has anyone
got a creative solution for Python bolts that get heavy inputs?

Here's how to use the benchmark tool:

Passthrough python, reads by line without any logic:

java -jar target/shell-bolt-benchmark-1.0-SNAPSHOT.jar json | pv | python
> passthrough.py > /dev/null
>

Default Storm implementation with json parsing and all


> java -jar target/shell-bolt-benchmark-1.0-SNAPSHOT.jar json | pv | python
> storm.py > /dev/null
>

Msgpack implementation, no output:

java -jar target/shell-bolt-benchmark-1.0-SNAPSHOT.jar msgpack | pv |
> python storm_msgpack.py > /dev/null



- Re'em Bensimhon

Mime
View raw message