storm-user mailing list archives

From Nathan Leung <>
Subject Re: Interesting Comparison
Date Mon, 12 May 2014 15:28:16 GMT
a couple thoughts

1) IBM streams is certainly more mature, as it has been in development
longer and storm is not even at release 1.0 yet.  Though I am not familiar
with SPL, it would also make sense that it's faster to implement in, as it
is a higher level abstraction.

2) Operator fusion will allow more efficiency in passing data between steps
in your flow, as localOrShuffleGrouping will still need to go over
disruptor whereas operator fusion from what I understand basically passes
the pointer directly.  As fast as disruptor is (I've seen benchmarks of
millions of messages passed / s), it won't match passing data directly to
the next step (cost: a few instructions).  The downside of fusion is that
your flow always needs to be created and compiled before you can execute
it.  Something like a rebalance will require a recompile of your stream.
Building a topology dynamically is possible in storm (though it is not a
feature that is really exposed out of the box), but not in IBM streams.
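
To make the handoff difference in point 2 concrete, here is a minimal Java
sketch.  It is not Storm or IBM streams code: the two steps (PARSE, ENRICH)
are made-up names, and a plain ArrayBlockingQueue stands in for disruptor
purely for illustration.  The queued version hands every value to another
thread, while the "fused" version composes both steps into a single call.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.IntUnaryOperator;

public class FusionSketch {
    // Two hypothetical steps of a flow (names are illustrative only).
    static final IntUnaryOperator PARSE = x -> x + 1;
    static final IntUnaryOperator ENRICH = x -> x * 2;

    // Queue handoff: step 1 runs on the caller, step 2 on a worker thread.
    // ArrayBlockingQueue is an assumed stand-in for disruptor here.
    public static long viaQueue(int n) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(1024);
        long[] sum = new long[1];
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) {
                    sum[0] += ENRICH.applyAsInt(queue.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            for (int i = 0; i < n; i++) {
                queue.put(PARSE.applyAsInt(i));
            }
            consumer.join(); // join makes the worker's writes to sum visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum[0];
    }

    // Fused: both steps become one composed function, no queue in between.
    public static long fused(int n) {
        IntUnaryOperator both = PARSE.andThen(ENRICH);
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += both.applyAsInt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 100_000;
        // Both paths compute the same result; only the handoff differs.
        System.out.println(viaQueue(n) == fused(n));
    }
}
```

Timing the two methods would be the obvious next step, but a fair
measurement needs warmup and a proper harness, so none is claimed here.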

3) they took 1 month to optimize storm but I suspect some of this work was
unnecessary.  Python?  For a benchmark?  Also, uniform message distribution
by size feels like a premature optimization.  I can understand that they
would want to explore all avenues to account for a performance difference,
but in many (most?) practical cases this would not be necessary.  I can
sympathize on other points.  Tuning the message buffers of storm requires
pretty specific understanding of the system.  Also if you run out of heap
and/or have to tune GC, then... yeah.  Not fun.  This would be true for any
java app though.

4) I'm not sure they really took language differences seriously enough.
 I've written certain algorithms in Java that (based on similar algorithms
that I implemented separately in C++) I would suspect are close to an order
of magnitude slower just because I ran them in Java.  While I haven't dug
into this deeply (for example by using an identical algorithm for both Java
and C++), consider a HashMap indexed by a primitive type.  In Java, the
keys are boxed into separate objects and reached through an array of
references.  In C++ they can be stored sequentially in an array.  C++
allows direct key access in the array (as opposed to going through the
reference), and is also potentially much friendlier with the cache.  Just
because the JVM is healthy does not mean
it's going to perform like C++ for all applications.  I suppose you could
then argue that for best performance Storm is more or less limited to the
JVM, but I choose not to consider that point here for brevity.  Note this
is not to say that it's impossible to write fast code in Java (see
previously mentioned disruptor).  I would just argue that it's a good bit
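
As an illustration of the layout difference in point 4, the sketch below
puts a boxed HashMap<Integer, Integer> next to a hand-rolled map that keeps
int keys and values in two flat arrays.  IntIntMap is a hypothetical helper
written for this example, not a JDK or Storm class, and it makes loud
assumptions: fixed power-of-two capacity, no removal, no resizing, and
Integer.MIN_VALUE reserved as the "empty" marker.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class PrimitiveMapSketch {
    /** Hypothetical int-to-int map: linear probing over two flat arrays. */
    static final class IntIntMap {
        private static final int EMPTY = Integer.MIN_VALUE; // reserved key
        private final int[] keys;
        private final int[] values;
        private final int mask;

        // Capacity must be a power of two and larger than the entry count.
        IntIntMap(int capacityPowerOfTwo) {
            keys = new int[capacityPowerOfTwo];
            values = new int[capacityPowerOfTwo];
            mask = capacityPowerOfTwo - 1;
            Arrays.fill(keys, EMPTY);
        }

        void put(int key, int value) {
            int slot = mix(key) & mask;
            // Probe neighbouring slots; keys are compared directly in the array.
            while (keys[slot] != EMPTY && keys[slot] != key) {
                slot = (slot + 1) & mask;
            }
            keys[slot] = key;
            values[slot] = value;
        }

        int getOrDefault(int key, int fallback) {
            int slot = mix(key) & mask;
            while (keys[slot] != EMPTY) {
                if (keys[slot] == key) {
                    return values[slot];
                }
                slot = (slot + 1) & mask;
            }
            return fallback;
        }

        private static int mix(int key) {
            int h = key * 0x9E3779B9; // multiplicative scramble of the key
            return h ^ (h >>> 16);
        }
    }

    // Boxed version: every key and value is an Integer object.
    public static long sumBoxed(int n) {
        Map<Integer, Integer> boxed = new HashMap<>();
        for (int i = 0; i < n; i++) {
            boxed.put(i, i * 3);
        }
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += boxed.get(i);
        }
        return sum;
    }

    // Flat version: keys and values live in primitive arrays.
    public static long sumFlat(int n) {
        int capacity = Math.max(16, Integer.highestOneBit(n) * 4);
        IntIntMap flat = new IntIntMap(capacity);
        for (int i = 0; i < n; i++) {
            flat.put(i, i * 3);
        }
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += flat.getOrDefault(i, 0);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed(1000) == sumFlat(1000));
    }
}
```

Both methods return the same sum; the point is only that the flat version
never follows a reference to reach a key, which is closer to the C++ layout
described above.  No speedup figure is claimed.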

5) I'm not sure I buy their argument that application logic costs are
unlikely to mask the differences in framework performance.  This depends
very heavily on your application.  If you're hitting external data sources
a lot (e.g. memcache or database) then that will certainly mask a good
portion of the difference.  Maybe part of this argument is a C++ vs Java
difference, in which case I'm somewhat more inclined to agree.

6) From a business perspective, the question changes from "is it faster?"
to "what does it cost to support the throughput that we need?" which is a
very different question.  In many cases storm performs well enough.

On Mon, May 12, 2014 at 9:02 AM, John Welcher <> wrote:

> Hi
> Streams also cost 40,000 US while Storm is free.
> John
> On Mon, May 12, 2014 at 3:49 AM, Klausen Schaefersinho <> wrote:
>> Hi,
>> I found some interesting comparison of IBM Stream and Storm:
>> It also includes an interesting comparison between ZeroMQ and the Netty
>> Performance.
>> Cheers,
>> Klaus
