A couple of thoughts:
1) IBM Streams is certainly more mature, as it has been in development for longer and Storm is not even at release 1.0 yet. Though I am not familiar with SPL, it would also make sense that it's faster to implement in, as it is a higher-level abstraction.
2) Operator fusion allows data to pass between the steps in your flow more efficiently. In Storm, localOrShuffleGrouping still has to go through the Disruptor queue, whereas operator fusion, from what I understand, basically passes the pointer directly. As fast as the Disruptor is (I've seen benchmarks of millions of messages passed per second), it can't match handing data directly to the next step, which costs a few instructions. The downside is that your flow always needs to be created and compiled before you can execute it, so something like a rebalance requires recompiling your stream. Building a topology dynamically is possible in Storm (though it's not a feature that is really exposed out of the box), but not in IBM Streams.
3) They took one month to optimize Storm, but I suspect some of this work was unnecessary. Python? For a benchmark? Also, uniform message distribution by size feels like a premature optimization. I can understand that they would want to explore all avenues to account for a performance difference, but in many (most?) practical cases this would not be necessary. I can sympathize on other points. Tuning Storm's message buffers requires a pretty specific understanding of the system. Also, if you run out of heap and/or have to tune GC, then... yeah. Not fun. This would be true for any Java app, though.
4) I'm not sure they really took language differences seriously enough. I've written certain algorithms in Java that, judging by similar algorithms I implemented separately in C++, I would suspect are close to an order of magnitude slower just because I ran them in Java. I haven't dug into this deeply (for example, by using an identical algorithm in both Java and C++), but consider a HashMap indexed by a primitive type. In Java, the keys are boxed into separate objects, and the table is an array of references to them. In C++, the keys can be stored inline, sequentially in an array. That allows direct access to the key in the array (as opposed to going through a reference) and is also potentially much friendlier to the cache. Just because the JVM is healthy does not mean it's going to perform like C++ for all applications. I suppose you could then argue that for best performance Storm is more or less limited to the JVM, but I won't consider that point here for brevity. Note that this is not to say it's impossible to write fast code in Java (see the previously mentioned Disruptor). I would just argue that it's a good bit harder.
5) I'm not sure I buy their argument that application logic costs are unlikely to mask the differences in framework performance. This depends very heavily on your application. If you're hitting external data sources a lot (e.g. memcache or a database), then that will certainly mask a good portion of the difference. Maybe part of this argument is really a C++ vs. Java difference, in which case I'm somewhat more inclined to agree.
6) From a business perspective, the question changes from "is it faster?" to "what does it cost to support the throughput that we need?", which is a very different question. In many cases Storm performs well enough.
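
To make point 2 a bit more concrete, here is a rough Java sketch of the two ways of handing a tuple to the next step. It is not Storm or IBM Streams code: the class name, the two toy steps (PARSE, SCORE), and the use of an ArrayBlockingQueue as a stand-in for the Disruptor are all my own assumptions for illustration. The fused version calls the second step directly in the same thread; the queued version pays for a cross-thread handoff on every tuple.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.IntUnaryOperator;

// Illustration only: hypothetical names, not an API of Storm or IBM Streams.
class FusionSketch {
    static final IntUnaryOperator PARSE = x -> x + 1; // toy step A
    static final IntUnaryOperator SCORE = x -> x * 2; // toy step B
    static final int POISON = Integer.MIN_VALUE;      // end-of-stream marker

    // "Fused": A's output is passed straight to B in the same thread.
    static long runFused(int n) {
        IntUnaryOperator fused = PARSE.andThen(SCORE);
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += fused.applyAsInt(i);
        }
        return sum;
    }

    // "Queued": A and B run in different threads; every tuple crosses a queue.
    // ArrayBlockingQueue is only a stand-in here, it is not the Disruptor.
    static long runQueued(int n) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(1024);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) {
                    queue.put(PARSE.applyAsInt(i));
                }
                queue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        long sum = 0;
        try {
            while (true) {
                int v = queue.take();
                if (v == POISON) {
                    break;
                }
                sum += SCORE.applyAsInt(v);
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while consuming", e);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Both paths compute the same result; only the handoff mechanism differs.
        System.out.println(runFused(1000) == runQueued(1000)); // prints "true"
    }
}
```

Both methods return the same sum, so the only thing that changes between them is the handoff mechanism. I'm deliberately not quoting timings here, since they depend on the machine and the queue implementation.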
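
For point 4, the layout difference can be sketched in Java itself. The IntLongMap class below is a hypothetical, minimal open-addressing map that I made up for illustration (fixed capacity, no removal, no resizing). It keeps keys and values in flat primitive arrays, which is roughly the layout I had in mind for C++, while HashMap<Integer, Long> stores references to boxed Integer and Long objects.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: int -> long map backed by flat primitive arrays.
class IntLongMap {
    private final int[] keys;     // keys stored inline, no boxing
    private final long[] values;  // values stored inline, no boxing
    private final boolean[] used; // marks occupied slots
    private final int mask;
    private int size;

    // capacity must be a power of two; one slot always stays empty.
    IntLongMap(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        keys = new int[capacity];
        values = new long[capacity];
        used = new boolean[capacity];
        mask = capacity - 1;
    }

    // Linear probing: returns the slot holding key, or the first empty slot.
    private int slot(int key) {
        int h = key * 0x9E3779B9;
        int i = (h ^ (h >>> 16)) & mask;
        while (used[i] && keys[i] != key) {
            i = (i + 1) & mask;
        }
        return i;
    }

    void put(int key, long value) {
        int i = slot(key);
        if (!used[i]) {
            if (size == keys.length - 1) {
                throw new IllegalStateException("map is full");
            }
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    long getOrDefault(int key, long defaultValue) {
        int i = slot(key);
        return used[i] ? values[i] : defaultValue;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        int[] data = {3, 7, 3, 42, 7, 3};
        Map<Integer, Long> boxed = new HashMap<>(); // boxed keys and values
        IntLongMap flat = new IntLongMap(16);       // primitive arrays
        for (int k : data) {
            boxed.merge(k, 1L, Long::sum);
            flat.put(k, flat.getOrDefault(k, 0L) + 1);
        }
        System.out.println(boxed.get(3) + " " + flat.getOrDefault(3, 0L)); // prints "3 3"
    }
}
```

Both maps give the same counts. The difference is that a lookup in the flat version touches only the primitive arrays, while the boxed version has to follow references to separate objects. That is the kind of extra work you have to do by hand in Java to get a cache-friendly layout.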