From user-return-2125-apmail-storm-user-archive=storm.apache.org@storm.incubator.apache.org Mon May 12 18:15:23 2014 Return-Path: X-Original-To: apmail-storm-user-archive@minotaur.apache.org Delivered-To: apmail-storm-user-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 8315011E1C for ; Mon, 12 May 2014 18:15:23 +0000 (UTC) Received: (qmail 54802 invoked by uid 500); 12 May 2014 15:28:43 -0000 Delivered-To: apmail-storm-user-archive@storm.apache.org Received: (qmail 54761 invoked by uid 500); 12 May 2014 15:28:43 -0000 Mailing-List: contact user-help@storm.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@storm.incubator.apache.org Delivered-To: mailing list user@storm.incubator.apache.org Received: (qmail 54753 invoked by uid 99); 12 May 2014 15:28:43 -0000 Received: from Unknown (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 12 May 2014 15:28:43 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of ncleung@gmail.com designates 209.85.213.182 as permitted sender) Received: from [209.85.213.182] (HELO mail-ig0-f182.google.com) (209.85.213.182) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 12 May 2014 15:28:39 +0000 Received: by mail-ig0-f182.google.com with SMTP id uy17so3971800igb.15 for ; Mon, 12 May 2014 08:28:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=58ebRU3FLwcPSBCz2qcjZKEFtBBlCyikW/t4IUZs1J8=; b=MFKziJNDWHpaGy+xyjS5r8wLNqhSd9zkO6j4E3ukTtn4C2TB6f71ZIEwUxWx+FChjR TisvpfexfdcwIwx7q/wVx3u+TLdjlQ0dmwxh/nbiVGsM8A+kFeOB5goZbC3jBNXw2Q8P 2JmDwC64uj+8iZ+2Eg+Op7vAllSpLevAlTHYff8JPdmQcTK6gy6sVfIsm2Rr0+1J5+pR 
40TDYEisO08ggS+jBiu/+0l6V49wIY8IQzLX+rLjS5Cx7NlcFJPBwtU0bfGl/lfAVvmD gNX4YBNQEdkm72LVQsSpGJFhJsNhXUNKw3nCC5dPBTqwGL6qgAWnZdOm0xHebHZnJsLG v8zw== MIME-Version: 1.0 X-Received: by 10.50.109.202 with SMTP id hu10mr9135663igb.1.1399908496265; Mon, 12 May 2014 08:28:16 -0700 (PDT) Received: by 10.64.252.161 with HTTP; Mon, 12 May 2014 08:28:16 -0700 (PDT) In-Reply-To: References: Date: Mon, 12 May 2014 11:28:16 -0400 Message-ID: Subject: Re: Interesting Comparison From: Nathan Leung To: user Content-Type: multipart/alternative; boundary=089e013a1d86a3592e04f935984d X-Virus-Checked: Checked by ClamAV on apache.org

--089e013a1d86a3592e04f935984d
Content-Type: text/plain; charset=UTF-8

A couple of thoughts:

1) IBM Streams is certainly more mature, as it has been in development for longer and Storm is not even at release 1.0 yet. Though I am not familiar with SPL, it would also make sense that it's faster to implement in, as it is a higher-level abstraction.

2) Operator fusion will allow more efficiency in passing data between steps in your flow: localOrShuffleGrouping will still need to go over the Disruptor, whereas operator fusion, from what I understand, basically passes the pointer directly. As fast as the Disruptor is (I've seen benchmarks of millions of messages passed per second), it is not the same as passing data directly to the next step (cost: a few instructions). The downside of fusion is that your flow always needs to be created and compiled before you can execute it. Something like a rebalance will require a recompile of your stream. Building a topology dynamically is possible in Storm (though it is not a feature that is really exposed out of the box), but not in IBM Streams.

3) They took one month to optimize Storm, but I suspect some of this work was unnecessary. Python? For a benchmark? Also, uniform message distribution by size feels like a premature optimization. I can understand that they would want to explore all avenues to account for a performance difference, but in many (most?) practical cases this would not be necessary. I can sympathize on other points. Tuning Storm's message buffers requires a pretty specific understanding of the system. Also, if you run out of heap and/or have to tune GC, then... yeah. Not fun. This would be true for any Java app, though.

4) I'm not sure they really took language differences seriously enough. I've written certain algorithms in Java that (based on similar algorithms that I implemented separately in C++) I would suspect are close to an order of magnitude slower just because I ran them in Java. While I haven't dug into this deeply (for example, by using an identical algorithm for both Java and C++), consider a HashMap indexed by a primitive type. In Java, the keys are separate objects stored in an array of references. In C++ they can be stored sequentially in an array. C++ allows direct key access in the array (as opposed to going through the reference), and is also potentially much friendlier with the cache. Just because the JVM is healthy does not mean it's going to perform like C++ for all applications. I suppose you could then argue that for best performance Storm is more or less limited to the JVM, but I choose not to consider that point here for brevity. Note that this is not to say that it's impossible to write fast code in Java (see the previously mentioned Disruptor). I would just argue that it's a good bit harder.

5) I'm not sure I buy their argument that application logic costs are unlikely to mask the differences in framework performance. This depends very heavily on your application. If you're hitting external data sources a lot (e.g. memcached or a database), then that will certainly mask a good portion of the difference. Maybe part of this argument is a C++ vs Java difference, in which case I'm somewhat more inclined to agree.

6) From a business perspective, the question changes from "is it faster?" to "what does it cost to support the throughput that we need?", which is a very different question. In many cases Storm performs well enough.

On Mon, May 12, 2014 at 9:02 AM, John Welcher wrote:

> Hi
>
> Streams also costs 40,000 USD, while Storm is free.
>
> John
>
>
> On Mon, May 12, 2014 at 3:49 AM, Klausen Schaefersinho <
> klaus.schaefers@gmail.com> wrote:
>
>> Hi,
>>
>> I found an interesting comparison of IBM Streams and Storm:
>>
>> https://www.ibmdw.net/streamsdev/2014/04/22/streams-apache-storm/
>>
>> It also includes an interesting comparison between ZeroMQ and Netty
>> performance.
>>
>>
>> Cheers,
>>
>> Klaus
>>
>
>

--089e013a1d86a3592e04f935984d
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
A couple of thoughts:

1) IBM Streams is certainly more mature, as it has been in development for longer and Storm is not even at release 1.0 yet. Though I am not familiar with SPL, it would also make sense that it's faster to implement in, as it is a higher-level abstraction.

2) Operator fusion will allow more efficiency in passing data between steps in your flow: localOrShuffleGrouping will still need to go over the Disruptor, whereas operator fusion, from what I understand, basically passes the pointer directly. As fast as the Disruptor is (I've seen benchmarks of millions of messages passed per second), it is not the same as passing data directly to the next step (cost: a few instructions). The downside of fusion is that your flow always needs to be created and compiled before you can execute it. Something like a rebalance will require a recompile of your stream. Building a topology dynamically is possible in Storm (though it is not a feature that is really exposed out of the box), but not in IBM Streams.

3) They took one month to optimize Storm, but I suspect some of this work was unnecessary. Python? For a benchmark? Also, uniform message distribution by size feels like a premature optimization. I can understand that they would want to explore all avenues to account for a performance difference, but in many (most?) practical cases this would not be necessary. I can sympathize on other points. Tuning Storm's message buffers requires a pretty specific understanding of the system. Also, if you run out of heap and/or have to tune GC, then... yeah. Not fun. This would be true for any Java app, though.

4) I'm not sure they really took language differences seriously enough. I've written certain algorithms in Java that (based on similar algorithms that I implemented separately in C++) I would suspect are close to an order of magnitude slower just because I ran them in Java. While I haven't dug into this deeply (for example, by using an identical algorithm for both Java and C++), consider a HashMap indexed by a primitive type. In Java, the keys are separate objects stored in an array of references. In C++ they can be stored sequentially in an array. C++ allows direct key access in the array (as opposed to going through the reference), and is also potentially much friendlier with the cache. Just because the JVM is healthy does not mean it's going to perform like C++ for all applications. I suppose you could then argue that for best performance Storm is more or less limited to the JVM, but I choose not to consider that point here for brevity. Note that this is not to say that it's impossible to write fast code in Java (see the previously mentioned Disruptor). I would just argue that it's a good bit harder.

5) I'm not sure I buy their argument that application logic costs are unlikely to mask the differences in framework performance. This depends very heavily on your application. If you're hitting external data sources a lot (e.g. memcached or a database), then that will certainly mask a good portion of the difference. Maybe part of this argument is a C++ vs Java difference, in which case I'm somewhat more inclined to agree.

6) From a business perspective, the question changes from "is it faster?" to "what does it cost to support the throughput that we need?", which is a very different question. In many cases Storm performs well enough.
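
To make the HashMap remark in point 4 a bit more concrete, here is a minimal sketch in Java. It is not taken from the benchmark under discussion, and the class names `IntIntMap` and `BoxedVsPrimitiveDemo` are hypothetical, made up for illustration. It contrasts a `HashMap<Integer, Integer>`, where every key and value is a boxed object reached through a reference, with a hand-rolled open-addressing map that keeps keys and values in flat `int[]` arrays, which is roughly the layout a C++ open-addressing map would give you. No timings are claimed here; any difference would have to be measured on your own workload.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical int -> int map using open addressing with linear probing. */
final class IntIntMap {
    private int[] keys;      // keys stored contiguously, no boxing
    private int[] values;    // values stored contiguously, no boxing
    private boolean[] used;  // marks which slots are occupied
    private int size;

    IntIntMap(int expectedSize) {
        int capacity = 16;
        while (capacity < expectedSize * 2) capacity <<= 1; // power of two, load factor <= 0.5
        keys = new int[capacity];
        values = new int[capacity];
        used = new boolean[capacity];
    }

    /** Cheap multiplicative hash to spread out sequential keys. */
    private static int mix(int k) {
        int h = k * 0x9E3779B9;
        return h ^ (h >>> 16);
    }

    void put(int key, int value) {
        if ((size + 1) * 2 > keys.length) resize();
        int mask = keys.length - 1;
        int i = mix(key) & mask;
        while (used[i] && keys[i] != key) i = (i + 1) & mask; // probe the next slot
        if (!used[i]) {
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    int get(int key, int defaultValue) {
        int mask = keys.length - 1;
        int i = mix(key) & mask;
        while (used[i]) {
            if (keys[i] == key) return values[i];
            i = (i + 1) & mask;
        }
        return defaultValue;
    }

    int size() {
        return size;
    }

    /** Doubles the capacity and reinserts every occupied slot. */
    private void resize() {
        int[] oldKeys = keys;
        int[] oldValues = values;
        boolean[] oldUsed = used;
        keys = new int[oldKeys.length * 2];
        values = new int[oldKeys.length * 2];
        used = new boolean[oldKeys.length * 2];
        size = 0;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldUsed[i]) put(oldKeys[i], oldValues[i]);
        }
    }
}

final class BoxedVsPrimitiveDemo {
    /** Boxed variant: each Integer key and value is a separate heap object. */
    static long sumBoxed(int n) {
        Map<Integer, Integer> m = new HashMap<>();
        for (int i = 0; i < n; i++) m.put(i, i * 2);
        long sum = 0;
        for (int i = 0; i < n; i++) sum += m.get(i);
        return sum;
    }

    /** Primitive variant: same work, but lookups stay inside flat int arrays. */
    static long sumPrimitive(int n) {
        IntIntMap m = new IntIntMap(n);
        for (int i = 0; i < n; i++) m.put(i, i * 2);
        long sum = 0;
        for (int i = 0; i < n; i++) sum += m.get(i, 0);
        return sum;
    }
}
```

Both `sumBoxed` and `sumPrimitive` compute the same result, so they could be dropped into a benchmark harness to see how much of a gap the memory layout actually produces on a given JVM.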


On Mon, May 12, 2014 at 9:02 AM, John Welcher <jpwelcher@gmail.com> wrote:
Hi

Streams also costs 40,000 USD, while Storm is free.

John


On Mon, May 12, 2014 at 3:49 AM, Klausen Schaefersinho <klaus.schaefers@gmail.com> wrote:
Hi,

I found an interesting comparison of IBM Streams and Storm:

https://www.ibmdw.net/streamsdev/2014/04/22/streams-apache-storm/
It also includes an interesting comparison between ZeroMQ and Netty performance.


Cheers,

Klaus


--089e013a1d86a3592e04f935984d--