spark-user mailing list archives

From Philip Ogren <philip.og...@oracle.com>
Subject Re: Opinions stratosphere
Date Fri, 02 May 2014 16:39:30 GMT
Great reference!  I just skimmed through the results without reading 
much of the methodology - but it looks like Spark outperforms 
Stratosphere fairly consistently in the experiments.  It's too bad the 
data sources only range from 2GB to 8GB.  Who knows if the apparent 
pattern would extend out to 64GB, 128GB, 1TB, and so on...



On 05/01/2014 06:02 PM, Christopher Nguyen wrote:
> Someone (Ze Ni, https://www.sics.se/people/ze-ni) has actually 
> attempted such a comparative study as a Masters thesis:
>
> http://www.diva-portal.org/smash/get/diva2:605106/FULLTEXT01.pdf
>
> According to this snapshot (c. 2013), Stratosphere is different from 
> Spark in not having an explicit concept of an in-memory dataset (e.g., 
> RDD).
>
> In principle this could be argued to be an implementation detail; the 
> operators and execution plan/data flow are of primary concern in the 
> API, and the data representation/materializations are otherwise 
> unspecified.
>
> But in practice, for long-running interactive applications, I consider 
> RDDs to be of fundamental, first-class citizen importance, and the key 
> distinguishing feature of Spark's model vs other "in-memory" 
> approaches that treat memory merely as an implicit cache.
>
> --
> Christopher T. Nguyen
> Co-founder & CEO, Adatao <http://adatao.com>
> linkedin.com/in/ctnguyen <http://linkedin.com/in/ctnguyen>
>
>
>
> On Tue, Nov 26, 2013 at 1:26 PM, Matei Zaharia 
> <matei.zaharia@gmail.com <mailto:matei.zaharia@gmail.com>> wrote:
>
>     I don’t know a lot about it except from the research side, where
>     the team has done interesting optimization stuff for these types
>     of applications. In terms of the engine, one thing I’m not sure of
>     is whether Stratosphere allows explicit caching of datasets
>     (similar to RDD.cache()) and interactive queries (similar to
>     spark-shell). But it’s definitely an interesting project to watch.
>
>     Matei
>
>     On Nov 22, 2013, at 4:17 PM, Ankur Chauhan
>     <achauhan@brightcove.com <mailto:achauhan@brightcove.com>> wrote:
>
>     > Hi,
>     >
>     > That's what I thought, but as per the slides on
>     http://www.stratosphere.eu they seem to "know" about Spark and the
>     Scala API does look similar.
>     > I found the PACT model interesting. Would like to know if Matei
>     or other core committers have something to weigh in on.
>     >
>     > -- Ankur
>     > On 22 Nov 2013, at 16:05, Patrick Wendell <pwendell@gmail.com
>     <mailto:pwendell@gmail.com>> wrote:
>     >
>     >> I've never seen that project before, would be interesting to get a
>     >> comparison. Seems to offer a much lower level API. For instance
>     this
>     >> is a wordcount program:
>     >>
>     >>
>     https://github.com/stratosphere/stratosphere/blob/master/pact/pact-examples/src/main/java/eu/stratosphere/pact/example/wordcount/WordCount.java
>     >>
>     >> On Thu, Nov 21, 2013 at 3:15 PM, Ankur Chauhan
>     <achauhan@brightcove.com <mailto:achauhan@brightcove.com>> wrote:
>     >>> Hi,
>     >>>
>     >>> I was just curious about
>     https://github.com/stratosphere/stratosphere
>     >>> and how Spark compares to it. Does anyone have any experience
>     with it and care to
>     >>> comment?
>     >>>
>     >>> -- Ankur
>     >
>
>

