spark-dev mailing list archives

From Christopher Nguyen <...@adatao.com>
Subject Fwd: MBrace: Cloud Computing with Monads
Date Thu, 24 Oct 2013 02:41:17 GMT
Yes, that's what I was trying to (briefly & imprecisely) distinguish with
the words "fully async DAG" referring to Dryad, and generalizing MR to BSP.
I should have referred to Dryad as "a general DAG with a rich composition
algebra that the user can directly manipulate".

Spark is more than just MapReduce, so the clarification is helpful; I flinch
each time I use the shorthand "really fast MapReduce". The practical point
is that this is actually how Spark has become so successful: the map*() and
reduce*() abstraction is well known to people looking for a speedy way out
of the "batch-oriented Hadoop MapReduce" problem who still want to take
advantage of that strong ecosystem.
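
For anyone less familiar with the shape of that abstraction, here is a
minimal plain-Python sketch of a map step, a shuffle (group by key), and a
reduce step. It is only an illustration under my own assumptions: the
function names (map_phase, shuffle, reduce_phase, word_count) are made up
for this example and are not Spark's or Hadoop's API.

```python
from collections import defaultdict
from functools import reduce


def map_phase(records, mapper):
    """Apply mapper to every record; mapper returns (key, value) pairs."""
    return [pair for record in records for pair in mapper(record)]


def shuffle(pairs):
    """Group values by key; this is the barrier between the two stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Fold each key's list of values into one value with reducer."""
    return {key: reduce(reducer, values) for key, values in groups.items()}


def word_count(lines):
    """Hypothetical example job: count word occurrences across lines."""
    pairs = map_phase(lines, lambda line: [(word, 1) for word in line.split()])
    return reduce_phase(shuffle(pairs), lambda a, b: a + b)
```

The fixed map -> shuffle -> reduce pipeline is the part most people already
know how to apply; a system with dynamically modifiable task graphs would
not be restricted to that shape.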

--
Christopher T. Nguyen
Co-founder & CEO, Adatao <http://adatao.com>
linkedin.com/in/ctnguyen



On Wed, Oct 23, 2013 at 7:06 PM, Matei Zaharia <matei.zaharia@gmail.com> wrote:

> Just to be clear, Spark actually *does* support general task graphs,
> similar to Dryad (though a bit simpler in that there's a notion of "stages"
> and a fixed set of connection patterns between them). However, MBrace goes
> a step beyond that, in that the graphs can be modified dynamically based on
> user code. It's also not clear what the granularity of task spawns in
> MBrace is -- can you spawn stuff that runs for 1 millisecond, or 1 second,
> or 1 hour? The choice there greatly affects system design.
>
> Matei
>
> On Oct 23, 2013, at 6:54 PM, Christopher Nguyen <ctn@adatao.com> wrote:
>
> > Re MBrace: very interesting work. I'm a bit surprised though that the paper
> > makes no mention of DryadLINQ (
> > http://research.microsoft.com/en-us/projects/dryadlinq/dryadlinq.pdf).
> >
> > Architecturally it's a lot easier to see an MBrace implementation
> > specialized to a MapReduce (or more generically, a BSP) computation, than
> > to have Spark implement the fully async DAG model of an MBrace/Dryad
> > engine.
> >
> > More practically, as interesting as it might be as a side effort, I think
> > for the core Spark effort to attempt something like that would be "off
> > mission". Spark's success to date has been more due to beautiful
> > implementation of a known architecture, than beautiful new architecture.
> > Basically, Spark does MapReduce 10-100x faster than Hadoop, and more people
> > by now understand how to get MapReduce to solve their problems than any
> > other parallel model. Spark sits natively on HDFS so that makes adoption a
> > lot easier to swallow. So at present, for Spark to mature quickly along
> > that successful trajectory, the key problems to address are more practical
> > "user interface" or "productivity" things like manageability,
> > deployability, fault-tolerance improvements, multi-user access, a bigger
> > library of pre-packaged algorithms, etc.
> >
> > Whether MapReduce's own success is an accident of history or something more
> > fundamental is subject to interesting debate. I remember being constantly
> > amazed by the number of problems that, when squinted at the right way,
> > become MR-soluble problems at Google (starting, ironically, with PageRank
> > itself). Yes, apparently sometimes it does pay to see many things as a nail
> > when you have invested in a powerful hammer.
> >
> > Along those lines, here are some interesting perspectives on the beauty of
> > Dryad/DryadLINQ, and at least one practical reason why it didn't succeed as
> > an implementation.
> >
> >   - http://blogs.msdn.com/b/dryad/archive/2010/02/15/some-dryad-and-dryadlinq-history.aspx
> >   - http://geekswithblogs.net/johnsPerfBlog/archive/2011/12/12/rip-dryadlinq-or-long-live-linq-to-hadoop.aspx
> >
> >
> >
> > --
> > Christopher T. Nguyen
> > Co-founder & CEO, Adatao <http://adatao.com>
> > linkedin.com/in/ctnguyen
> >
> >
> >
> > On Wed, Oct 23, 2013 at 2:33 PM, Alex Boisvert <alex.boisvert@gmail.com> wrote:
> >
> >> (Resending to @apache list instead of old google-group)
> >>
> >> A bit of a random question but I was wondering if there were efforts
> >> underway to generalize / expand the Spark API towards something that would
> >> be similar to the MBrace [1] model ... there's certainly an overlap between
> >> the features of the systems already ... so I guess I'm thinking about an
> >> API that's less centered around RDDs (as a collection) and more towards
> >> distributed dataflow that would feel more like composing Promises/Futures
> >> ... or even generalizing to support various sorts of container/context
> >> monads.
> >>
> >> [1] "MBrace: Cloud Computing with Monads"
> >> http://plosworkshop.org/2013/preprint/dzik.pdf
> >>
>
>
