spark-user mailing list archives

From "Evan R. Sparks" <evan.spa...@gmail.com>
Subject Re: Is There Any Benchmarks Comparing C++ MPI with Spark
Date Thu, 19 Jun 2014 16:07:35 GMT
Larry,

I don't see any reference to Spark in particular there.

Additionally, the benchmark only scales up to datasets of roughly 10 GB
(though I realize they've picked some fairly computationally intensive
tasks), and they don't present results on more than 4 nodes. A cluster that
small can hide things like a communication pattern that is O(n^2) in the
number of cluster nodes.
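
As a rough illustration of why 4 nodes can hide this (a sketch only, not
taken from the benchmark): the function names below are hypothetical, and
the model simply counts point-to-point messages per round, assuming an
all-to-all exchange versus a tree-style reduction in which each non-root
node sends once.

```python
def all_to_all_messages(nodes: int) -> int:
    """Messages per round if every node sends to every other node: O(n^2)."""
    return nodes * (nodes - 1)


def tree_reduce_messages(nodes: int) -> int:
    """Messages per round if each non-root node sends once up a tree: O(n)."""
    return nodes - 1


if __name__ == "__main__":
    # Compare the two patterns as the (assumed) cluster size grows.
    for n in (4, 16, 64, 256):
        print(n, all_to_all_messages(n), tree_reduce_messages(n))
```

Under these assumptions the two patterns differ by 12 versus 3 messages at
4 nodes, which is easy to miss, but by 65280 versus 255 at 256 nodes.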

Obviously they've gotten some great performance out of SciDB, but I don't
think this answers the MPI vs. Spark question directly.

My own experience suggests that as long as your algorithm fits in a BSP
programming model, with Spark you can achieve performance that is
comparable to a tuned C++/MPI codebase by leveraging the right libraries
locally and thinking carefully about what and when you have to communicate.
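
To make the BSP shape concrete, here is a minimal sketch in plain Python
rather than Spark or MPI. The names (local_step, combine, bsp_mean) and the
list-of-lists "partitions" are assumptions for illustration only: each
partition does its heavy work locally, and only a small summary crosses the
barrier.

```python
from functools import reduce


def local_step(partition):
    """Local compute phase: runs independently on one partition.

    In a real job this is where an optimized local library would be used.
    """
    return (sum(partition), len(partition))


def combine(a, b):
    """Communication phase: merge two small (sum, count) summaries."""
    return (a[0] + b[0], a[1] + b[1])


def bsp_mean(partitions):
    # Superstep part 1: every partition computes locally, no communication.
    summaries = [local_step(p) for p in partitions]
    # Superstep part 2: after the barrier, only the summaries are exchanged.
    total, count = reduce(combine, summaries)
    return total / count


if __name__ == "__main__":
    partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
    print(bsp_mean(partitions))
```

The point of the sketch is that the amount communicated is one pair of
numbers per partition, regardless of how many elements each partition holds.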

- Evan


On Thu, Jun 19, 2014 at 8:48 AM, ldmtwo <larry.d.moore.ii@intel.com> wrote:

>
> Here is a partial comparison.
>
>
> http://dspace.mit.edu/bitstream/handle/1721.1/82517/MIT-CSAIL-TR-2013-028.pdf?sequence=2
>
> SciDB uses MPI with Intel HW and libraries. Amazing performance at the cost
> of more work.
>
> In case the link stops working:
> A Complex Analytics Genomics Benchmark. Rebecca Taft, Manasi Vartak,
> Nadathur Rajagopalan Satish, Narayanan Sundaram, Samuel Madden, and Michael
> Stonebraker
>
>
>
