spark-user mailing list archives

From "Evan R. Sparks" <>
Subject Re: Is There Any Benchmarks Comparing C++ MPI with Spark
Date Thu, 19 Jun 2014 16:07:35 GMT

I don't see any reference to Spark in particular there.

Additionally, the benchmark only scales up to datasets of roughly 10 GB
(though I realize they've picked some fairly computationally intensive
tasks), and they don't present results on more than 4 nodes. That can
hide problems such as a communication pattern that is O(n^2) in the
number of cluster nodes.
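To make the scaling concern concrete, here is a rough back-of-the-envelope sketch. It assumes a simplified model that only counts point-to-point messages per round and ignores message size and network topology. The function names are hypothetical. An all-to-all exchange grows as n*(n-1), while reducing partial results toward one node grows as n-1. At 4 nodes the gap is only 12 vs. 3 messages, so a 4-node benchmark can't really tell the two patterns apart.

```python
def all_to_all_messages(n: int) -> int:
    # Every node sends one message to every other node: n*(n-1), i.e. O(n^2).
    return n * (n - 1)


def reduce_to_one_messages(n: int) -> int:
    # Every non-root node sends one partial result toward the root: n-1, i.e. O(n).
    return n - 1


# Print node count, all-to-all message count, and reduce message count.
for n in (4, 16, 64, 256):
    print(n, all_to_all_messages(n), reduce_to_one_messages(n))
```

The ratio between the two counts is exactly n, so the difference only becomes obvious well past the 4-node mark.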

Obviously they've gotten some great performance out of SciDB, but I don't
think this answers the MPI vs. Spark question directly.

My own experience suggests that as long as your algorithm fits the BSP
(bulk synchronous parallel) programming model, you can achieve performance
with Spark that is comparable to a tuned C++/MPI codebase by leveraging the
right libraries locally and thinking carefully about what you have to
communicate, and when.
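A minimal sketch of that BSP shape follows. It is plain Python with no Spark, and the partitions are simulated as Python lists. The workload is least-squares gradient descent, chosen only for illustration and not taken from this thread. All function names are hypothetical. Each superstep does heavy local compute on every partition, then a single reduce of small dim-sized vectors instead of raw data, then a driver-side update that is "broadcast" on the next superstep. In real Spark code, the reduce and broadcast steps would roughly correspond to something like `RDD.treeAggregate` and `SparkContext.broadcast`, and the local step would call a tuned numeric library instead of pure-Python loops.

```python
from functools import reduce
from typing import List, Tuple

Point = Tuple[List[float], float]  # (feature vector, label)


def local_gradient(partition: List[Point], w: List[float]) -> List[float]:
    """Local compute phase: runs on one worker with no communication."""
    g = [0.0] * len(w)
    for x, y in partition:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            g[j] += err * xj
    return g


def add_vectors(a: List[float], b: List[float]) -> List[float]:
    """Combine two partial gradients (the only data that gets communicated)."""
    return [ai + bi for ai, bi in zip(a, b)]


def bsp_least_squares(partitions: List[List[Point]], dim: int,
                      steps: int = 1000, lr: float = 0.5) -> List[float]:
    w = [0.0] * dim
    n = sum(len(p) for p in partitions)
    for _ in range(steps):
        # Superstep: local compute on every partition using the current w.
        partials = [local_gradient(p, w) for p in partitions]
        # Communication: one reduce of dim-sized vectors, independent of data size.
        g = reduce(add_vectors, partials)
        # Barrier + update on the driver; the new w is used by all partitions next step.
        w = [wi - lr * gi / n for wi, gi in zip(w, g)]
    return w


# Usage: fit y = 3x + 1 (with a constant feature for the intercept), split over 3 partitions.
points = [([i / 10, 1.0], 3 * (i / 10) + 1) for i in range(10)]
parts = [points[0:4], points[4:7], points[7:10]]
print(bsp_least_squares(parts, dim=2))
```

The point of the structure is that the per-superstep communication volume depends on the model size (`dim`), not on the number of data points. That is the kind of "what and when to communicate" decision that makes a Spark job competitive.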

- Evan

On Thu, Jun 19, 2014 at 8:48 AM, ldmtwo <> wrote:

> Here is a partial comparison.
> SciDB uses MPI with Intel HW and libraries. Amazing performance at the cost
> of more work.
> In case the link stops working:
> A Complex Analytics Genomics Benchmark. Rebecca Taft, Manasi Vartak,
> Nadathur Rajagopalan Satish, Narayanan Sundaram, Samuel Madden, and Michael
> Stonebraker
> --
> Sent from the Apache Spark User List mailing list archive at
