mahout-user mailing list archives

From Dmitriy Lyubimov
Subject Re: Matrix inversion
Date Thu, 05 May 2016 20:56:50 GMT
BTW, Thibaut, in the paper you mention, the MPI-based implementation beats
Spark by at least a factor of 2 on inversion performance. That is kind of
what I was saying, and in this case the algorithm does not even seem to be as
highly interconnected as, e.g., naive blockwise multiplication.
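
To make the "highly interconnected" remark concrete, here is a minimal single-process sketch of naive blockwise multiplication. It is not from the paper or from Mahout; the function names (`split_blocks`, `blockwise_multiply`, `join_blocks`) and the `fetches` counter are hypothetical and only illustrate that each output block has to read a full block-row of A and a full block-column of B. In a distributed setting each such read is, in the worst case, a network transfer.

```python
# Illustrative sketch only: pure-Python naive blockwise multiplication on a
# g x g grid of blocks, counting how many input blocks are read in total.

def matmul(a, b):
    """Dense product of two small matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def matadd(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def split_blocks(m, g):
    """Split a square matrix into a g x g grid of equal square blocks.

    Assumes len(m) is divisible by g.
    """
    s = len(m) // g
    return [[[row[j * s:(j + 1) * s] for row in m[i * s:(i + 1) * s]]
             for j in range(g)]
            for i in range(g)]

def blockwise_multiply(a_blocks, b_blocks):
    """Compute C[i][j] = sum_k A[i][k] * B[k][j] block by block.

    Returns the grid of output blocks and the number of input-block reads.
    """
    g = len(a_blocks)
    s = len(a_blocks[0][0])
    fetches = 0
    c_blocks = []
    for i in range(g):
        c_row = []
        for j in range(g):
            acc = [[0] * s for _ in range(s)]
            for k in range(g):
                # Each term needs one block of A and one block of B.
                acc = matadd(acc, matmul(a_blocks[i][k], b_blocks[k][j]))
                fetches += 2
            c_row.append(acc)
        c_blocks.append(c_row)
    return c_blocks, fetches

def join_blocks(blocks):
    """Reassemble a grid of blocks into one matrix (list of rows)."""
    rows = []
    for block_row in blocks:
        for r in range(len(block_row[0])):
            rows.append([x for block in block_row for x in block[r]])
    return rows
```

With a g x g grid there are only 2·g² distinct input blocks, but the loop above performs 2·g³ block reads, so every input block is needed by g different output blocks. That fan-out is the scattering the message refers to.
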

On Thu, May 5, 2016 at 1:50 PM, Dmitriy Lyubimov wrote:

> The mantra I keep hearing is that if someone needs matrix inversion, then
> he/she must be doing something wrong. Not sure how true that is, but in all
> the cases I have encountered, people try to avoid matrix inversion one way
> or another.
> Re: libraries: Mahout is more about APIs now than about any particular
> in-core library. Unfortunately, Mahout's in-memory operations are rooted in
> single-threaded Colt and are pretty slow at the moment. We are looking for
> ways of doing in-memory operations faster and integrating something better
> and native.
> However, the really limiting factor seems to be the Spark programming model
> and its effect on interconnected I/O problems with a high degree of
> scattering. Compare, for example, the performance you can get with the MKL
> MPI wrapper. If you are looking for performance of distributed algebra on
> CPUs, there are very few things that can compete with the MKL MPI wrapper.
> My personal opinion is that for as long as the problem fits in memory (and
> most of them do nowadays), no algorithm on Spark is going to beat Matlab in
> matrix multiplication and such, all things being equal, no matter how many
> cores the Spark cluster gets, on 1 Gbit networks. The same seems to be
> 10-fold true when comparing to GPU-based algorithms (case in point: BIDMach).
> On Thu, May 5, 2016 at 12:45 PM, thibaut wrote:
>> My questions are:
>> - For what we want to do, is it better to use Mahout or Spark?
> Mahout at this point is better for declarative prototyping, as it contains
> a distributed optimizer and a compact expression DSL.
>> - I saw that you already have a distributed PCA. Do you have a really
>> efficient matrix inversion algorithm in Mahout?
> The PCA underpinnings are described in detail in the "AM: Beyond MapReduce"
> book.
>> - How good is the linear algebra library compared to Matlab, for example?
> See my opinion above about algorithms on Spark. Yes, I did some
> benchmarking and digging around. Some things could be on par, but
> interconnected things are decidedly worse than single-node Matlab (in terms
> of speed).
>> Finally, our main concern about using Spark is the linear algebra
>> library that is used with it, and we were wondering how good the
>> Mahout one is.
> What do you mean specifically? Speed? As I said, the in-core speed is what
> one can expect from a Java-based implementation, but the in-core speed
> factor seems to be far overshadowed by I/O programming model issues in
> highly interconnected problems once a certain problem size is reached.
>> Thanking you in advance,
>> Best regards.
>> Thibaut
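
On the point about avoiding explicit inversion: when the goal is x = A⁻¹b, the usual alternative is to solve the system A x = b directly. The following is a minimal single-node sketch of that idea using Gaussian elimination with partial pivoting. It is plain Python, not Mahout or Spark code, and the `solve` function and its singularity tolerance of `1e-12` are assumptions made for illustration.

```python
def solve(a, b):
    """Solve a x = b without forming the inverse of a.

    Uses Gaussian elimination with partial pivoting on an augmented copy,
    followed by back substitution. `a` is a list of rows, `b` a list.
    """
    n = len(a)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]

    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        m[col], m[pivot] = m[pivot], m[col]

        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]

    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


# Usage: 4x + 3y = 10 and 6x + 3y = 12.
x = solve([[4, 3], [6, 3]], [10, 12])
```

Solving for one right-hand side this way costs roughly a third of the arithmetic of forming the full inverse and then multiplying, and it tends to be more accurate, which is one common reason behind the mantra quoted in the message.
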
