mahout-user mailing list archives

From Dmitriy Lyubimov <>
Subject Re: Matrix inversion
Date Thu, 05 May 2016 20:50:20 GMT
The mantra I keep hearing is that if someone needs matrix inversion, then
he or she must be doing something wrong. Not sure how true that is, but in
all the cases I have encountered, people try to avoid matrix inversion one
way or another.
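
To illustrate what avoiding inversion usually looks like: when the goal is
x = inv(A) * b, one typically solves A x = b directly rather than forming
inv(A). The following is a minimal pure-Python sketch of that idea using
Gaussian elimination with partial pivoting. It is not Mahout code; the
function name `solve` and the 2x2 example values are assumptions made up
for illustration only.

```python
# Sketch: solve A x = b directly instead of computing inv(A) and multiplying.
# Pure-Python Gaussian elimination with partial pivoting; illustrative only.

def solve(a, b):
    """Return x such that a * x == b for a square, non-singular matrix a."""
    n = len(a)
    # Work on an augmented copy [A | b] so the inputs are left untouched.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


# Hypothetical example system: 4x + 3y = 10, 6x + 3y = 12.
a = [[4.0, 3.0], [6.0, 3.0]]
b = [10.0, 12.0]
x = solve(a, b)  # approximately [1.0, 2.0]
```

The inverse is never formed here. For several right-hand sides, the same
elimination can be reused, which is the role a factorization (LU, Cholesky,
QR) plays in a real library.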

Re: libraries: Mahout is more about APIs now than any particular in-core
library. Unfortunately, Mahout's in-memory operations are rooted in
single-threaded Colt and are pretty slow at the moment. We are looking for
ways of doing in-memory operations faster and integrating something better
and native.

However, the really limiting factor seems to be the Spark programming model
and its effect on interconnected I/O problems with a high degree of
scattering. Compare, for example, with the performance you can get with an
MKL MPI wrapper. If you are looking for performance of distributed algebra
on CPUs, there are very few things that can compete with an MKL MPI wrapper.

My personal opinion is that as long as the problem fits in memory (and
most of them do nowadays), no algorithm on Spark is going to beat Matlab in
matrix multiplication and such, all things being equal, no matter how many
cores the Spark cluster gets, on 1 Gbit networks. The same seems to be
10-fold true when comparing to GPU-based algorithms (case in point: BIDMach).
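
A rough back-of-envelope sketch of why the network dominates. Every number
here (the matrix size, the sustained single-node throughput, and ignoring
protocol overhead) is an assumption chosen for illustration, not a
measurement:

```python
# Back-of-envelope sketch; every constant below is an illustrative assumption.
N = 10_000                         # assumed dimension of a dense square matrix
BYTES_PER_DOUBLE = 8
NETWORK_BYTES_PER_SEC = 1e9 / 8    # 1 Gbit/s link, protocol overhead ignored
NODE_FLOPS = 100e9                 # assumed sustained single-node throughput

# Size of one dense operand and the time to push it once over a single link.
matrix_bytes = N * N * BYTES_PER_DOUBLE
transfer_sec = matrix_bytes / NETWORK_BYTES_PER_SEC

# Cost of a dense N x N by N x N multiply on one node (about 2 * N^3 flops).
multiply_flops = 2 * N ** 3
compute_sec = multiply_flops / NODE_FLOPS

print(f"operand size : {matrix_bytes / 1e6:.0f} MB")
print(f"one transfer : {transfer_sec:.1f} s")
print(f"local multiply: {compute_sec:.1f} s")
```

Under these assumptions, moving a single 800 MB operand across one link once
takes about 6.4 s, against about 20 s for the entire multiply on one node. A
distributed multiply generally has to move blocks of both operands more than
once, so the transfer cost can quickly exceed the local compute time.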

On Thu, May 5, 2016 at 12:45 PM, thibaut <> wrote:

> My askings are:
> - Is it better for what we want to do to use Mahout, or Spark ?

Mahout at this point is better for declarative prototyping, as it contains
a distributed optimizer and a compact expression DSL.

> - I saw that you already have a distributed PCA. Do you have a really
> efficient matrix inversion algorithm in Mahout ?
PCA underpinnings are described in detail in the "AM:Beyond MapReduce"

> - How good is the linear algebra library in compare to Matlab for example ?
See my opinion above about algorithms on Spark. Yes, I did some
benchmarking and digging around. Some things could be on par, but
interconnected things are decidedly worse than single-node Matlab (in terms
of speed).

> Finally, our main concern for using Spark is about the linear algebra
> library that is used with Spark. And we were wondering how good is the
> Mahout one ?

What do you mean specifically? Speed? As I said, the in-core speed is what
one can expect from a Java-based implementation, but the in-core speed
factor seems to be far overshadowed by I/O programming model issues in
highly interconnected problems once the problem reaches a certain size.

> Thanking you in advance,
> Best regards.
> Thibaut
