BTW, Thibaut, in the paper you mention, the MPI-based implementation beats
Spark by at least 2x on inversion performance. That's kind of what I was
saying, and in this case the algorithm doesn't even seem to be as highly
interconnected as, e.g., naive blockwise multiplication.
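
Here is what "interconnected" means concretely. Split A and B into an n x n grid of blocks. Each output block C[i][j] = sum over k of A[i][k] B[k][j] then reads a whole block row of A and a whole block column of B. The sketch below is pure Python and only counts dependencies. `block_dependencies` and `fan_out` are hypothetical helpers, not anything in Spark or Mahout. It shows that every input block is needed by n different output tasks:

```python
from collections import Counter


def block_dependencies(n):
    """For C = A B split into an n x n grid of blocks, map each output
    block (i, j) to the input blocks it reads: C[i][j] = sum_k A[i][k] B[k][j]."""
    deps = {}
    for i in range(n):
        for j in range(n):
            deps[(i, j)] = [("A", i, k) for k in range(n)] + [("B", k, j) for k in range(n)]
    return deps


def fan_out(n):
    """Count how many output blocks need each input block, i.e. how many
    times each input block must be shipped if every output block is its own task."""
    counts = Counter()
    for inputs in block_dependencies(n).values():
        counts.update(inputs)
    return counts


if __name__ == "__main__":
    for n in (2, 4, 8):
        counts = fan_out(n)
        # each input block is needed n times; total transfers = 2 * n**3
        print(n, set(counts.values()), sum(counts.values()))
    # prints:
    # 2 {2} 16
    # 4 {4} 128
    # 8 {8} 1024
```

With one task per output block, that is 2n^3 block transfers for only 2n^2 blocks of input data. Smarter schemes (replication, broadcasting) reduce this. Still, this all-to-all dependency pattern is what a shuffle-based model has to pay for over the network.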
On Thu, May 5, 2016 at 1:50 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> The mantra I keep hearing is that if someone needs matrix inversion, then
> he/she must be doing something wrong. I'm not sure how true that is, but in
> all the cases I have encountered, people try to avoid explicit matrix
> inversion one way or another.
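
A minimal sketch of the usual workaround, for the common case where the inverse is only wanted in order to compute x = A^-1 b: solve A x = b directly by elimination and never form A^-1. The function name `solve` and the choice of plain Gaussian elimination with partial pivoting are illustrative assumptions. This is a textbook method in pure Python, not Mahout's or Spark's API:

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Works on a copy of the inputs and never forms the inverse of a."""
    n = len(a)
    # augmented matrix [a | b], copied so the caller's data is untouched
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # partial pivoting: move the row with the largest |entry| in this column up
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # eliminate the entries below the pivot
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # back substitution on the resulting upper-triangular system
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


if __name__ == "__main__":
    # 4x + y = 1, 2x + 3y = 2  ->  x = 0.1, y = 0.6
    print(solve([[4.0, 1.0], [2.0, 3.0]], [1.0, 2.0]))
```

For an n x n matrix, factoring costs roughly a third of the arithmetic of forming the full inverse (about 2n^3/3 vs 2n^3 flops), and it is generally better behaved numerically.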
>
> Re: libraries: Mahout is now more about APIs than about any particular
> in-core library. Unfortunately, Mahout's in-memory operations are rooted in
> single-threaded Colt and are pretty slow at the moment. We are looking for
> ways to make in-memory operations faster and to integrate something better
> and native.
>
> However, the really limiting factor seems to be the Spark programming model
> and its effect on interconnected I/O problems with a high degree of
> scattering. Compare, for example, the performance you can get with the MKL
> MPI wrapper. If you are looking for distributed linear algebra performance
> on CPUs, there are very few things that can compete with the MKL MPI wrapper.
>
> My personal opinion is that as long as the problem fits in memory (and
> most of them do nowadays), no algorithm on Spark is going to beat Matlab at
> matrix multiplication and the like on 1 Gbit networks, all other things
> being equal, no matter how many cores the Spark cluster gets. The same seems
> to hold tenfold when comparing with GPU-based algorithms (case in point:
> BIDMach).
>
> On Thu, May 5, 2016 at 12:45 PM, thibaut <thibaut.gensollen@gmail.com>
> wrote:
>
>>
>> My questions are:
>> Is it better, for what we want to do, to use Mahout or Spark?
>>
>
> At this point Mahout is better for declarative prototyping, as it contains
> a distributed optimizer and a compact expression DSL.
>
>> I saw that you already have a distributed PCA. Do you have a really
>> efficient matrix inversion algorithm in Mahout?
>>
> The PCA underpinnings are described in detail in the book "Apache Mahout:
> Beyond MapReduce".
>
>> How good is the linear algebra library compared to Matlab, for example?
>>
> See my opinion above about algorithms on Spark. Yes, I did some
> benchmarking and digging around. Some things could be on par, but
> interconnected problems are decidedly slower than on single-node Matlab.
>
>>
>> Finally, our main concern about using Spark is the linear algebra
>> library that is used with it, and we were wondering how good the
>> Mahout one is.
>
> What do you mean specifically? Speed? As I said, the in-core speed is what
> one can expect from a Java-based implementation, but the in-core speed
> factor seems to be far overshadowed by I/O and programming-model issues in
> highly interconnected problems once the problem reaches a certain size.
>
>>
>>
>> Thanks in advance,
>>
>> Best regards.
>> Thibaut
>
