systemml-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Janardhan Pulivarthi <janardhan.pulivar...@gmail.com>
Subject Re: svd( ) implementation
Date Tue, 25 Jul 2017 18:14:40 GMT
Hi Nakul,

Thanks for explaining me about pros and cons of the two approaches. For
now, I have gone through the paper carefully over a couple of days and
found the following interesting things.

1. This is the algorithm we implemented.

​
2. In the above algorithm the input matrix A is approximated to another
matrix B with the following relation with the error of chi(p, i) [ as shown
in (3) ] which the author argues will be in an acceptable limit. So, we can
go with this algorithm.


But, one bad thing is that author is not sure about whether the algorithm
supports the sparse matrices or not. So, we may need to test it here.

For the time being we need to test the present algorithm implemented by
Imran Younus again. Can you help me with the testing?

Thanks,
Janardhan
​

On Fri, Jul 21, 2017 at 6:07 AM, Nakul Jindal <nakul02@gmail.com> wrote:

> Hi Janardhan,
>
>
>
>
> How will GPU implementation help scale distributed SVD:
>
> Imran implemented an algorithm he found out about in the paper "A
> Distributed and Incremental SVD Algorithm for Agglomerative Data Analysis
> on Large Networks" (
> https://github.com/apache/systemml/pull/273/files#diff-
> 488f06e290f7a54db2e125f7bc608971R27
> ).
> The idea there was to build up a distributed SVD using invocations of svd
> on your local machine. He tried to achieve the multi-level parallelism
> through the parfor construct.
> Each local invocation of svd was done using the Apache Commons Math
> library.
> If each invocation of this local svd can instead be done on a GPU, the
> overall wall time for the distributed version would be decreased.
>
> Users may not always have a GPU. In that case, the svd falls back to the
> Apache Comons Math implementation. But if they do and if we have a "svd"
> builtin function, then it would be easier to take advantage of the GPU.
>
>
>
>
>
>
> Problem with scalable svd in dml is due to spark backed issues, otherwise
> there is not problem scaling w/o a local svd():
>
> There maybe spark backend issues and more may come to light and more
> workloads are executed on SystemML.
> For any given operation - we can implement it as a DML bodied function or a
> builtin function.
> For DML Bodied functions:
> Pros:
> - The SystemML optimizer can be applied to it
> - Distribution of SVD is then taken care of by SystemML. It will generate
> and run the spark primitives needed.
>
> Cons:
> - Implementing SVD, whether in DML or C, is a fair amount of work
> - There would not be a straightforward call to the svd gpu library. In
> fact, each of the linear algebra primitives would be accelerated on the
> GPU, but not the entire operation itself. This would involve many more JNI
> calls.
>
> For builtin functions:
> Pros:
> - Use of GPU libraries (cuSolver) and CPU libraries (Apache Commons Math)
> can be made, these are already optimized (in case of the GPU)
> - If a better SVD implementation is available via a library, that can
> easily be plugged in.
>
> Cons:
> - Would have to come up with an algorithm to implement distributed SVD with
> spark primitives
>
> Pick your battle.
>
>
>
>
>
>
> Maybe we could try another algorithm for scalable svd() :
>
> Sure. But before you do that, it may be worth our while to understand what
> is exactly misleading about the paper that Imran talks about.
>
>
>
>
>
> -Nakul
>
>
>
>
>
>
>
> On Thu, Jul 20, 2017 at 4:02 PM, Janardhan Pulivarthi <
> janardhan.pulivarthi@gmail.com> wrote:
>
> > Hi Nakul,
> >
> > Can you help me understand how gpu implementations help scale it. Whether
> > the user always have GPUs in use when using this fn or is it an optional
> > feature.
>
>
> > The problem with implementing the scalable svd() in dml is due to the
> spark
> > backend issues, otherwise there is no problem scaling w/o a local svd()
> > function.
> >
> > May be we could try another algorithm for the scalable svd( ), if the
> > present algorithm is misleading as Imran Younus pointed out.
> >
> > Thank you,
> > Janardhan
> >
>

Mime
  • Unnamed multipart/related (inline, None, 0 bytes)
View raw message