Hi Janardhan,
How a GPU implementation will help scale distributed SVD:
Imran implemented an algorithm he found in the paper "A Distributed and
Incremental SVD Algorithm for Agglomerative Data Analysis on Large
Networks"
(https://github.com/apache/systemml/pull/273/files#diff488f06e290f7a54db2e125f7bc608971R27).
The idea there was to build up a distributed SVD from multiple local
(single-machine) invocations of svd. He tried to achieve multilevel
parallelism through the parfor construct.
Each local invocation of svd was done using the Apache Commons Math library.
If each of these local svd invocations could instead run on a GPU, the
overall wall-clock time of the distributed version would decrease.
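A rough sketch of the merge idea may help here. This is not Imran's DML code and not necessarily the exact algorithm from the paper; it is a minimal pure-Python illustration, assuming the matrix is split into row blocks A_1..A_k the way a distributed runtime would partition it. Each block's local SVD A_i = U_i S_i V_i^T is reduced to S_i V_i^T. These pieces are stacked, and one more small SVD of the stack gives the singular values of the whole matrix, because the stack has the same Gram matrix A^T A. The one-sided Jacobi routine `local_svd` (a hypothetical name) stands in for the Apache Commons Math (or GPU) call, and the loop over blocks is the part a parfor would run in parallel.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def local_svd(a: Matrix, tol: float = 1e-12, max_sweeps: int = 60) -> Tuple[List[float], Matrix]:
    """One-sided (Hestenes) Jacobi SVD of an m x n matrix (hypothetical stand-in
    for a library call). Returns (sigma, v) with A^T A = V diag(sigma^2) V^T."""
    m, n = len(a), len(a[0])
    u = [row[:] for row in a]  # working copy; converges to A V = U diag(sigma)
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(u[i][p] ** 2 for i in range(m))
                beta = sum(u[i][q] ** 2 for i in range(m))
                gamma = sum(u[i][p] * u[i][q] for i in range(m))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # columns p and q are already orthogonal
                rotated = True
                # Plane rotation that makes columns p and q orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for w in (u, v):  # apply the same rotation to U and V
                    for row in w:
                        wp, wq = row[p], row[q]
                        row[p], row[q] = c * wp - s * wq, s * wp + c * wq
        if not rotated:
            break
    sigma = [math.sqrt(sum(u[i][j] ** 2 for i in range(m))) for j in range(n)]
    return sigma, v


def merge_row_blocks(blocks: List[Matrix]) -> List[float]:
    """Singular values of A = [A_1; ...; A_k] from local SVDs of its row blocks.

    Each block contributes B_i = diag(sigma_i) V_i^T, so B_i^T B_i = A_i^T A_i.
    The stacked B therefore has B^T B = A^T A and the same singular values as A.
    """
    stacked: Matrix = []
    for block in blocks:  # independent local SVDs: the parfor-style step
        sigma, v = local_svd(block)
        n = len(v)
        for j in range(n):
            stacked.append([sigma[j] * v[r][j] for r in range(n)])
    sigma, _ = local_svd(stacked)  # one small SVD of the stacked pieces
    return sorted(sigma, reverse=True)


if __name__ == "__main__":
    a = [[1.0, 2.0, 0.0],
         [0.0, 1.0, 3.0],
         [4.0, 0.0, 1.0],
         [2.0, 2.0, 2.0],
         [1.0, 0.0, 5.0]]
    print("merged:", merge_row_blocks([a[:2], a[2:]]))
    print("direct:", sorted(local_svd(a)[0], reverse=True))
```

Splitting by columns works symmetrically: stacking U_i S_i side by side recovers the left singular vectors and singular values instead.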
Users may not always have a GPU; in that case, svd would fall back to the
Apache Commons Math implementation. But if they do, and if we have an "svd"
builtin function, it becomes easier to take advantage of the GPU.
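The fallback described above could look roughly like the dispatch below. This is a hypothetical sketch, not SystemML's actual code. `make_svd_builtin`, `gpu_backend` and `cpu_backend` are illustrative names, with the GPU backend standing in for a cuSolver-based call and the CPU backend for Apache Commons Math.

```python
from typing import Callable, List, Optional, Sequence

Matrix = List[List[float]]
SvdBackend = Callable[[Matrix], Sequence[float]]


def make_svd_builtin(cpu_backend: SvdBackend,
                     gpu_backend: Optional[SvdBackend] = None) -> SvdBackend:
    """Build one svd() entry point that prefers the GPU when one is present.

    gpu_backend is None when the user has no GPU; the builtin then falls
    back to the CPU library. Callers never see which backend ran.
    """
    def svd(a: Matrix) -> Sequence[float]:
        if gpu_backend is not None:
            return gpu_backend(a)  # hypothetical cuSolver-backed call
        return cpu_backend(a)      # hypothetical Commons Math-backed call
    return svd
```

The point of the design is that a DML script, including a parfor-based distributed SVD, calls the same svd() regardless of hardware; only the backend selection changes.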
The problem with a scalable svd in DML is due to Spark backend issues;
otherwise there is no problem scaling without a local svd():
There may be Spark backend issues, and more may come to light as more
workloads are executed on SystemML.
Any given operation can be implemented either as a DML-bodied function or
as a builtin function.
For DML-bodied functions:
Pros:
 The SystemML optimizer can be applied to it
 Distribution of SVD is then taken care of by SystemML. It will generate
and run the Spark primitives needed.
Cons:
 Implementing SVD, whether in DML or C, is a fair amount of work
 There would not be a single, straightforward call to a GPU svd library.
Instead, each of the linear algebra primitives would be accelerated on the
GPU individually, but not the entire operation itself. This would involve
many more JNI calls.
For builtin functions:
Pros:
 We can use GPU libraries (cuSolver) and CPU libraries (Apache Commons
Math), which, at least on the GPU side, are already optimized.
 If a better SVD implementation becomes available in a library, it can
easily be plugged in.
Cons:
 We would have to come up with an algorithm that implements distributed
SVD with Spark primitives.
Pick your battles.
Maybe we could try another algorithm for a scalable svd():
Sure. But before you do that, it may be worth our while to understand what
exactly is misleading about the paper Imran mentioned.
Nakul
On Thu, Jul 20, 2017 at 4:02 PM, Janardhan Pulivarthi <
janardhan.pulivarthi@gmail.com> wrote:
> Hi Nakul,
>
> Can you help me understand how GPU implementations help scale it? Will
> the user always have GPUs in use when using this function, or is it an
> optional feature?
> The problem with implementing the scalable svd() in DML is due to the Spark
> backend issues; otherwise there is no problem scaling without a local svd()
> function.
>
> Maybe we could try another algorithm for the scalable svd(), if the
> present algorithm is misleading, as Imran Younus pointed out.
>
> Thank you,
> Janardhan
>
