Janardhan,
The papers you're referring may not be relevant. The first paper, as far as
I can tell, is about updating an existing svd decomposition as new data
comes in. The 3rd paper in this list is the one I used, but that method is
not good.
There is also a method that uses QR decomposition and then calculates SVD
from R matrix. Please have a look at equation 1.3 in this paper:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.127.115&rank=1
I think this is worth trying out. The distributed QR is already implemented
in SystemlML, so it may quick to try out.
imran
On Fri, Jul 28, 2017 at 10:10 AM, Janardhan Pulivarthi <
janardhan.pulivarthi@gmail.com> wrote:
> Hi Nakul & all the committers,
>
> Till now I am half way through the literature. But, for now a couple of
> things to mention, in SVD there are three stages
> 1. Bidiagonal reduction step
> 2. Computation of the singular values
> 3. Computation of the singular vectors
>
> of these three, The* Bidiagonal reduction* step is very expensive, so is
> our focus on this( when considering GPU, at times where handling with CPU
> is infeasible).
>
> About literature:
>
> - I took some time to go through " A Stable and Fast Algorithm for
> Updating the Singular Value Decomposition" by "Gu & Stanley", to
> understand
> the numerical stability and round-off errors when we are partitioning
> the
> matrix in this distributed algorithm. The author has assured that each
> component computed will be of high absolute accuracy. And also, the
> properties that the resultant matrix support do not have any conflicts
> with
> parent matrix. [pdf
> ]
>
>
> - "High performance bidiagonal reduction using the tile algorithms on
> homogeneous multicore clusters ", by "Ltaief et. al", this paper has
> focused on the first stage mainly and has discussed a good about tile
> algorithms and their runtime implementations.(although off-topic, I read
> this just to understand.) [pdf
> ]
>
>
> - "A distributed and incremental svd algorithm for agglomerative data
> analysis on large networks", by "Iwen & Ong", *Please go through* the
> (a). TABLE 1, TABLE 2 . (b). APPENDIX A. RAW DATA FROM NUMERICAL
> EXPERIMENTS. [pdf ]
>
> Thanks,
>
> Janardhan
>
> On Wed, Jul 26, 2017 at 12:29 AM, Nakul Jindal wrote:
>
> > Hi Janardhan,
> >
> > The images you've used as attachments haven't reached my inbox.
> > Could you please send them to me directly, rather than through the dev
> > mailing list.
> > (Or upload it to a image hosting site like imgur and paste the links in
> the
> > email)
> >
> > I would like to point out that my knowledge of machine learning is
> limited.
> > Still, how would you want to test the algorithm?
> >
> >
> > Sparse matrices in SystemML (in Spark Execution Mode)
> > Sparse matrix support in SystemML is implemented at a block level. Each
> > (potentially giant) matrix is stored as blocks (in Spark execution mode).
> > The matrix itself doesn't store knowledge of whether it is sparse or not.
> > Each block does. Each block of this giant matrix can be stored in dense
> or
> > spare format, depending on how many non-zeroes are in that block.
> > The sparsity of a block is recomputed every so often, and the block
> > representation can be converted from dense to sparse and vice-versa.
> > When operations are performed on blocks, internally, a sparse aware
> > operation maybe performed, depending on how the blocks are stored.
> >
> > (One of the other contributors can clarify, if I've missed something or
> > have said something wrong).
> >
> > Given this context, can you please think about whether missing sparse
> > matrix support is still a problem?
> >
> >
> > -Nakul
> >
> >
> >
> >
> >
> >
> >
> > On Tue, Jul 25, 2017 at 11:14 AM, Janardhan Pulivarthi <
> > janardhan.pulivarthi@gmail.com> wrote:
> >
> > > Hi Nakul,
> > >
> > > Thanks for explaining me about pros and cons of the two approaches. For
> > > now, I have gone through the paper carefully over a couple of days and
> > > found the following interesting things.
> > >
> > > 1. This is the algorithm we implemented.
> > >
> > >
> > > 2. In the above algorithm the input matrix A is approximated to another
> > > matrix B with the following relation with the error of chi(p, i) [ as
> > shown
> > > in (3) ] which the author argues will be in an acceptable limit. So, we
> > can
> > > go with this algorithm.
> > >
> > >
> > > But, one bad thing is that author is not sure about whether the
> algorithm
> > > supports the sparse matrices or not. So, we may need to test it here.
> > >
> > > For the time being we need to test the present algorithm implemented by
> > > Imran Younus again. Can you help me with the testing?
> > >
> > > Thanks,
> > > Janardhan
> > >
> > >
> > > On Fri, Jul 21, 2017 at 6:07 AM, Nakul Jindal
> wrote:
> > >
> > >> Hi Janardhan,
> > >>
> > >>
> > >>
> > >>
> > >> How will GPU implementation help scale distributed SVD:
> > >>
> > >> Imran implemented an algorithm he found out about in the paper "A
> > >> Distributed and Incremental SVD Algorithm for Agglomerative Data
> > Analysis
> > >> on Large Networks" (
> > >> https://github.com/apache/systemml/pull/273/files#diff-488f0
> > >> 6e290f7a54db2e125f7bc608971R27
> > >> ).
> > >> The idea there was to build up a distributed SVD using invocations of
> > svd
> > >> on your local machine. He tried to achieve the multi-level parallelism
> > >> through the parfor construct.
> > >> Each local invocation of svd was done using the Apache Commons Math
> > >> library.
> > >> If each invocation of this local svd can instead be done on a GPU, the
> > >> overall wall time for the distributed version would be decreased.
> > >>
> > >> Users may not always have a GPU. In that case, the svd falls back to
> the
> > >> Apache Comons Math implementation. But if they do and if we have a
> "svd"
> > >> builtin function, then it would be easier to take advantage of the
> GPU.
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> Problem with scalable svd in dml is due to spark backed issues,
> > otherwise
> > >> there is not problem scaling w/o a local svd():
> > >>
> > >> There maybe spark backend issues and more may come to light and more
> > >> workloads are executed on SystemML.
> > >> For any given operation - we can implement it as a DML bodied function
> > or
> > >> a
> > >> builtin function.
> > >> For DML Bodied functions:
> > >> Pros:
> > >> - The SystemML optimizer can be applied to it
> > >> - Distribution of SVD is then taken care of by SystemML. It will
> > generate
> > >> and run the spark primitives needed.
> > >>
> > >> Cons:
> > >> - Implementing SVD, whether in DML or C, is a fair amount of work
> > >> - There would not be a straightforward call to the svd gpu library. In
> > >> fact, each of the linear algebra primitives would be accelerated on
> the
> > >> GPU, but not the entire operation itself. This would involve many more
> > JNI
> > >> calls.
> > >>
> > >> For builtin functions:
> > >> Pros:
> > >> - Use of GPU libraries (cuSolver) and CPU libraries (Apache Commons
> > Math)
> > >> can be made, these are already optimized (in case of the GPU)
> > >> - If a better SVD implementation is available via a library, that can
> > >> easily be plugged in.
> > >>
> > >> Cons:
> > >> - Would have to come up with an algorithm to implement distributed SVD
> > >> with
> > >> spark primitives
> > >>
> > >> Pick your battle.
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> Maybe we could try another algorithm for scalable svd() :
> > >>
> > >> Sure. But before you do that, it may be worth our while to understand
> > what
> > >> is exactly misleading about the paper that Imran talks about.
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> -Nakul
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Thu, Jul 20, 2017 at 4:02 PM, Janardhan Pulivarthi <
> > >> janardhan.pulivarthi@gmail.com> wrote:
> > >>
> > >> > Hi Nakul,
> > >> >
> > >> > Can you help me understand how gpu implementations help scale it.
> > >> Whether
> > >> > the user always have GPUs in use when using this fn or is it an
> > optional
> > >> > feature.
> > >>
> > >>
> > >> > The problem with implementing the scalable svd() in dml is due to
> the
> > >> spark
> > >> > backend issues, otherwise there is no problem scaling w/o a local
> > svd()
> > >> > function.
> > >> >
> > >> > May be we could try another algorithm for the scalable svd( ), if
> the
> > >> > present algorithm is misleading as Imran Younus pointed out.
> > >> >
> > >> > Thank you,
> > >> > Janardhan
> > >> >
> > >>
> > >
> > >
> >
>