systemml-dev mailing list archives

From Matthew Plourde <mplou...@cadent.tv>
Subject RE: svd( ) implementation
Date Mon, 31 Jul 2017 13:13:04 GMT
unsubscribe

-----Original Message-----
From: Imran Younus [mailto:imranyounus@gmail.com] 
Sent: Friday, July 28, 2017 2:22 PM
To: dev@systemml.apache.org
Subject: Re: svd( ) implementation

Just to clarify one thing. For the QR-based method, you can assume that the R matrix is small
enough to fit in driver memory and then perform the SVD on the driver. That means your actual
matrix has to be tall and skinny.
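
For reference, here is a minimal NumPy sketch of this QR-then-SVD idea for a tall-skinny
matrix (illustrative only, not the SystemML implementation):

    import numpy as np

    def tall_skinny_svd(A):
        # A is n x m with n >> m, so R is only m x m and fits in driver memory
        Q, R = np.linalg.qr(A)                              # reduced QR: Q is n x m, R is m x m
        U_r, s, Vt = np.linalg.svd(R, full_matrices=False)  # small SVD of R, done "on the driver"
        U = Q @ U_r                                         # left singular vectors of A
        return U, s, Vt

    A = np.random.rand(10000, 50)            # tall and skinny
    U, s, Vt = tall_skinny_svd(A)
    print(np.allclose(A, (U * s) @ Vt))      # reconstruction check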

imran

On Fri, Jul 28, 2017 at 11:15 AM, Imran Younus <imranyounus@gmail.com>
wrote:

> Janardhan,
>
> The papers you're referring to may not be relevant. The first paper, as 
> far as I can tell, is about updating an existing SVD decomposition as 
> new data comes in. The 3rd paper in this list is the one I used, but 
> that method is not good.
>
> There is also a method that uses a QR decomposition and then calculates 
> the SVD from the R matrix. Please have a look at equation 1.3 in this paper:
>
> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.127.115&rank=1
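>
> The idea, in equation form (assuming a tall-skinny A, so that R is small):
>
>   A = Q R,   R = U_R \Sigma V^T   =>   A = (Q U_R) \Sigma V^T
>
> so only the small R factor ever needs a local SVD, and Q U_R gives the left
> singular vectors of A.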
>
> I think this is worth trying out. The distributed QR is already 
> implemented in SystemML, so it may be quick to try out.
>
> imran
>
>
>
> On Fri, Jul 28, 2017 at 10:10 AM, Janardhan Pulivarthi < 
> janardhan.pulivarthi@gmail.com> wrote:
>
>> Hi Nakul & all the committers,
>>
>> So far I am halfway through the literature. For now, a couple of things 
>> to mention: SVD has three stages
>>   1. Bidiagonal reduction
>>   2. Computation of the singular values
>>   3. Computation of the singular vectors
>>
>> Of these three, the *bidiagonal reduction* step is the most expensive, so 
>> that is where our focus is (when considering the GPU, for cases where 
>> handling it on the CPU is infeasible). The standard pipeline is sketched 
>> just below.
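>>
>> In equation form (the standard dense SVD pipeline, as a sketch): first 
>> reduce A = Q_1 B P_1^T with B bidiagonal; then compute the SVD of the 
>> bidiagonal matrix, B = U_B \Sigma V_B^T, which gives the singular values; 
>> finally form the singular vectors as U = Q_1 U_B and V = P_1 V_B.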
>>
>> About literature:
>>
>>    - I took some time to go through "A Stable and Fast Algorithm for
>>    Updating the Singular Value Decomposition" by Gu & Eisenstat, to 
>>    understand the numerical stability and round-off errors when we 
>>    partition the matrix in this distributed algorithm. The authors show 
>>    that each computed component has high absolute accuracy, and that the 
>>    properties of the resulting matrices do not conflict with those of the 
>>    parent matrix. [pdf
>>    <http://www.cs.yale.edu/publications/techreports/tr966.pdf>]
>>
>>
>>    - "High performance bidiagonal reduction using the tile algorithms on
>>    homogeneous multicore clusters ", by "Ltaief et. al", this paper has
>>    focused on the first stage mainly and has discussed a good about tile
>>    algorithms and their runtime implementations.(although off-topic, 
>> I read
>>    this just to understand.) [pdf
>>    <http://www.netlib.org/lapack/lawnspdf/lawn247.pdf>]
>>
>>
>>    -  "A distributed and incremental svd algorithm for agglomerative data
>>    analysis on large networks", by "Iwen & Ong", *Please go through* the
>>    (a). TABLE 1, TABLE 2 . (b). APPENDIX A. RAW DATA FROM NUMERICAL
>>    EXPERIMENTS. [pdf <https://arxiv.org/pdf/1601.07010.pdf>]
>>
>> Thanks,
>>
>> Janardhan
>>
>> On Wed, Jul 26, 2017 at 12:29 AM, Nakul Jindal <nakul02@gmail.com> wrote:
>>
>> > Hi Janardhan,
>> >
>> > The images you've used as attachments haven't reached my inbox.
>> > Could you please send them to me directly, rather than through the 
>> > dev mailing list?
>> > (Or upload them to an image hosting site like imgur and paste the 
>> > links in the email.)
>> >
>> > I would like to point out that my knowledge of machine learning is
>> > limited. Still, how would you want to test the algorithm?
>> >
>> >
>> > Sparse matrices in SystemML (in Spark execution mode):
>> > Sparse matrix support in SystemML is implemented at the block level. 
>> > Each (potentially giant) matrix is stored as blocks (in Spark execution 
>> > mode). The matrix itself doesn't store knowledge of whether it is sparse 
>> > or not; each block does. Each block of this giant matrix can be stored 
>> > in dense or sparse format, depending on how many non-zeros are in that 
>> > block. The sparsity of a block is recomputed every so often, and the 
>> > block representation can be converted from dense to sparse and 
>> > vice-versa. When operations are performed on blocks, internally, a 
>> > sparsity-aware operation may be performed, depending on how the blocks 
>> > are stored. A rough sketch of this per-block decision follows below.
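>> >
>> > As an illustration of that per-block decision (not SystemML's actual 
>> > code; the block size and threshold here are made up):
>> >
>> >     import numpy as np
>> >     from scipy.sparse import csr_matrix
>> >
>> >     BLOCK = 1000               # hypothetical block size
>> >     SPARSITY_THRESHOLD = 0.4   # hypothetical dense/sparse switch point
>> >
>> >     def store_block(block):
>> >         """Pick a dense or sparse representation for one block."""
>> >         sparsity = np.count_nonzero(block) / block.size
>> >         if sparsity < SPARSITY_THRESHOLD:
>> >             return csr_matrix(block)   # sparse storage for mostly-zero blocks
>> >         return np.asarray(block)       # dense storage otherwise
>> >
>> >     def blockify(A):
>> >         """Split a big matrix into blocks, each stored dense or sparse."""
>> >         return {(i, j): store_block(A[i:i+BLOCK, j:j+BLOCK])
>> >                 for i in range(0, A.shape[0], BLOCK)
>> >                 for j in range(0, A.shape[1], BLOCK)}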
>> >
>> > (One of the other contributors can clarify, if I've missed 
>> > something or have said something wrong).
>> >
>> > Given this context, can you please think about whether missing 
>> > sparse matrix support is still a problem?
>> >
>> >
>> > -Nakul
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Tue, Jul 25, 2017 at 11:14 AM, Janardhan Pulivarthi < 
>> > janardhan.pulivarthi@gmail.com> wrote:
>> >
>> > > Hi Nakul,
>> > >
>> > > Thanks for explaining the pros and cons of the two approaches to me. 
>> > > For now, I have gone through the paper carefully over a couple of 
>> > > days and found the following interesting things.
>> > >
>> > > 1. This is the algorithm we implemented.
>> > >
>> > > [inline image attachment not included in the archive]
>> > > 2. In the above algorithm the input matrix A is approximated by another
>> > > matrix B, with the following relation, with an error of chi(p, i) [as
>> > > shown in (3)], which the author argues will be within an acceptable
>> > > limit. So we can go with this algorithm.
>> > >
>> > >
>> > > But one bad thing is that the author is not sure whether the algorithm
>> > > supports sparse matrices or not. So we may need to test it here.
>> > >
>> > > For the time being we need to test the present algorithm implemented 
>> > > by Imran Younus again. Can you help me with the testing?
>> > >
>> > > Thanks,
>> > > Janardhan
>> > >
>> > > On Fri, Jul 21, 2017 at 6:07 AM, Nakul Jindal <nakul02@gmail.com>
>> wrote:
>> > >
>> > >> Hi Janardhan,
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> How will GPU implementation help scale distributed SVD:
>> > >>
>> > >> Imran implemented an algorithm he found in the paper "A 
>> > >> Distributed and Incremental SVD Algorithm for Agglomerative Data 
>> > >> Analysis on Large Networks" (
>> > >> https://github.com/apache/systemml/pull/273/files#diff-488f06e290f7a54db2e125f7bc608971R27
>> > >> ).
>> > >> The idea there was to build up a distributed SVD using invocations 
>> > >> of svd on your local machine. He tried to achieve the multi-level 
>> > >> parallelism through the parfor construct.
>> > >> Each local invocation of svd was done using the Apache Commons 
>> > >> Math library.
>> > >> If each invocation of this local svd can instead be done on a GPU, 
>> > >> the overall wall time for the distributed version would be decreased.
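>> > >>
>> > >> As a rough sketch of the kind of block-wise merge used in that style 
>> > >> of algorithm (my reading of the general idea, not the code in the PR): 
>> > >> each row block gets a local SVD, the products S_i * V_i^T are stacked, 
>> > >> and one more small SVD of the stack recovers the singular values and 
>> > >> right singular vectors of the whole matrix. In NumPy:
>> > >>
>> > >>     import numpy as np
>> > >>
>> > >>     def merged_svd(row_blocks):
>> > >>         # local SVDs (these are the calls that could run on a GPU
>> > >>         # or inside a parfor)
>> > >>         partial = []
>> > >>         for A_i in row_blocks:
>> > >>             U_i, s_i, Vt_i = np.linalg.svd(A_i, full_matrices=False)
>> > >>             partial.append(s_i[:, None] * Vt_i)   # S_i @ V_i^T
>> > >>         # one small merge step: SVD of the stacked partial results
>> > >>         B = np.vstack(partial)
>> > >>         _, s, Vt = np.linalg.svd(B, full_matrices=False)
>> > >>         return s, Vt   # singular values / right vectors of the full matrix
>> > >>
>> > >>     A = np.random.rand(4000, 60)
>> > >>     s_ref = np.linalg.svd(A, compute_uv=False)
>> > >>     s, Vt = merged_svd(np.array_split(A, 8, axis=0))
>> > >>     print(np.allclose(s, s_ref))   # same singular values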
>> > >>
>> > >> Users may not always have a GPU. In that case, the svd falls back to 
>> > >> the Apache Commons Math implementation. But if they do, and if we have 
>> > >> an "svd" builtin function, then it would be easier to take advantage 
>> > >> of the GPU.
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> Problem with scalable svd in DML is due to Spark backend issues, 
>> > >> otherwise there is no problem scaling w/o a local svd():
>> > >>
>> > >> There may be Spark backend issues, and more may come to light as 
>> > >> more workloads are executed on SystemML.
>> > >> For any given operation, we can implement it as a DML-bodied function 
>> > >> or a builtin function.
>> > >> For DML-bodied functions:
>> > >> Pros:
>> > >> - The SystemML optimizer can be applied to it
>> > >> - Distribution of SVD is then taken care of by SystemML. It will 
>> > >> generate and run the Spark primitives needed.
>> > >>
>> > >> Cons:
>> > >> - Implementing SVD, whether in DML or C, is a fair amount of work
>> > >> - There would not be a straightforward call to the svd GPU library. 
>> > >> In fact, each of the linear algebra primitives would be accelerated 
>> > >> on the GPU, but not the entire operation itself. This would involve 
>> > >> many more JNI calls.
>> > >>
>> > >> For builtin functions:
>> > >> Pros:
>> > >> - Use of GPU libraries (cuSOLVER) and CPU libraries (Apache Commons 
>> > >> Math) can be made; these are already optimized (in the case of the GPU)
>> > >> - If a better SVD implementation is available via a library, it can 
>> > >> easily be plugged in.
>> > >>
>> > >> Cons:
>> > >> - Would have to come up with an algorithm to implement distributed 
>> > >> SVD with Spark primitives
>> > >>
>> > >> Pick your battle.
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> Maybe we could try another algorithm for scalable svd():
>> > >>
>> > >> Sure. But before you do that, it may be worth our while to understand 
>> > >> what exactly is misleading about the paper that Imran talks about.
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> -Nakul
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> On Thu, Jul 20, 2017 at 4:02 PM, Janardhan Pulivarthi < 
>> > >> janardhan.pulivarthi@gmail.com> wrote:
>> > >>
>> > >> > Hi Nakul,
>> > >> >
>> > >> > Can you help me understand how GPU implementations help scale it. 
>> > >> > Does the user always have GPUs in use when using this fn, or is it 
>> > >> > an optional feature?
>> > >>
>> > >>
>> > >> > The problem with implementing the scalable svd() in dml is due to 
>> > >> > the Spark backend issues, otherwise there is no problem scaling 
>> > >> > w/o a local svd() function.
>> > >> >
>> > >> > Maybe we could try another algorithm for the scalable svd(), if the 
>> > >> > present algorithm is misleading, as Imran Younus pointed out.
>> > >> >
>> > >> > Thank you,
>> > >> > Janardhan
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>
>
>