mahout-user mailing list archives

From: Ted Dunning <ted.dunn...@gmail.com>
Subject: Re: Using SVD-conditioned matrix
Date: Mon, 17 Sep 2012 05:29:26 GMT
Lance,

There is still some confusion.

Strictly speaking, the elements of a sub-space are a sub-set of the
original space.  Projection into the reduced representation maps the
sub-space spanned by the columns of V_k one-to-one and onto R^k.  In fact,
restricted to that sub-space, this is basically just a rotation and/or
reflection.

When you actually come to compute with these entities, however, it is
important to be clear which representation you are using.

To be totally anal about this, and assuming that A is n x m and n > m > k,
the rows of A span a sub-space of R^m.  The columns of the V matrix in the
full SVD form a basis of this space, which is normally referred to as the
span of A.  The columns of V_k, the right singular vectors of the reduced
SVD, form a basis of a k-dimensional sub-space of span A, which is the most
important sub-space in a least squares sense.

Any m-dimensional point in this k-dimensional sub-space of R^m can be
projected onto R^k using V_k and projected back using V_k'.  Any point in
span A that is not in the k-dimensional sub-space will be sent by V_k V_k'
to the nearest point (in the L_2 sense) of that sub-space.
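
Here is a minimal numpy sketch of that projection (random data and an
arbitrary choice of k, purely for illustration):

import numpy as np

# A random n x m matrix with n > m, and a reduced rank k < m.
n, m, k = 8, 5, 2
rng = np.random.default_rng(0)
A = rng.standard_normal((n, m))

# numpy returns V', so Vt has shape m x m; take the top-k rows.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Vk = Vt[:k, :].T             # m x k: top-k right singular vectors

x = rng.standard_normal(m)   # an arbitrary point in R^m

proj = x @ Vk                # coordinates in R^k
back = proj @ Vk.T           # nearest point in the k-dim sub-space of R^m

# V_k V_k' is a projection, so applying it again changes nothing.
assert np.allclose(back @ Vk @ Vk.T, back)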

So in LSA terms, you can convert any vector (old or new) into the latent
factor space by right multiplying by V_k.  Dot products in that latent
space have the same value as dot products of the corresponding projected
vectors in the original space.  If you also use the diagonal weights S_k,
then you can approximate the product x A' A y in a least squares sense.
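
And a similar sketch for the dot products and the diagonal weights (again
with made-up data):

import numpy as np

n, m, k = 8, 5, 4
rng = np.random.default_rng(1)
A = rng.standard_normal((n, m))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
Vk = Vt[:k, :].T                   # m x k
Sk = np.diag(s[:k])                # k x k diagonal weights

x = rng.standard_normal(m)
y = rng.standard_normal(m)

# Dot products of latent representations equal dot products of the
# projected vectors in the original space.
assert np.allclose((x @ Vk) @ (y @ Vk),
                   (x @ Vk @ Vk.T) @ (y @ Vk @ Vk.T))

# With the diagonal weights, x V_k S_k^2 V_k' y approximates x A' A y;
# it is exact when k = m.
print(x @ Vk @ Sk**2 @ Vk.T @ y, x @ A.T @ A @ y)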

More in-line.

On Sun, Sep 16, 2012 at 6:43 PM, Lance Norskog <goksron@gmail.com> wrote:

> > The original question (possibly misphrased and not in so many words) asked
> > for a way to project any vector into the space spanned by A.
>
> Yes, it was incompletely worded. The interpretation above is backwards.
> There is a subspace sA. In this subspace there is a set of vectors.
> These vectors are rows in a matrix A. We condition matrix A to create
> matrix Ak.  All row vectors in A are now in a new subspace sAk.
>
> Now, there is a new row vector in subspace sA. What matrix projects
> this row vector into subspace sAk?
>

V_k.

You can spot this quickly by noting that the new row vector will be 1 x m
in size.  U_k S_k V_k' = A_k, which is n x m.  Thus U_k is n x k, S_k is
k x k and V_k is m x k.  Only V_k has the right shape to be multiplied by
the new row vector, and the result will be 1 x k as desired.
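
A quick numpy sanity check of those shapes (the values are random; only
the dimensions matter):

import numpy as np

n, m, k = 100, 20, 5
A = np.random.default_rng(2).standard_normal((n, m))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, Sk, Vk = U[:, :k], np.diag(s[:k]), Vt[:k, :].T

new_row = np.ones((1, m))      # a new 1 x m row vector
latent = new_row @ Vk          # only V_k has a compatible shape

print(Uk.shape, Sk.shape, Vk.shape, latent.shape)
# (100, 5) (5, 5) (20, 5) (1, 5)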

> One use case is new user recommendations. The new user gives a small
> set of items. We would like to find other users. For simplicity, take
> users as rows and items as columns in A. We want a conditioned Ak
> where we can take a new user with a few preferences, and do a cosine
> distance search among other users. With the ability to project a new
> user into a conditioned space, we can find other users who may not
> have any of the new user's items. Using a conditioned Ak increases
> recall at the expense of precision. The more conditioned, the more
> recall.
>

The use of the term "conditioned" is a bit misleading.  There is a concept
called the condition number of a matrix, which refers to the ratio of the
largest to the smallest singular value.  When you set some of the singular
values to 0, you have explicitly set the condition number to infinity.
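
For what it's worth, the condition number can be read directly off the
singular values (a small numpy illustration with random data):

import numpy as np

A = np.random.default_rng(3).standard_normal((10, 4))
s = np.linalg.svd(A, compute_uv=False)   # singular values, descending

print(s[0] / s[-1])        # condition number: largest over smallest
print(np.linalg.cond(A))   # numpy computes the same ratio

# Zeroing the trailing singular values, as truncation to rank k does,
# makes the ratio s[0] / s[-1] infinite.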

I don't even think that your claim that decreasing k increases recall is
correct.


> On Sun, Sep 16, 2012 at 4:11 PM, Ted Dunning <ted.dunning@gmail.com>
> wrote:
> > On Sun, Sep 16, 2012 at 1:49 PM, Sean Owen <srowen@gmail.com> wrote:
> >
> >> Oh right. It's the columns that are orthogonal. Cancel that.
> >>
> >> But how does this let you get a row of Uk out of a new row in Ak? Uk
> >> != Uk * Uk' * Ak
> >>
> >
> > Well, actually, a column of A is clearly in the space spanned by the
> > columns of A, so if you take any such column, U_k U_k' will send it back
> > to where it came from.  Thus,
> >
> >   A_k = U_k U_k' A_k
> >
> > If you want to talk about rows, then use V_k as with this:
> >
> >   A_k = A_k V_k V_k'
> >
> >
> >> Forgetting Sk, Uk = Ak * Vk, and I miss how these are equivalent so I
> >> misunderstand the intent of the expression.
> >>
> >
> > Forgetting S_k is a bit dangerous here.  It is true that
> >
> >     U_k S_k = A_k V_k
> >
> > But, of course, right multiplying by V_k' gives us the identity above.
> >
> > I think that the real confusion is that I am talking about projecting
> back
> > into span A and you are talking about expressing things in terms of the
> > latent variables.
> >
> >
> >> On Sun, Sep 16, 2012 at 8:55 PM, Ted Dunning <ted.dunning@gmail.com>
> >> wrote:
> >> > U_k ' U_k = I
> >> >
> >> > U_k U_k ' != I
> >>
>
>
>
> --
> Lance Norskog
> goksron@gmail.com
>
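
As a footnote, the identities in the quoted exchange are easy to check
numerically.  A minimal numpy sketch with random data:

import numpy as np

n, m, k = 12, 6, 3
A = np.random.default_rng(4).standard_normal((n, m))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, Sk, Vk = U[:, :k], np.diag(s[:k]), Vt[:k, :].T
Ak = Uk @ Sk @ Vk.T                      # the rank-k approximation

# U_k' U_k = I, but U_k U_k' != I (it is only a projection).
assert np.allclose(Uk.T @ Uk, np.eye(k))
assert not np.allclose(Uk @ Uk.T, np.eye(n))

# Columns of A_k lie in the span of U_k, rows in the span of V_k'.
assert np.allclose(Ak, Uk @ Uk.T @ Ak)   # A_k = U_k U_k' A_k
assert np.allclose(Ak, Ak @ Vk @ Vk.T)   # A_k = A_k V_k V_k'
assert np.allclose(Uk @ Sk, Ak @ Vk)     # U_k S_k = A_k V_k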
