mahout-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jake Mannix <jake.man...@gmail.com>
Subject Re: Understanding the SVD recommender
Date Thu, 10 Jun 2010 07:11:25 GMT
On Wed, Jun 9, 2010 at 12:08 PM, Richard Simon Just <
info@richardsimonjust.co.uk> wrote:
>
> Agreed. Just wanted to answer the question that had been left hanging for
> why Sarwar add the row average back. In fact to be complete, before the SVD
> they fill each null value with 'column average - row average'. But yeah,
> that would make for a much bigger computation.
>

I should note here, that if you really want to see the difference between
the fully centered SVD and the one where you only shift the nonzero entries,
the former is not *necessarily* a dense and therefore-out-of-reach
computation:

The meat of the Lanczos algorithm is repeatedly computing A . v, where A is
your big big matrix, and v is a (dense) vector which is getting closer and
closer to the biggest singular.  If you wanted to compute the singular
vectors of (A - B), where B_ij = r_i - c_j = (sum_{k=1..N} A_ik) -
(sum_{k=1..M} A_kj) is the matrix of row averages minus column averages,
then you can just do it in a couple of (still sparse) steps:

  [(A - B) . v]_i = [A . v]_i - [B . v]_i
                      = [A . v]_i - (sum_{k=1..N,j=1..M}A_ij v_k) +
(sum_{k=1...M,j=1...M}A_kj v_j)
                      = [A . v]_i - r_i * (sum_{k} v_k) + (sum_{k} (A .
v)_k)

The second term is just the row-average of row-i times v.zSum(), and the
last term is a constant for all values in the output vector, and is just (A
. v).zSum().
So no additional map-reduces are needed to incorporate this - it's just that
the Lanczos solver will need to be told about the row and column averages of
the input and optionally at each Lanczos iteration, modify the resultant
vector as above.

See if it makes a difference!

  -jake

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message