Dear Mahout developers,
I would like to use the
org.apache.mahout.math.decomposer.hebbian.HebbianSolver in order to get a
singular value decomposition (mahout 0.4).
In the paper "Generalized Hebbian Algorithm for Latent Semantic Analysis" (at
http://www.dcs.shef.ac.uk/~genevieve/gorrell_webb.pdf) the algorithm is
described. In the documentation of the HebbianSolver it is described as an
iterative, sparse, singular value decomposition solver.
As far as I understood the HebbianSolver is designed for eigen-decomposition,
but like any such algorithm, getting singular vectors out of it is immediate
(since SVD and eigenvalue decomposition are related to each other).
Applied process:
- solverHebbian.solve(matrix, desired_rank) -> result:
state.getCurrentEigens() (rows are eigenvectors!) and
state.getCurrentEigenValues()
- get singular values: singularValue = Math.sqrt(eigenValue) -> generate sigma
and sigmaInverse
- U = state.getCurrentEigens().transpose()
- V = (a.transpose().times(U)).times(sigmaInverse)
- approximated_a = (U.times(sigma)).times(V.transpose())
- calculate frobenius norm of a and approximated_a (in order to see how good
the approximation is)
I tested the HebbianSolver with several examples.
And my question is: which matrix is supposed to be the input matrix for the
HebbianSolver: A or A*A^t? Since the result are eigenvectors and eigenvalues I
would expect A*A^t.
I compared the results for the examples (derived with the HebbianSolver) with
the results derived with R (package corpcor).
1) If A has to be the input matrix ...
... example 1: the process above gets a result which is very similar to the
result derived by R (U, Sigma and V).
... example 2: the process above gets a result which is at least similar to
the result derived by R (U, Sigma and V).
... example 3: A would lead to an infinite loop, thus take A^t. But then:
CardinalityException while calculating V.
2) If A*A^t has to be the input matrix ...
... example 1: U derived with the process above is the the same U as derived
with R. The singular values derived with the process above are the squares
from the singular values derived by R. V isn't equal.
... example 2: similar to results for example 1
... example 3: similar to results for example 1
Annotation:
* The method getRandomStartingIndex(Matrix corpus, Matrix eigens) produces an
infinite loop when every row of the considered matrix has less than 5 values
(see at 1) example 3).
* I could post my Java Code as well if that would help.
Thanks in advance.
Stefanie
Here is the R code with the mentioned examples:
---------------------------------------------------------------------------------------------------
library(corpcor)
###########
# example 1
###########
a<-matrix(c(4.42282138, 1.51744077, 0.07690571, 0.93650042, 2.19609401,
1.51744077, 1.73849477, -0.11856149, 0.76555191, 1.3673608,
0.07690571, -0.11856149, 0.55065932, 1.72163263, -0.2283693,
0.93650042, 0.76555191, 1.72163263, 0.09470345, -1.16626194,
2.19609401, 1.3673608, -0.2283693, -1.16626194, -0.37321311 ), 5, 5, byrow =
TRUE)
s = fast.svd(a)
D <- diag(s$d)
app_a <- s$u %*% D %*% t(s$v)
app_d <- t(s$u) %*% a %*% s$v
s
###########
# example 2
###########
a<-matrix(c(22, 52, 1, 93, 70, 33,
5, 83, 38, 85, 91, 63,
68, 3, 7, 53, 76, 76,
68, 5, 42, 9, 26, 99,
93, 53, 69, 65, 5, 37,
38, 67, 59, 42, 74, 25), 6, 6, byrow = TRUE)
s = fast.svd(a)
D <- diag(s$d)
app_a <- s$u %*% D %*% t(s$v)
app_d <- t(s$u) %*% a %*% s$v
s
###########
# example 3
###########
a<-matrix(c(4,4,5,
4,5,5,
3,3,2,
4,5,4,
4,4,4,
3,5,4,
4,4,3,
2,4,4,
5,5,5), 9, 3, byrow = TRUE)
s = fast.svd(a)
D <- diag(s$d)
app_a <- s$u %*% D %*% t(s$v)
app_d <- t(s$u) %*% a %*% s$v
s
---------------------------------------------------------------------------------------------------
--
|