Dear Mahout developers,
I would like to use the
org.apache.mahout.math.decomposer.hebbian.HebbianSolver in order to get a
singular value decomposition (mahout 0.4).
In the paper "Generalized Hebbian Algorithm for Latent Semantic Analysis" (at
http://www.dcs.shef.ac.uk/~genevieve/gorrell_webb.pdf) the algorithm is
described. In the documentation of the HebbianSolver it is described as an
iterative, sparse, singular value decomposition solver.
As far as I understand, the HebbianSolver is designed for eigendecomposition,
but as with any such algorithm, getting singular vectors out of it is
straightforward (since SVD and eigenvalue decomposition are closely related).
Applied process:
- solverHebbian.solve(matrix, desired_rank) -> result:
  state.getCurrentEigens() (rows are eigenvectors!) and
  state.getCurrentEigenValues()
- get singular values: singularValue = Math.sqrt(eigenValue) -> generate sigma
  and sigmaInverse
- U = state.getCurrentEigens().transpose()
- V = (a.transpose().times(U)).times(sigmaInverse)
- approximated_a = (U.times(sigma)).times(V.transpose())
- calculate the Frobenius norm of a and approximated_a (in order to see how good
  the approximation is)
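To make the steps above concrete, here is a minimal plain-Java sketch of the same process without Mahout. Everything in it is an assumption for illustration only: the class SvdFromEigenSketch and its helper methods are hypothetical, the eigenpairs come from a simple power iteration with deflation (a stand-in, not the Hebbian algorithm), the stand-in is fed A*A^t, and the 3x2 matrix is a made-up toy input rather than one of my three examples. It reports the Frobenius norm of the difference a - approximated_a.

```java
class SvdFromEigenSketch {

    // Dense matrix product x * y.
    static double[][] times(double[][] x, double[][] y) {
        double[][] r = new double[x.length][y[0].length];
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < y[0].length; j++)
                for (int k = 0; k < y.length; k++)
                    r[i][j] += x[i][k] * y[k][j];
        return r;
    }

    static double[][] transpose(double[][] x) {
        double[][] r = new double[x[0].length][x.length];
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < x[0].length; j++)
                r[j][i] = x[i][j];
        return r;
    }

    // Diagonal matrix holding d (or 1/d when invert is true).
    static double[][] diag(double[] d, boolean invert) {
        double[][] r = new double[d.length][d.length];
        for (int i = 0; i < d.length; i++) r[i][i] = invert ? 1.0 / d[i] : d[i];
        return r;
    }

    static double frobenius(double[][] x) {
        double s = 0.0;
        for (double[] row : x) for (double v : row) s += v * v;
        return Math.sqrt(s);
    }

    // Stand-in for the solver (NOT the Hebbian algorithm): power iteration with
    // deflation on a symmetric positive semidefinite matrix m. Eigenvectors are
    // returned as rows; eigenvalues are written into eigenValues.
    static double[][] topEigens(double[][] m, int rank, double[] eigenValues) {
        int n = m.length;
        double[][] work = new double[n][];
        for (int i = 0; i < n; i++) work[i] = m[i].clone();
        double[][] eigens = new double[rank][];
        for (int r = 0; r < rank; r++) {
            double[] v = new double[n];
            for (int i = 0; i < n; i++) v[i] = 1.0 / (i + 1); // arbitrary fixed start vector
            double lambda = 0.0;
            for (int iter = 0; iter < 1000; iter++) {
                double[] w = new double[n];
                double norm = 0.0;
                for (int i = 0; i < n; i++) {
                    for (int j = 0; j < n; j++) w[i] += work[i][j] * v[j];
                    norm += w[i] * w[i];
                }
                norm = Math.sqrt(norm);
                for (int i = 0; i < n; i++) v[i] = w[i] / norm;
                lambda = norm; // |m * v| approaches the eigenvalue once v has converged
            }
            eigens[r] = v;
            eigenValues[r] = lambda;
            // Deflate: remove the component that was just found.
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++) work[i][j] -= lambda * v[i] * v[j];
        }
        return eigens;
    }

    // Runs the process from the list above and returns |a - approximated_a|_F.
    static double reconstructionError(double[][] a, int rank) {
        double[] eigenValues = new double[rank];
        // Assumption: the eigen solver is given A * A^t.
        double[][] eigens = topEigens(times(a, transpose(a)), rank, eigenValues);
        double[] sigma = new double[rank];
        for (int i = 0; i < rank; i++) sigma[i] = Math.sqrt(eigenValues[i]);
        double[][] u = transpose(eigens); // rows are eigenvectors, so transpose
        double[][] v = times(times(transpose(a), u), diag(sigma, true));
        double[][] approx = times(times(u, diag(sigma, false)), transpose(v));
        double[][] diff = new double[a.length][a[0].length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[0].length; j++) diff[i][j] = a[i][j] - approx[i][j];
        return frobenius(diff);
    }

    public static void main(String[] args) {
        double[][] a = {{2, 0}, {1, 2}, {0, 1}}; // toy input, rank 2
        System.out.println("Frobenius error: " + reconstructionError(a, 2));
    }
}
```

In this sketch the square roots of the eigenvalues of A*A^t play the role of the singular values, so the eigenvalues themselves are the squared singular values of A.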
I tested the HebbianSolver with several examples.
My question is: which matrix is supposed to be the input matrix for the
HebbianSolver: A or A*A^t? Since the results are eigenvectors and eigenvalues, I
would expect A*A^t.
I compared the results for the examples (derived with the HebbianSolver) with
the results derived with R (package corpcor).
1) If A has to be the input matrix ...
... example 1: the process above gets a result which is very similar to the
result derived by R (U, Sigma and V).
... example 2: the process above gets a result which is at least similar to
the result derived by R (U, Sigma and V).
... example 3: using A leads to an infinite loop, so I used A^t instead. But
then: CardinalityException while calculating V.
2) If A*A^t has to be the input matrix ...
... example 1: U derived with the process above is the same U as derived
with R. The singular values derived with the process above are the squares
of the singular values derived by R. V isn't equal.
... example 2: similar to results for example 1
... example 3: similar to results for example 1
Notes:
* The method getRandomStartingIndex(Matrix corpus, Matrix eigens) produces an
infinite loop when every row of the considered matrix has fewer than 5 values
(see 1), example 3).
* I could post my Java code as well if that would help.
Thanks in advance.
Stefanie
Here is the R code with the mentioned examples:

library(corpcor)
###########
# example 1
###########
a <- matrix(c(4.42282138, 1.51744077, 0.07690571, 0.93650042, 2.19609401,
1.51744077, 1.73849477, 0.11856149, 0.76555191, 1.3673608,
0.07690571, 0.11856149, 0.55065932, 1.72163263, 0.2283693,
0.93650042, 0.76555191, 1.72163263, 0.09470345, 1.16626194,
2.19609401, 1.3673608, 0.2283693, 1.16626194, 0.37321311), 5, 5, byrow = TRUE)
s = fast.svd(a)                  # SVD of a: s$u, s$d, s$v
D <- diag(s$d)                   # singular values as a diagonal matrix
app_a <- s$u %*% D %*% t(s$v)    # reconstruction of a from U, D and V
app_d <- t(s$u) %*% a %*% s$v    # should reproduce D
s
###########
# example 2
###########
a <- matrix(c(22, 52, 1, 93, 70, 33,
5, 83, 38, 85, 91, 63,
68, 3, 7, 53, 76, 76,
68, 5, 42, 9, 26, 99,
93, 53, 69, 65, 5, 37,
38, 67, 59, 42, 74, 25), 6, 6, byrow = TRUE)
s = fast.svd(a)
D <- diag(s$d)
app_a <- s$u %*% D %*% t(s$v)
app_d <- t(s$u) %*% a %*% s$v
s
###########
# example 3
###########
a <- matrix(c(4,4,5,
4,5,5,
3,3,2,
4,5,4,
4,4,4,
3,5,4,
4,4,3,
2,4,4,
5,5,5), 9, 3, byrow = TRUE)
s = fast.svd(a)
D <- diag(s$d)
app_a <- s$u %*% D %*% t(s$v)
app_d <- t(s$u) %*% a %*% s$v
s


