spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Deleted] (SPARK-1262) Numerical drift in computation of matrix inverse leads to invalid results in ALS
Date Wed, 21 Jan 2015 06:58:35 GMT

     [ https://issues.apache.org/jira/browse/SPARK-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Sean Owen updated SPARK-1262:
-----------------------------
    Comment: was deleted

(was: That makes sense. There may still be an interesting point here. What would be interested
to know is whether the size of |Ax-B| is consistently larger or smaller for one solver or
another. If you find the QR decomposition has smaller differences and gives anecdotally better
results, that would be a compelling reason to consider the different decomposition. There
was another issue regarding solvePositive and mysteriously near-singular A that might be related.)

> Numerical drift in computation of matrix inverse leads to invalid results in ALS
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-1262
>                 URL: https://issues.apache.org/jira/browse/SPARK-1262
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>            Reporter: Michael Allman
>
> In what follows I cannot offer an expert analysis and remedy, however I will describe
the problem I'm seeing and the strategy I've used to mitigate it.
> The ALS {{updateBlock()}} method includes a call to {{Solve.solvePositive()}} from JBlas.
Generally speaking, when we call {{solvePositive(A, B)}} for symmetric, positive definite
{{A}}, this method computes {{x}} from the matrix equation {{Ax = B}}. Or, in other words,
it computes {{A}}^-1^{{B}}. As mentioned, one of the preconditions on A is that it be symmetric.
In ALS, we call this method on {{fullXtX}} or {{fullXtX.add(YtY.value.get)}}, both of which
should be symmetric. However, for implicit ALS and rank > 1, some kind of imprecision in
the computation of this inverse tends to make successive values of {{fullXtX.add(YtY.value.get)}}
less and less symmetric. From my experience this can and does alter the model produced by
ALS significantly, leading to very different and counterintuitive user-item recommendations
compared to a solution in which this asymmetry does not exist.
> An approach I've seen taken against this problem from the Oryx codebase is to cast each
value in the vector returned from {{Solve.solvePositive()}} to a float. That has worked for
me.
> I've also tried alternative implementations of a linear system solver. The solver that
Oryx uses, {{RRQRDecomposition}} from commons math v3, seems to lead to better solutions,
but still drifts unless the results are cast to floats.
> The Colt numerical computing libraries include a solver called {{QRDecomposition}}, which
may be superior to the one used in JBlas. However, I haven't tested it.
> I'm working on a unit test to exercise this bug. I discovered it by "tracing" the behavior
of {{ALS.scala}} with various logging statements inserted into the code.
> If nothing else, add a line before the calls to {{Solve.solvePositive()}} in {{updateBlock()}}
to validate that {{fullXtX}} or {{fullXtX.add(YtY.value.get)}} are symmetric.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message