I've looked into improving performance further, but it seems any
additional gains will require significant API changes for memory management.
Currently, using GaussNewton with Cholesky (or LU) requires 4 matrix
allocations on _each_ evaluation. The objective function first
allocates the Jacobian matrix. Applying the weights through matrix
multiplication then allocates a second matrix. Computing the normal
equations allocates a third matrix to hold the result, and finally the
decomposition allocates its own matrix as a copy. With QR there are 3
matrix allocations per model function evaluation, since there is no
need to compute the normal equations, but the third allocation+copy is
larger. Empirical sampling data I've collected with the jvisualvm
tool indicates that matrix allocation and copying takes 30% to 80% of
the execution time, depending on the dimension of the Jacobian.
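To make the allocation chain concrete, here is a sketch of the four
allocations described above, using plain Java arrays in place of
RealMatrix; the method names and the explicit loops are illustrative
assumptions, not the Commons Math implementation:

```java
// Sketch of the per-evaluation allocation pattern for Gauss-Newton with
// Cholesky/LU. Each numbered step allocates a fresh matrix; none of the
// storage is reused between evaluations. Hypothetical helper names.
public class AllocationSketch {

    // (1) The objective function allocates the m x n Jacobian.
    static double[][] jacobian(int m, int n) {
        return new double[m][n];                   // allocation 1
    }

    // (2) Applying the weight matrix W allocates a second matrix W*J.
    static double[][] applyWeights(double[][] w, double[][] j) {
        int m = j.length, n = j[0].length;
        double[][] wj = new double[m][n];          // allocation 2
        for (int i = 0; i < m; i++)
            for (int k = 0; k < n; k++)
                for (int l = 0; l < m; l++)
                    wj[i][k] += w[i][l] * j[l][k];
        return wj;
    }

    // (3) Forming the normal equations J^T * (W*J) allocates an n x n matrix.
    static double[][] normalEquations(double[][] j, double[][] wj) {
        int m = j.length, n = j[0].length;
        double[][] jtwj = new double[n][n];        // allocation 3
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++)
                for (int l = 0; l < m; l++)
                    jtwj[i][k] += j[l][i] * wj[l][k];
        return jtwj;
    }

    // (4) The decomposition copies its input before factoring in place.
    static double[][] decompositionCopy(double[][] a) {
        double[][] copy = new double[a.length][];  // allocation 4
        for (int i = 0; i < a.length; i++)
            copy[i] = a[i].clone();
        return copy;
    }
}
```

With QR, step (3) is skipped, but the decomposition in step (4) copies
the larger m x n matrix instead of the n x n normal equations.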
One way to improve performance would be to provide preallocated space
for the Jacobian and reuse it for each evaluation. The
LeastSquaresProblem interface would then be:
void evaluate(RealVector point, RealVector resultResiduals, RealMatrix
resultJacobian);
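A minimal sketch of what such a caller-allocates interface could look
like, using plain arrays in place of RealVector/RealMatrix; the
interface name, the toy linear model, and all method names below are
illustrative assumptions, not the proposed Commons Math API:

```java
// Sketch: the caller supplies preallocated output storage, so repeated
// evaluations allocate nothing. Names here are hypothetical.
interface PreallocatedProblem {
    int residualCount();
    int parameterCount();
    // Writes residuals and the Jacobian into caller-owned buffers.
    void evaluate(double[] point, double[] resultResiduals,
                  double[][] resultJacobian);
}

public class PreallocationSketch {
    // Toy problem: residuals r_i = x0 + x1*t_i - y_i for fixed data (t_i, y_i).
    static PreallocatedProblem linearFit(double[] t, double[] y) {
        return new PreallocatedProblem() {
            public int residualCount()  { return t.length; }
            public int parameterCount() { return 2; }
            public void evaluate(double[] p, double[] r, double[][] j) {
                for (int i = 0; i < t.length; i++) {
                    r[i] = p[0] + p[1] * t[i] - y[i];  // residual
                    j[i][0] = 1.0;                     // d r_i / d x0
                    j[i][1] = t[i];                    // d r_i / d x1
                }
            }
        };
    }

    public static void main(String[] args) {
        PreallocatedProblem problem =
            linearFit(new double[] {0, 1, 2}, new double[] {1, 3, 5});
        // Buffers allocated once, then reused across every evaluation.
        double[] residuals = new double[problem.residualCount()];
        double[][] jacobian =
            new double[problem.residualCount()][problem.parameterCount()];
        problem.evaluate(new double[] {1.0, 2.0}, residuals, jacobian);
        System.out.println(java.util.Arrays.toString(residuals));
    }
}
```

The optimizer could then own the residual and Jacobian buffers for the
lifetime of a fit, handing the same storage to every evaluation instead
of allocating four matrices per iteration.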
I'm interested in hearing your ideas on other approaches to solve this
issue. Or even if this is an issue worth solving.
Best Regards,
Evan

