On 12/01/2012 01:42 AM, Konstantin Berlin wrote:
> Hi,
>
> Now that I have some time, let me try to make my case clearly. First, I want to say that
> this is not some attack on the automatic differentiation package. I love automatic differentiation
> and symbolic packages; I personally cannot compute a derivative without a computer for the
> life of me. And I am also glad we are finally starting to agree on something :)
>
> This discussion is about figuring out how an incorrect framework and storage design affects
> the performance of an optimization. That is why I am so worried.
>
> So let's start with the basics. A Newton method must compute a descent step by solving
> the linear system
>
> H*p = -g, (1)
>
> where H is the Hessian, g is the gradient, and p is the desired descent step. What I
> am about to say also holds for nonlinear least-squares methods, where the Hessian is approximated
> using the Jacobian as
>
> H \approx J^T*J+\lambda I.
>
> Now, if you are not solving the kind of simple optimization problem that you keep giving
> examples of, you can easily have a Hessian that is a very large matrix, like 1000x1000 or larger.
> You had better hope that you are storing H in your linear algebra framework, otherwise solving
> eq. (1) is going to take a while. This is actually a very active area of research, and that is
> why having sparse linear algebra (aren't you removing this? ;) ) and iterative solvers is
> important to a lot of people.
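>
> To make concrete what I mean by "storing H in your linear algebra framework", here is a rough
> sketch of one damped Gauss-Newton step done directly with RealMatrix operations (the method name
> and its arguments are placeholders, not an existing API):
>
>     import org.apache.commons.math3.linear.*;
>
>     /** One damped Gauss-Newton step: solve (J^T*J + lambda*I) * p = -J^T*r, i.e. eq. (1). */
>     static RealVector newtonStep(double[][] jacobian, double[] residuals, double lambda) {
>         RealMatrix J = new Array2DRowRealMatrix(jacobian, false);
>         RealVector g = J.transpose().operate(new ArrayRealVector(residuals, false)); // g = J^T*r
>         RealMatrix H = J.transpose().multiply(J)                                     // H ~ J^T*J + lambda*I
>             .add(MatrixUtils.createRealIdentityMatrix(J.getColumnDimension()).scalarMultiply(lambda));
>         return new CholeskyDecomposition(H).getSolver().solve(g.mapMultiply(-1.0));  // solve H*p = -g
>     }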
>
> What you are proposing as good OO style is that if someone has a function that they want
> to optimize, they need to take what is probably already a RealMatrix or a double[][] and create
> 1000x1000 DerivativeStructure objects that, within the next 10 lines in the optimizer, are going
> to be converted back to a RealMatrix? Not only have you just fragmented your heap space, but your
> garbage collector is going crazy and you are wasting an incredible amount of memory. Now imagine
> that your Jacobian is actually very simple to compute but large; then this whole conversion back
> and forth actually takes much longer than the function evaluation.
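>
> Here is roughly the round trip I am objecting to (only a sketch; I am assuming one
> DerivativeStructure per residual, packed as the value followed by its n partial derivatives):
>
>     import org.apache.commons.math3.analysis.differentiation.DerivativeStructure;
>
>     // The caller wraps an already-computed Jacobian, row by row, into DerivativeStructure objects...
>     static DerivativeStructure[] wrap(double[] values, double[][] jacobian) {
>         int n = jacobian[0].length;
>         DerivativeStructure[] wrapped = new DerivativeStructure[values.length];
>         for (int i = 0; i < values.length; i++) {
>             double[] packed = new double[n + 1];
>             packed[0] = values[i];                               // function value
>             System.arraycopy(jacobian[i], 0, packed, 1, n);      // its n partial derivatives
>             wrapped[i] = new DerivativeStructure(n, 1, packed);  // n free parameters, order 1
>         }
>         return wrapped;
>     }
>
>     // ...and the optimizer unwraps the very same numbers a few lines later.
>     static double[][] unwrap(DerivativeStructure[] wrapped, int n) {
>         double[][] jacobian = new double[wrapped.length][n];
>         for (int i = 0; i < wrapped.length; i++) {
>             for (int j = 0; j < n; j++) {
>                 int[] orders = new int[n];
>                 orders[j] = 1;                                   // first derivative w.r.t. parameter j
>                 jacobian[i][j] = wrapped[i].getPartialDerivative(orders);
>             }
>         }
>         return jacobian;
>     }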
>
> So, why exactly are you insisting on taking this performance penalty? You claim that
> the optimizer can work better if it gets a DerivativeStructure; why? Where does the fact that
> you are holding a DerivativeStructure come into effect for a Newton method? Can you provide
> any literature in that regard? The classical optimization bottleneck, if not the actual function
> evaluation, is eq. (1). The optimization methodology is built on top of years of research
> in computational linear algebra, and it does not care how the matrices are actually
> computed. Efficient memory usage and speed are critical here, because in Newton optimizations
> eq. (1) is solved thousands of times. The goal of the Jacobian is not to store derivatives;
> it is to store a matrix of numbers that allows you to quadratically approximate your target
> function.
>
> You guys are focusing on people using finite differences and trying to optimize a Newton
> method to use finite differences. That is not the main purpose of a Newton method. If you want
> a method that dynamically adjusts the finite-difference step size, you should switch to BOBYQA,
> which was designed for this case.
>
> Derivatives can be computed by a computer using a symbolic toolbox a priori (something
> that I was referring to when I accidentally said autodifferentiation). They can also be efficiently
> simplified by that toolbox before being pasted back into your program. Autodiff could also
> be an important tool for people if their problem is simple or they are not concerned with
> optimal efficiency. This can easily be handled by a wrapper and has nothing to do with Newton
> optimization. If you want to talk about proper OO design and internal simplification, then
> you should focus on being able to pass a linear solver into your optimization method (a sketch
> is below). This way you allow people to use iterative and sparse solvers when needed. If you are
> concerned about people getting derivatives wrong, you can do what MATLAB does, which is to add
> an option to check the derivatives using finite differences when debugging.
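>
> What I mean by passing a linear solver in, roughly (LinearSolver, CholeskySolver and the
> NewtonOptimizer constructor are made-up names to show the shape of the API, not existing classes):
>
>     import org.apache.commons.math3.linear.*;
>
>     /** Hypothetical strategy interface: how the optimizer solves eq. (1). */
>     interface LinearSolver {
>         RealVector solve(RealMatrix H, RealVector minusG);
>     }
>
>     /** Default dense implementation; users could plug in a sparse or iterative solver instead. */
>     class CholeskySolver implements LinearSolver {
>         public RealVector solve(RealMatrix H, RealVector minusG) {
>             return new CholeskyDecomposition(H).getSolver().solve(minusG);
>         }
>     }
>
>     // A Newton-type optimizer would then take the solver as a constructor argument,
>     //     NewtonOptimizer optimizer = new NewtonOptimizer(new CholeskySolver());
>     // and never assume anything about how H is stored or factored.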
>
> This brings me to my second issue. There are important problems where computing
> the derivatives together with the actual function evaluation is *significantly* faster than
> computing each one separately. I am not clear on why I am hitting such resistance with this. For
> example, you are doing some sort of simulation which can be trivially adjusted at the end to give
> either a derivative or a function value. A very real example of this is the Fast Multipole Method,
> which takes time to compute a series expansion of the function, but the resulting series expansion
> can be analytically differentiated.
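>
> All I am asking for is an interface that lets the model report both at once, something like
> this (ValueAndJacobian and the method names are only illustrative, not a concrete proposal):
>
>     /** Hypothetical result holder: the function value and its Jacobian from a single evaluation. */
>     class ValueAndJacobian {
>         final double[] value;      // f(x), length m
>         final double[][] jacobian; // df_i/dx_j, an m x n matrix
>         ValueAndJacobian(double[] value, double[][] jacobian) {
>             this.value = value;
>             this.jacobian = jacobian;
>         }
>     }
>
>     /** Hypothetical model interface: one call does the expensive shared work only once. */
>     interface DifferentiableVectorFunction {
>         ValueAndJacobian evaluate(double[] point);
>     }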
>
> What I suggested as function interfaces was just an initial quick suggestion that I would
> be happy for anyone to change in a way that everyone feels positive about. I just don't want
> my second issue to be ignored as a non-issue.
I think the discussion on these things has just started, and IMHO the guys
here are really open to discussing changes and even admitting mistakes or bad
design; it just takes time to digest feedback.
Also, the current design may be based on wrong assumptions (e.g. on too
specific use cases), whereas the general case is handled in a
suboptimal way, so we have to adapt.
Just my 2 cents.
Thomas
