Hi,
2012/12/1 Konstantin Berlin <kberlin@gmail.com>
> Hi,
>
> Now that I have some time, let me try to make my case clearly. First I
> want to say that this is not some attack on the automaticdifferentation
> package. I love automaticdifferentation and symbolic packages. I
> personally cannot compute a derivative without a computer for the life of
> me. And I am also glad we are finally starting to agree on something :)
>
> This discussion is about figuring out how an incorrect framework and
> storage affects the performance of an optimization. That is why I am so
> worried.
>
> So lets start with the basics. A Newton method must compute a descent step
> by solving a linear equation
>
> H*p = g, (1)
>
> where H is the Hessian, g is the gradient, and p is the desired descent
> step. What I am about to say also holds for nonlinear least squares
> method, where Hessian is approximated using the Jacobian as
>
> H \approx J^T*J+\lambda I.
>
> Now, if you are not solving a simple optimization problem that you keep
> giving examples for, you can easily have a Hessian be a very large matrix,
> like 1000x1000 or larger. Now, you better hope that you are storing H
> using your linear algebra framework, otherwise eq. (1) computation is going
> to take a while.
> This is actually a very active area of research, and that is why having
> sparse linear algebra (aren't you removing this? ;) ) and iterative solvers
> is important to a lot of people.
>
A few months ago, we started a thread on this issue (on the users ML). It
got no answers! I am personally not happy with removing the support for
sparse vectors/matrices, but the truth is we didn't see a way to achieve
the same degree of correctness as say array based matrices and vectors.
As far as I am concerned, though, this is still an open question, and if
you have ideas about these issues, we would be glad to here them!
>
> What you are proposing as good OO style is that if someone has a function
> that they want to optimize, they need to take what is probably already a
> RealMatrix or [][], and create 1000x1000 DerivativeStructure objects, that,
> within the next 10 lines in the optimizer, are going to be converted back
> to a RealMatrix? Not only have you just fragmented your heap space, but
> your garbage collector
> is going crazy, and you are wasting an incredible amount of memory. Now
> imagine if your Jacobian is actually very simple to compute but large, then
> this whole conversion back and forth actually takes much longer than the
> function evaluation.
>
> So, why exactly are you insisting on taking this performance penalty? You
> claim that the optimizer can work better if it gets a DerivativeStructure,
> why? Where is the fact that you are holding a DerivativeStructure come into
> effect for a Newton method? Can you provide any literature in that regard?
> The classical optimization bottleneck, if not the actual function
> evaluation, is eq. (1). The optimization methodology is build on top of
> years of research in computational linear algebra concepts, and does not
> care how the matrices are actually computed. Efficient memory usage and
> speed is critical here because in Newton optimizations eq. (1) is evaluated
> thousands of times. The goal of the Jacobian is not to store derivatives it
> is to store a matrix of numbers that allows you to quadratically
> approximate your target function.
>
> You guys are focusing on people using finite differences and trying to
> optimize a newton method to use finite differences. This is not what the
> main purpose of a newton method is. If you want a method that dynamically
> adjusts finite differences step size, you should switch to BOBYQA, which
> was designed for this case.
>
> Derivatives can be computed by a computer using a symbolic toolbox a
> priori (something that I was referring to when I accidentally said
> autodifferenation). They can also be efficiently simplified by that
> toolbox before being pasted back into you program. Autodiff could also be
> an important tool for people if their problem is simple or they are not
> concerned with optimal efficiency . This can easily be handled by a wrapper
> and has nothing to do with Newton optimization. If you want to talk about
> proper OO design and internal simplification then you should focus on being
> able to pass a linear solver into your optimization method. This way you
> allow people to use iterative and sparse solvers when needed. If you are
> concerned about people getting derivatives wrong, you can do what MATLAB
> does, which is to add an option to check the derivatives using finite
> differences when debugging.
>
> This brings me to my second issue. There are important problems where
> computation of the derivatives combined with the actual function evaluation
> is *significantly* faster than computing each alone. I am not clear why I
> am hitting such resistance with this. For example, you are doing some sort
> of a simulation, which can be trivially adjusted in the end to give a
> derivative or a function value. A very real example of this is a Fast
> Multipole Method, which takes time to compute a series expansion of the
> function, but the resulting series expansion can be analytically
> differentiated.
>
> What I suggested as function interfaces was just an initial quick
> suggestion that I would be happy for anyone to change in a way that
> everyone feels positive about. I just don't want my second issue to be
> ignored like a nonissue.
>
> This is certainly not a nonissue.
Thanks for *your* time,
Sébastien
> Thanks for your time,
> Konstantin
>
> On Nov 30, 2012, at 5:15 PM, Gilles Sadowski <gilles@harfang.homelinux.org>
> wrote:
>
> >>>
> >>> I don't know if people are confused about autodifferentation, I
> >>> think most people working in numerical analysis are very well aware
> >>> of what it does. The issue here is that it is a completely separate
> >>> subject from optimizations. In a proper OO design you would not mix
> >>> the two together. Computation of the derivatives is a separate
> >>> component from optimization. Optimization only assumes that you can
> >>> compute one, it shouldn't care or force you two compute one in a
> >>> specific way. That's bad design. Everyone component should only have
> >>> necessary and sufficient requirements for use. Automatic
> >>> differentiation is not necessary for optimization so it should not
> >>> exist in the interface.
> >>
> >> You are right.
> >>
> >>>
> >>> I would like to add, that if my function is simple enough, I am very
> >>> capable of running autdiff package a priori myself. The real need for
> >>> finitedifference is when you cannot analytically compute the
> >>> derivative. Though I think having a wrapper function around autdiff
> >>> package is a nice tool to help developers quickly write code, if they
> >>> prefer to have quicker development time but slightly slower code.
> >>
> >>>
> >>> If someone can explain to me why autodiff package should be directly
> >>> forced or even be provided as alternative into optimization package
> >>> when a simple wrapper function can do this task while maintaining
> >>> much cleaner OO design, please tell me.
> >>
> >> You just found the reason and explained it by yourself: it was bad
> >> design and I am the one to blame for this. I was (and still am) stupid.
> >
> > I'm not convinced that it was bad design. Maybe I'm completely off now:
> > I thought that the purpose was internal simplification. I don't think
> > that it can be inherently wrong to store any sort of derivatives (be it
> > gradient or Jacobian) into a structure aimed at storing... partial
> > derivatives (i.e. "DerivativeStructure").
> >
> > Please let me know whether the problem is more indepth than what I
> > described previously (allowing the current way to provide the gradient
> > and Jacobian).
> >
> > I would nevertheless have suggested to delete the method
> > MultivariateFunction partialDerivative(int k);
> > from "DifferentiableMultivariateFunction".
> >
> >>
> >> So I suggest we disconnect differentiation from optimization, but in a
> >> way that would let users decide how they provide the differentials. This
> >> means I would not like to reintroduce the former interfaces.
> >>
> >> What about having the optimize() methods taking two arguments function,
> >> a MultivariateFunction for the value and a MultivariateVectorFunction
> >> for the gradient? It would be up to the user to select how they compute
> >> each function. They could use direct coding, they could use a finite
> >> differences wrapper around the first function to implement the second,
> >
> >
> >
> >> they could use our differentiation framework and put one wrapper to
> >> extract the value and another one to extract the gradient.
> >
> > That would also be a nice example of usage of the differentiation
> > framework.
> >
> >> What do you think?
> >
> > Yes. Now I got it (I think). IIUC, we could even introduce the new
> interface
> > before 3.1, and deprecated old the older ones (as you wanted to do
> anyways,
> > IIRC).
> >
> > I'll give it a try in the next days. OK?
> >
> >
> > Gilles
> >
> > 
> > To unsubscribe, email: devunsubscribe@commons.apache.org
> > For additional commands, email: devhelp@commons.apache.org
> >
>
>
> 
> To unsubscribe, email: devunsubscribe@commons.apache.org
> For additional commands, email: devhelp@commons.apache.org
>
>
