commons-dev mailing list archives

From Luc Maisonobe <Luc.Maison...@free.fr>
Subject Re: [math] fluent/builder API for algorithms in 4.0 ?
Date Wed, 01 May 2013 09:41:29 GMT
Hi Thomas,

On 30/04/2013 11:11, Thomas Neidhart wrote:
> +1, looks really nice.
> 
> I would like to do the same for the Clusterer interface.

Sure. So it seems we have candidates to try this on optim, clusterer and
ode. This should give us a broad testing field.

One of the pitfalls, or devils in the details, we can expect shows up
when we attempt to do this while still having a class hierarchy for some
aspects, typically with an abstract class at the top.

Let's consider a withMaxEval(int maxEval) method, which would probably
be implemented once in the abstract class and simply inherited by the
specialized classes deeper in the hierarchy.

The top-level implementation should create an object of the appropriate
concrete type. The simplest solution I can think of is a protected
factory method. So if we look at the abstract class and one of its
derived classes, we should have something like:

 public abstract class AbstractAlgorithm {

     private final int maxEval;

     protected AbstractAlgorithm(final int maxEval) {
         this.maxEval = maxEval;
     }

     public AbstractAlgorithm() {
         // default setting
         this(Integer.MAX_VALUE);
     }

     /** Factory method allowing the abstract class to create
       * instances of the appropriate concrete type. */
     protected abstract AbstractAlgorithm build(final int maxEval);

     public AbstractAlgorithm withMaxEval(final int maxEval) {
         return build(maxEval);
     }

     public int getMaxEval() {
         return maxEval;
     }

 }

 public class SpecializedAlgorithm
     extends AbstractAlgorithm implements ThresholdTunable {

     private final double threshold;

     protected SpecializedAlgorithm(final int maxEval,
                                    final double threshold) {
         super(maxEval);
         this.threshold = threshold;
     }

     public SpecializedAlgorithm() {
         // use default values all the way up to the class root
         super();
         this.threshold = 1.0;
     }

     @Override
     protected SpecializedAlgorithm build(final int maxEval) {
         // make sure we don't lose our local threshold
         // when the top level abstract class creates a new instance
         // with an updated maxEval
         return new SpecializedAlgorithm(maxEval, threshold);
     }

     public SpecializedAlgorithm withThreshold(final double threshold) {
         return new SpecializedAlgorithm(getMaxEval(), threshold);
     }

 }
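
For illustration, a hypothetical use of the pattern above would be:

   SpecializedAlgorithm algorithm =
       new SpecializedAlgorithm().withThreshold(0.5);
   // build() keeps the threshold in the instance created for us
   // by the abstract class
   AbstractAlgorithm tuned = algorithm.withMaxEval(10000);

Note that withMaxEval is declared to return AbstractAlgorithm, so the
specialized static type is lost after the call; a cast (or recursive
generics, as in the interfaces quoted below) would be needed to keep
chaining the specialized withXxx methods. This is exactly the kind of
detail we will have to sort out.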


Luc

> 
> 
> On Mon, Apr 29, 2013 at 6:40 PM, Luc Maisonobe <luc@spaceroots.org> wrote:
> 
>> Hi all,
>>
>> Since the 2.x series, we have been struggling in several areas with
>> respect to algorithm APIs. The latest change was about optimizers, but
>> it is only one example among others (solvers, integration, ODE and
>> maybe some parts of statistics may be concerned by the proposal below).
>>
>> The various things we want to keep, and which are not always compatible
>> with each other, are:
>>
>>  1) simple use
>>  2) immutability
>>  3) good OO design
>>  4) compatible with reference algorithms implementations
>>  5) maintainable
>>  6) extensible
>>  7) backward compatibility
>>  8) probably many other characteristics ...
>>
>> 3) and 4) often don't work together. 1), 6) and 7) are difficult to
>> handle at once.
>>
>> If we look at optimizers, some progress has been made with respect to
>> extensibility and backward compatibility, but simple use was clearly
>> left behind, as it is difficult to know which optimizer supports which
>> feature now that neither strong typing nor fixed arguments are used
>> anymore. However, keeping the older API would have prevented
>> extensibility, as the combinatorial explosion of arguments worsens as
>> features are added (and we still need to add several constraint types).
>>
>> If we look at ODE solvers, we are still using the original API from
>> Mantissa, but each time we add a new feature we add more and more
>> setters, thus drifting farther and farther from immutability and
>> imposing some unwritten ordering between calls (for example, when we
>> set up additional equations we must also set up the initial additional
>> state, and the user must set up a way to retrieve the final additional
>> state).
>>
>> If we look at solvers, we started with some parameters set up during
>> the call to solve while others were set up at construction time, but
>> this split has changed over time.
>>
>> So I would like to suggest a new approach, largely inspired by a recent
>> discussion on the [CSV] component about the builder API (see
>> <http://markmail.org/thread/o3s2a5hyj6xh4nzj>), by an older discussion
>> on [math] about using a fluent API for vectors (see
>> <http://markmail.org/message/2gmg6wnpm5p2splb>), and by a talk Simone
>> gave last year at ApacheCon Europe. The idea is to use a fluent API to
>> build the algorithm progressively, adding features one at a time using
>> withXxx methods defined in interfaces.
>>
>> As an example, consider just a few of the features used in
>> optimization: constraints, iteration limit, evaluation limits, search
>> interval, bracketing steps ... Some features are used in several
>> optimizers, some are specific to univariate solvers, some can be used
>> in a whole family of solvers ... Trying to fit everything into a single
>> class hierarchy is impossible. We tried, but I don't think we
>> succeeded.
>>
>> If we consider each feature separately, we could define an interface
>> for each one as follows:
>>
>>   interface Constrainable<T extends Constrainable<T>>
>>             extends Optimizer {
>>     /** Returns a new optimizer, handling an additional constraint.
>>       * @param c the constraint to add
>>       * @return a new optimizer handling the constraint
>>       * (note that the instance itself is not changed)
>>       */
>>     T withConstraint(Constraint c);
>>   }
>>
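>> The IterationLimited interface used below could be sketched along the
>> same lines (just an illustration, not a committed signature):
>>
>>   interface IterationLimited<T extends IterationLimited<T>>
>>             extends Optimizer {
>>     /** Returns a new optimizer with an updated iterations limit.
>>       * @param maxIter maximal number of iterations
>>       * @return a new optimizer using the given limit
>>       * (note that the instance itself is not changed)
>>       */
>>     T withMaxIter(int maxIter);
>>   }
>>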
>> Basically, they would be used where OptimizationData is used today.
>> An optimizer that supports simple bounds constraints and a maximum
>> number of iterations would be defined as:
>>
>>   public class TheOptimizer
>>          implements Optimizer,
>>                     Constrainable<TheOptimizer>,
>>                     IterationLimited<TheOptimizer> {
>>
>>     private final int maxIter;
>>     private final List<Constraint> constraints;
>>
>>     // internal constructor used for fluent API
>>     private TheOptimizer(..., int maxIter, List<Constraint> constraints) {
>>       ...
>>       this.maxIter     = maxIter;
>>       this.constraints = constraints;
>>     }
>>
>>     public TheOptimizer withConstraint(Constraint c) {
>>       List<Constraint> l = new ArrayList<Constraint>(constraints);
>>       l.add(c);
>>       return new TheOptimizer(..., maxIter, l);
>>     }
>>
>>     public TheOptimizer withMaxIter(int m) {
>>       return new TheOptimizer(..., m, constraints);
>>     }
>>
>>  }
>>
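>> Assuming a public constructor with default settings (not shown above,
>> purely hypothetical), setting such an optimizer up would read:
>>
>>   TheOptimizer optimizer = new TheOptimizer()
>>                                .withMaxIter(100)
>>                                .withConstraint(c1)
>>                                .withConstraint(c2);
>>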
>> So basically, the withXxx methods are sort-of setters, but they do
>> preserve immutability (we do not return this, we return a new object).
>> It is easy to add features to existing classes and there is no need to
>> shove everything into a single hierarchy; we have a forest, not a tree.
>> When looking at the API, users clearly see what they can use and what
>> they cannot use: if an optimizer does not support constraints, there
>> will be no way to put a constraint into it. If in a later version
>> constraints become available, the existing functions will not be
>> changed, only new functions will appear.
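>>
>> For instance, reusing the hypothetical optimizer built above:
>>
>>   TheOptimizer base    = optimizer.withMaxIter(50);
>>   TheOptimizer relaxed = base.withMaxIter(500);
>>   // base is left untouched and still uses 50 iterations;
>>   // relaxed is a distinct, independent instance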
>>
>> Of course, this creates a bunch of intermediate objects, but they are
>> often quite small and the setup phase is not the most
>> computation-intensive one. It also becomes possible to run parametric
>> studies on some features, using code like:
>>
>>   Algorithm core = new Algorithm().withA(a).withB(b).withC(c);
>>   for (double d = 0.0; d < 1.0; d += 0.001) {
>>     Algorithm dSpecial = core.withD(d);
>>     double result = dSpecial.run();
>>     System.out.println(" d = " + d + ", result = " + result);
>>   }
>>
>> This would work for someone who considers feature A a core feature that
>> should stay fixed while feature D is a parameter, but it would work
>> equally well for someone considering the opposite case: they would
>> simply write the loop the other way, with the call to withD outside the
>> loop and the call to withA inside the loop.
>>
>> A side effect is that it also becomes possible to copy algorithms
>> safely by just resetting a feature, even when we don't really know
>> which implementation we have. A typical example that creates problems
>> for me is duplicating an ODE solver. It cannot be done currently, as
>> some elements required at construction time depend on the exact type of
>> solver used (tolerance vectors for adaptive stepsize integrators). So
>> if, for example, I want to run a Monte-Carlo analysis in parallel and
>> need to duplicate an integrator, I would do it as follows:
>>
>>   FirstOrderIntegrator[]
>>       duplicate(FirstOrderIntegrator integrator, int n) {
>>     FirstOrderIntegrator[] copies = new FirstOrderIntegrator[n];
>>     for (int i = 0; i < n; ++i) {
>>       // withMaxEvaluations returns a fresh instance, so setting the
>>       // current value back is enough to get an independent copy
>>       copies[i] =
>>         integrator.withMaxEvaluations(integrator.getMaxEvaluations());
>>     }
>>     return copies;
>>   }
>>
>> This kind of API could be extended to several algorithms, so it may be
>> set up in a consistent way across the library. As I wrote at the
>> beginning of this message, I am first thinking about root solvers,
>> optimizers and ODE.
>>
>> What do you think?
>> Luc
>>
> 

