uima-dev mailing list archives

From "Adam Lally" <ala...@alum.rpi.edu>
Subject Re: Thoughts on extending FlowController API
Date Thu, 01 Mar 2007 15:00:54 GMT
On 2/28/07, Michael Baessler <mba@michael-baessler.de> wrote:
> How does it work with the additionalParams map to configure my
> application to 'continue'
> or 'terminate' in case of errors. Will it be configurable for each
> analysis engine separately?
> I think it would be very useful since the error handling depends on the
> analysis engine. So when using the additionalParams map, does the
> application
> have to take care how to get the configuration or will that be part of
> any of the common descriptors?

The FlowController could decide, based on its configuration (see
below), whether to continue or terminate depending on which Analysis
Engine failed (some might be more important to the end result than
others).

I intended the additionalParams suggestion just to be a global switch
to cause an abort on _any_ error, just in case a deployer wanted to
override the flow controller's decision in that way.  (I'm not sure
this is a worthwhile thing to do, it was just an idea.)  Of course
there are many more possible kinds of error handling configuration
settings that the user might want to specify, but I don't want to get
into how to specify them all in the aggregate descriptor.
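
To make the "global switch" idea concrete, here is a minimal sketch in
plain Java. It is not UIMA API: the key name "abortOnAnyError" and the
class GlobalAbortSketch are made up for illustration, and the boolean
argument stands in for whatever the flow controller decided on its own.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only; nothing here is part of the UIMA API.
final class GlobalAbortSketch {

    // Hypothetical key: the thread does not name the additionalParams entry.
    static final String ABORT_ON_ANY_ERROR = "abortOnAnyError";

    /**
     * Combines the flow controller's own decision with the deployer's
     * global switch. The switch can only force an abort; it never turns
     * a "terminate" decision into "continue".
     */
    static boolean continueAfterError(boolean flowControllerWantsToContinue,
                                      Map<String, Object> additionalParams) {
        boolean abortOnAnyError =
                Boolean.TRUE.equals(additionalParams.get(ABORT_ON_ANY_ERROR));
        return flowControllerWantsToContinue && !abortOnAnyError;
    }

    public static void main(String[] args) {
        Map<String, Object> params = new HashMap<>();
        System.out.println(continueAfterError(true, params));  // true
        params.put(ABORT_ON_ANY_ERROR, Boolean.TRUE);
        System.out.println(continueAfterError(true, params));  // false
    }
}
```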

> I think a good place to specify this will be the flowConstraints section in
> an aggregate descriptor.
>
> When having a built-in flow, it can look like:
>     <flowConstraints>
>       <fixedFlow>
>         <node errorAction="continue" >ae1</node>
>         <node errorAction="terminate">ae2</node>
>       </fixedFlow>
>     </flowConstraints>
>

Yes, we could consider extending the <fixedFlow> in this way.  That
would let people who are using the existing FixedFlowController easily
configure whether to continue or terminate.  The default would be
terminate, so maybe the attribute should be a boolean
continueOnError="true" in order to override the default.

While we're on that topic, since I added a ParallelStep that the Flow
Controller can return, I wonder if we also want to extend <fixedFlow>
to allow including a parallel step.  So something like:
      <fixedFlow>
        <node errorAction="continue" >ae1</node>
        <parallel>
          <node errorAction="terminate">ae2</node>
          <node errorAction="continue">ae3</node>
        </parallel>
      </fixedFlow>

If we don't do this then people who want to configure a parallel flow
would need a custom flow controller, which seems a little bit like
overkill.

A concern is that we'd be adding complexity to what used to be a very
simple concept for the <fixedFlow>, but I think we can hide this from
most users until they start to care about more complex flow options.

Changes to the definition of FixedFlow would require CDE support, though.


> but when having a FlowController plugged in, this section is missing.

Actually it is possible, but not required, to have a <fixedFlow>
section when using a custom FlowController.  (I think the CDE supports
this too, but I'm not sure.)

> But I wonder why. I think for these flows, the order of the
> analysis engines can also be relevant. How does this work currently? I
> think the order of the analysis engine definition is used, right?

The reason it's optional is that a custom FlowController often
wouldn't use a fixed ordering of AnalysisEngines - it may make dynamic
flow decisions based on other criteria.

Note that FlowControllers can define configuration parameters just
like AEs can, so whatever information the FlowController needs to make
routing decisions can be provided that way, if it can't be represented
by the <fixedFlow> object.

The ordering of analysis engine definitions can't be used to make flow
decisions.  These are put into a HashMap and the ordering is lost
before it gets to the FlowController.
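
As a general Java illustration of why the declaration order can't be
relied on (this is not the UIMA code itself, and the engine keys are
invented), a HashMap makes no promise about iteration order, whereas a
LinkedHashMap iterates in insertion order:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only; the class and key names are hypothetical.
final class DelegateOrderSketch {

    /** Inserts the keys in the given order and returns the map's iteration order. */
    static List<String> iterationOrder(Map<String, String> map, String... keys) {
        for (String key : keys) {
            map.put(key, "descriptor-for-" + key); // placeholder value
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        String[] declared = {"tokenizer", "tagger", "parser", "ner"};
        // HashMap order depends on hash codes and may differ from the declared order.
        System.out.println("HashMap:       " + iterationOrder(new HashMap<>(), declared));
        // LinkedHashMap keeps the declared (insertion) order.
        System.out.println("LinkedHashMap: " + iterationOrder(new LinkedHashMap<>(), declared));
    }
}
```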

So in summary I think I like the idea of extending <fixedFlow> to
support two other things:
1) a <node> can be replaced with a parallel step, which is a
collection of <node>s that could be run in parallel.
2) each <node> has an optional boolean attribute continueOnError,
which defaults to false.  If set to true, then in the case of an error
in this AE, processing will continue on to the next element of the flow.
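
A rough sketch of how these two extensions might behave, in plain Java
rather than against the real FlowController interfaces. The Node and
Step types are hypothetical, failures are simulated by a set of engine
keys, and I'm assuming that every node of a parallel step is invoked
before the step's errors are evaluated (the nodes are visited one after
another here just to keep the sketch simple):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical model of the extended <fixedFlow>; none of these types are UIMA API.
final class FixedFlowSketch {

    /** One <node>: an engine key plus the proposed continueOnError flag. */
    static final class Node {
        final String key;
        final boolean continueOnError; // would default to false in the descriptor

        Node(String key, boolean continueOnError) {
            this.key = key;
            this.continueOnError = continueOnError;
        }
    }

    /** One flow element: a single node, or several nodes forming a <parallel> step. */
    static final class Step {
        final List<Node> nodes;

        private Step(List<Node> nodes) {
            this.nodes = nodes;
        }

        static Step of(Node... nodes) {
            return new Step(Arrays.asList(nodes));
        }
    }

    /**
     * Walks the flow in order. failingKeys stands in for engines that throw.
     * Returns the keys of all engines invoked before the flow ended.
     */
    static List<String> run(List<Step> flow, Set<String> failingKeys) {
        List<String> invoked = new ArrayList<>();
        for (Step step : flow) {
            boolean terminate = false;
            for (Node node : step.nodes) {
                invoked.add(node.key);
                // A failure ends the flow unless the node opted in to continuing.
                if (failingKeys.contains(node.key) && !node.continueOnError) {
                    terminate = true;
                }
            }
            if (terminate) {
                break; // default behaviour: stop after the failing step
            }
        }
        return invoked;
    }

    public static void main(String[] args) {
        List<Step> flow = Arrays.asList(
                Step.of(new Node("ae1", true)),
                Step.of(new Node("ae2", false), new Node("ae3", true)), // parallel step
                Step.of(new Node("ae4", false)));
        System.out.println(run(flow, Collections.singleton("ae1"))); // [ae1, ae2, ae3, ae4]
        System.out.println(run(flow, Collections.singleton("ae2"))); // [ae1, ae2, ae3]
    }
}
```

With this model, an error in ae1 or ae3 is tolerated, while an error in
ae2 stops the flow before ae4 is reached.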

-Adam
