uima-dev mailing list archives

From "Adam Lally" <ala...@alum.rpi.edu>
Subject Default Result Specifications too complicated?
Date Mon, 16 Apr 2007 19:58:52 GMT
I'm interested in getting others' opinions on this.  I was recently
helping some users who were having a problem where a 3rd-party
annotator they were using wasn't producing annotations that they
expected it to.  The annotator was embedded in a nested aggregate.  It
took me a couple of hours to figure out why it wasn't working (and if
they hadn't asked a UIMA developer, they might still be looking).

The reason was that this particular annotator made use of the
ResultSpecification (list of types/features that it should produce)
and was "optimizing" by producing only the annotations listed in the
Result Spec.  In this case, there was a downstream annotator that
incorrectly omitted the type in question from its input capabilities.
This makes the framework conclude that the type is not needed, so it
isn't included in the Result Spec (see below for more details on how
this works).

I think the main problem here is that most users have ignored the
Result Specification feature (we even encourage this by suggesting in
our docs that it's only for optimization), and they get very little
other feedback about whether they have set their input/output types
correctly.  So they are totally unprepared to debug something like
this.

One possible solution is to turn off this Result Spec stuff by
default, and provide a global switch (in the
PerformanceTuningProperties) to turn it back on.  That way most users
can _safely_ ignore Result Specs, and more savvy users who turn them
on to get the best performance would presumably be more equipped to
debug the problems that might result.

Also we could start giving more feedback about incorrect input/output
capabilities, although it's not totally clear what the best way to do
that is.  It would not be good for performance to enforce these
during actual processing.

Any thoughts?

P.S. Here are the specific rules for the Result Spec (this is
documented in the manual more or less in this form):

The default Result Spec is automatically computed from the
capabilities in the component descriptors, as follows:

1) The outermost aggregate's result spec is set to the list of its
declared output types.
2) The result spec for each delegate is set to the union of the
aggregate's result spec with the set of all input types of all other
delegates in the aggregate.  (This is so that we ask each annotator to
produce types that may be needed by a subsequent annotator.  This rule
is applied independent of the order of the flow, so as to be
completely general in the case of a custom flow controller.)
3) For a nested aggregate, apply rule #2 recursively.
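
To make the three rules concrete, here is a rough sketch of them in
Python.  It is only a toy model, not the UIMA API: the Component
class, the compute_result_specs function and the component and type
names (tokenizer, entityFinder, Token, Entity) are all made up for
illustration.  It reproduces the failure described above: when the
downstream annotator forgets to declare Token as an input, Token drops
out of the nested annotator's result spec.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Toy stand-in for a component descriptor (hypothetical, not UIMA)."""
    name: str
    inputs: frozenset = frozenset()    # declared input types
    outputs: frozenset = frozenset()   # declared output types
    delegates: list = field(default_factory=list)  # non-empty for aggregates


def compute_result_specs(component, result_spec=None, specs=None):
    """Return {component name: result spec} following rules 1 to 3."""
    if specs is None:
        specs = {}
    if result_spec is None:
        # Rule 1: the outermost aggregate gets its declared output types.
        result_spec = set(component.outputs)
    specs[component.name] = set(result_spec)
    for delegate in component.delegates:
        # Rule 2: union of the aggregate's result spec with the input
        # types of all *other* delegates, regardless of flow order.
        needed_by_others = set()
        for other in component.delegates:
            if other is not delegate:
                needed_by_others |= other.inputs
        # Rule 3: recurse, so nested aggregates apply rule 2 again.
        compute_result_specs(delegate, result_spec | needed_by_others, specs)
    return specs


def build_pipeline(downstream_inputs):
    """Outer aggregate = [nested aggregate [tokenizer], entityFinder]."""
    tokenizer = Component("tokenizer", outputs=frozenset({"Token"}))
    inner = Component("inner", outputs=frozenset({"Token"}),
                      delegates=[tokenizer])
    entity_finder = Component("entityFinder",
                              inputs=frozenset(downstream_inputs),
                              outputs=frozenset({"Entity"}))
    return Component("outer", outputs=frozenset({"Entity"}),
                     delegates=[inner, entity_finder])


# Downstream annotator declares its Token input correctly.
correct = compute_result_specs(build_pipeline({"Token"}))
# Downstream annotator omits Token from its input capabilities.
broken = compute_result_specs(build_pipeline(set()))

print("Token" in correct["tokenizer"])  # True
print("Token" in broken["tokenizer"])   # False
```

In the second case nothing in the descriptors says anyone needs Token,
so an annotator that honors its result spec would quietly stop
producing it.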

I think these rules make logical sense, and I can't think of any
easier rules to apply other than to forget the whole thing.
