uima-dev mailing list archives

From: Marshall Schor <...@schor.com>
Subject: Re: small memory footprint tradeoff configuration
Date: Wed, 11 Mar 2009 12:53:20 GMT

Thilo Goetz wrote:
> Marshall Schor wrote:
> [...]
>> I agree that backward compatibility is important and is an issue.  To
>> help the transition to this new scheme, I think an overall global switch
>> is needed (similar to the switches we have for JCas "interning") that
>> would by default make things work the way they do now.  A user
>> interested in small-footprint operation (and in trading off some
>> additional processing cycles to achieve it) would enable this switch.
>> To help it "work", we would allow operations which "set" a non-stored
>> feature to continue to run; these sets would just become no-ops. Then, if
>> the annotator wasn't paying attention to the ResultSpecification and tried
>> to set features that were not used, it would still work.
>> On the other hand, if an annotator actually made use of a particular
>> feature, but didn't specify it in its "input capability specification",
>> that would fail with this scheme.  The failure would be some kind of
>> Java exception, which would probably be noticed.  To recover, a user of
>> such a component would modify the input capability specification to
>> indicate that that feature was needed. 
> If a feature is defined in the type system, it should be there
> for the annotator writer to use.  Who are we to know how people
> will use those features?
>> As I write this, I notice that the input capability specification for a
>> primitive annotator doesn't quite fit the meaning here, because I think
>> it means that this annotator needs that feature upon input, and this
>> edge case (where the annotator itself produces this feature and then
>> also uses it) is not part of that definition. We could either expand
>> the meaning here to include this edge case, or (possibly a better
>> option) introduce, explicitly, another piece of metadata indicating that
>> a particular type/field was both created and used by this one primitive
>> annotator.  A third option could be to store these "unused" features if
>> set (in some out-of-line temporary storage) for the duration of the
>> running of a particular annotator, just in case these were "used" by the
>> same annotator, and then discard that extra storage after the annotator
>> exits.  This would be a big (but temporary) storage hit, though, so I
>> don't think I would want to do this.
> I vote we don't make things even more complicated than they
> already are, and educate those people who need a performance
> boost.
I agree in general about not making things more complicated, at least for
the user. I can imagine education working for:
  1) things like string interning
  2) things like deleting features from type systems where they're not
being used, and where the annotator producing them will respect this.

What this approach seems to miss are the following kinds of things:

1) cases where some set of annotators produce feature structures which,
after some point, are no longer needed and are "deleted", but
nevertheless continue to consume space.

2) cases where some set of annotators produce feature structures having
lots of fields, where, after some point, the fields are no longer needed.

If these are not significant use-cases in practice, then I'm happy to
think about / work on other things :-).
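For concreteness, the global-switch scheme described in the quoted proposal (sets of non-stored features become no-ops, reads of them fail with a Java exception) might look roughly like the Java sketch below. All names here (SmallFootprintFS, smallFootprintMode, and the String-keyed setFeatureValue/getFeatureValue) are hypothetical and are not the UIMA CAS API; the "needed features" set is an assumed stand-in for whatever would be derived from the ResultSpecification and input capability specifications.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch, NOT the UIMA API: a feature structure that, when a
 * global small-footprint switch is on, only stores features declared as needed.
 */
final class SmallFootprintFS {
    // Hypothetical global switch; default false keeps today's behavior (store everything).
    static boolean smallFootprintMode = false;

    // Assumed stand-in for features required by ResultSpecification / input capabilities.
    private final Set<String> neededFeatures;
    private final Map<String, Object> values = new HashMap<>();

    SmallFootprintFS(Set<String> neededFeatures) {
        this.neededFeatures = Set.copyOf(neededFeatures);
    }

    // A feature is stored unless small-footprint mode is on and nobody declared it.
    private boolean isStored(String feature) {
        return !smallFootprintMode || neededFeatures.contains(feature);
    }

    void setFeatureValue(String feature, Object value) {
        if (isStored(feature)) {
            values.put(feature, value);
        }
        // Otherwise: a non-stored feature, so the set is silently a no-op,
        // letting annotators that ignore the ResultSpecification keep working.
    }

    Object getFeatureValue(String feature) {
        if (!isStored(feature)) {
            // Reading an undeclared feature fails loudly, so the user notices
            // and adds the feature to the input capability specification.
            throw new IllegalStateException("Feature '" + feature
                + "' is not stored; declare it in the input capability specification");
        }
        return values.get(feature);
    }
}
```

With the switch left at its default of false, every feature is stored, matching current behavior; the exception on read is what would make a missing capability declaration visible, which is exactly the failure mode discussed above for an annotator that both produces and consumes a feature.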

> --Thilo
