uima-dev mailing list archives

From Hannes Korte <hannes.ko...@iais.fraunhofer.de>
Subject Possibly a bug in subiterator
Date Wed, 07 Apr 2010 11:57:30 GMT
Hi,

I noticed a strange behavior of the annotation index subiterator in
uimaj 2.2.2 and 2.3.0.

Consider the sentence: 'Testing the UIMA-Framework'
with tokens: 'Testing' 'the' 'UIMA-Framework'
and the named entity: 'UIMA'

The type priorities rank NamedEntity above Token.

If I call the Token subiterator for the NamedEntity 'UIMA' with
strict=false, I get an empty result. According to the docs, with
strict=false a Token counts as contained in the NamedEntity if:

  annot.getBegin() <= b.getBegin() <= annot.getEnd()

for NamedEntity annot and Token b. This is true for 'UIMA' and
'UIMA-Framework', but the subiterator is empty.

If I change the NamedEntity to ' UIMA' (including the preceding space),
then it works correctly, and the Token 'UIMA-Framework' is contained in
the subiterator.
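
To make the expectation concrete, here is a minimal sketch that applies the
documented strict=false condition to the character offsets of the example
sentence. It does not use UIMA at all: Span, SubiteratorConditionSketch and
expectedTokens are hypothetical names made up for this illustration, and the
check only models the condition quoted above, not the actual subiterator
implementation. The offsets assume single spaces between the tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an annotation: just begin/end offsets into the text.
class Span {
    final int begin;
    final int end;

    Span(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

class SubiteratorConditionSketch {
    static final String TEXT = "Testing the UIMA-Framework";

    // The strict=false condition as quoted from the docs:
    //   annot.getBegin() <= b.getBegin() <= annot.getEnd()
    static boolean coveredNonStrict(Span annot, Span b) {
        return annot.begin <= b.begin && b.begin <= annot.end;
    }

    // Tokens 'Testing', 'the', 'UIMA-Framework' (assumed offsets).
    static List<Span> tokens() {
        return Arrays.asList(new Span(0, 7), new Span(8, 11), new Span(12, 26));
    }

    // Covered text of every token that satisfies the condition for annot.
    static List<String> expectedTokens(Span annot, List<Span> tokens) {
        List<String> result = new ArrayList<>();
        for (Span t : tokens) {
            if (coveredNonStrict(annot, t)) {
                result.add(TEXT.substring(t.begin, t.end));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // NamedEntity 'UIMA' spans offsets 12..16
        System.out.println(expectedTokens(new Span(12, 16), tokens()));
        // NamedEntity ' UIMA' (with the preceding space) spans offsets 11..16
        System.out.println(expectedTokens(new Span(11, 16), tokens()));
    }
}
```

Under this reading of the condition, both variants of the NamedEntity should
yield the single Token 'UIMA-Framework', which is why the empty result for
'UIMA' looks wrong to me.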

I attached a simple Java class with all needed files to demonstrate the
problem. Any ideas?

Best regards,
Hannes



