uima-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Thilo Goetz <twgo...@gmx.de>
Subject Re: Generification of FSIndex
Date Tue, 11 Aug 2009 05:59:50 GMT
Adam Lally wrote:
> On Mon, Aug 10, 2009 at 5:32 PM, Marshall Schor<msa@schor.com> wrote:
>> Adam Lally wrote:
>>> On Mon, Aug 10, 2009 at 4:07 PM, Jörn Kottmann<kottmann@gmail.com> wrote:
>>>> Marshall Schor wrote:
>>>>> The generification of FSIndex currently specifies one type, <T extends
>>>>> FeatureStructure> that is the type of item being returned.
>>>>> The contains and find methods have arguments of type FeatureStructure.
>>>>> These could be changed to take type "T".
>>>> No I do not think that they could be changed to take type T.
>>>> Lets take the case of the contains method.
>>>> The javadoc says:
>>>> "Check if the index contains an element equal to the given feature structure
>>>> according to the
>>>> ordering of the index. Note that this is in general not the same as feature
>>>> structure identity."
>>>> and it for the param fs it says "The FS we're looking for.". There is no
>>>> place where
>>>> it says that contains can only be called for FSes which have the type of
>>>> index.
>>>> The change of the parameter from FeatureStructure to T would also change
>>>> the contract of the method a little, because then it would not be possible
>>>> anymore
>>>> to pass a FeatureStructure which has not type T.
>>> I agree.  It's sometimes useful to call FSIterator.moveTo method and
>>> pass an FS of a Type other than the one that the index was defined
>>> over, as part of implementing something like a subiterator.
>> I agree with the case where it's the "bag" index being used, because
>> that uses a test which works on all feature structures.
>> However, for Set and Sorted, the implication of passing a FS which is
>> not in the type hierarchy is, according to the JavaDocs, "undefined".
>> This is because the code assumes the layout of the features and their
>> values is appropriate for the type.  In other words, if the type of some
>> key was a string, it might take a value from the main int heap and use
>> it as an index into the string array - and if the int heap object was
>> not the right type, it could pull an arbitrary value from that slot, and
>> end up throwing an array index out of bounds exception.  When I looked,
>> it didn't appear to me that the code checked for any kind of type
>> subsumption before proceeding... (but I may have missed it...)
>> It could turn out that the data (whatever is being pulled) would just
>> happen to "match", even though the types are different.
>> Because this is an undefined operation that could throw various kinds of
>> runtime errors, or return an equal match where none really exists, I
>> think it should not be allowed, for set and sorted indexes.
> In the case of the AnnotationIndex, the object that you pass to
> FSIterator.moveTo must be a subtype of Annotation (else you would get
> all the weird effects that you describe).  But it is still valid for a
> user to do:
> FSIndex<AnnotType1> index = cas.getAnnotationIndex(annotType1);
> index.moveTo(annotType2);
> where annotType1 and annotType2 are both subtypes of uima.tcas.Annotation.
> In general, the object that you pass to moveTo() must be a subtype of
> the type that was in the index definition (in the user's descriptor,
> or for the case of the built-in AnnotationIndex,
> uima.tcas.Annotation).
>   -Adam

One concrete example of Adam's point: suppose you have a sentenceFS
and a tokenIterator.  Then tokenIterator.moveTo(sentenceFS) will
position the token iterator at the first word in the sentence (modulo
some subtleties that are beside the point here).  Very useful.


View raw message