uima-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Thilo Goetz <twgo...@gmx.de>
Subject Re: Generification of FSIndex
Date Tue, 11 Aug 2009 12:33:21 GMT


Marshall Schor wrote:
> 
> Thilo Goetz wrote:
>> Marshall Schor wrote:
>>   
>>> Thilo Goetz wrote:
>>>     
>>>> Adam Lally wrote:
>>>>   
>>>>       
>>>>> On Mon, Aug 10, 2009 at 5:32 PM, Marshall Schor<msa@schor.com>
wrote:
>>>>>     
>>>>>         
>>>>>> Adam Lally wrote:
>>>>>>       
>>>>>>           
>>>>>>> On Mon, Aug 10, 2009 at 4:07 PM, Jörn Kottmann<kottmann@gmail.com>
wrote:
>>>>>>>
>>>>>>>         
>>>>>>>             
>>>>>>>> Marshall Schor wrote:
>>>>>>>>
>>>>>>>>           
>>>>>>>>               
>>>>>>>>> The generification of FSIndex currently specifies one
type, <T extends
>>>>>>>>> FeatureStructure> that is the type of item being returned.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The contains and find methods have arguments of type
FeatureStructure.
>>>>>>>>> These could be changed to take type "T".
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>             
>>>>>>>>>                 
>>>>>>>> No I do not think that they could be changed to take type
T.
>>>>>>>> Lets take the case of the contains method.
>>>>>>>> The javadoc says:
>>>>>>>> "Check if the index contains an element equal to the given
feature structure
>>>>>>>> according to the
>>>>>>>> ordering of the index. Note that this is in general not the
same as feature
>>>>>>>> structure identity."
>>>>>>>> and it for the param fs it says "The FS we're looking for.".
There is no
>>>>>>>> place where
>>>>>>>> it says that contains can only be called for FSes which have
the type of the
>>>>>>>> index.
>>>>>>>>
>>>>>>>> The change of the parameter from FeatureStructure to T would
also change
>>>>>>>> the contract of the method a little, because then it would
not be possible
>>>>>>>> anymore
>>>>>>>> to pass a FeatureStructure which has not type T.
>>>>>>>>
>>>>>>>>
>>>>>>>>           
>>>>>>>>               
>>>>>>> I agree.  It's sometimes useful to call FSIterator.moveTo method
and
>>>>>>> pass an FS of a Type other than the one that the index was defined
>>>>>>> over, as part of implementing something like a subiterator.
>>>>>>>
>>>>>>>         
>>>>>>>             
>>>>>> I agree with the case where it's the "bag" index being used, because
>>>>>> that uses a test which works on all feature structures.
>>>>>>
>>>>>> However, for Set and Sorted, the implication of passing a FS which
is
>>>>>> not in the type hierarchy is, according to the JavaDocs, "undefined".
>>>>>> This is because the code assumes the layout of the features and their
>>>>>> values is appropriate for the type.  In other words, if the type
of some
>>>>>> key was a string, it might take a value from the main int heap and
use
>>>>>> it as an index into the string array - and if the int heap object
was
>>>>>> not the right type, it could pull an arbitrary value from that slot,
and
>>>>>> end up throwing an array index out of bounds exception.  When I looked,
>>>>>> it didn't appear to me that the code checked for any kind of type
>>>>>> subsumption before proceeding... (but I may have missed it...)
>>>>>>
>>>>>> It could turn out that the data (whatever is being pulled) would
just
>>>>>> happen to "match", even though the types are different.
>>>>>>
>>>>>> Because this is an undefined operation that could throw various kinds
of
>>>>>> runtime errors, or return an equal match where none really exists,
I
>>>>>> think it should not be allowed, for set and sorted indexes.
>>>>>>
>>>>>>       
>>>>>>           
>>>>> In the case of the AnnotationIndex, the object that you pass to
>>>>> FSIterator.moveTo must be a subtype of Annotation (else you would get
>>>>> all the weird effects that you describe).  But it is still valid for
a
>>>>> user to do:
>>>>> FSIndex<AnnotType1> index = cas.getAnnotationIndex(annotType1);
>>>>> index.moveTo(annotType2);
>>>>>
>>>>> where annotType1 and annotType2 are both subtypes of uima.tcas.Annotation.
>>>>>
>>>>> In general, the object that you pass to moveTo() must be a subtype of
>>>>> the type that was in the index definition (in the user's descriptor,
>>>>> or for the case of the built-in AnnotationIndex,
>>>>> uima.tcas.Annotation).
>>>>>
>>>>>   -Adam
>>>>>     
>>>>>         
>>>> One concrete example of Adam's point: suppose you have a sentenceFS
>>>> and a tokenIterator.  Then tokenIterator.moveTo(sentenceFS) will
>>>> position the token iterator at the first word in the sentence (modulo
>>>> some subtleties that are beside the point here).  Very useful.
>>>>   
>>>>       
>>> Yes.  This works because the "index" being iterated over is the general
>>> annotation index (the one that's built-it) - and the presumeption is
>>> that "token" and "sentenceFS" are both subtypes of AnnotationFS.
>>>
>>> Is there any reason *not* to add a check to the various methods in
>>> indexing that take one or more Feature Structures, to see if they are
>>> being passed a subtype of the type being indexed (except for bag indexes)?
>>>     
>> Sure.  Suppose B < A, and we have an index on B, but the ordering
>> relation is defined in terms of features that B inherits from
>>            A.
>> Then I can use an A to position iterators on the B index.  This
>> works now, and I don't see why we should prohibit it.
>>   
> 
> Right, I knew I was missing some detail.  So the test for validity is
> not a simple subsumes test, but rather a check to see if the feature
> structure slots used in the keys used for testing by the index are in
> the same position in the feature structure layout as the type being
> passed in.  Correct? 

Essentially, yes.

> 
> If so, is there any reason not to add this as a check?

No reason other than that the relevant information is
thrown away after the index has been generated, and it
would be a medium size effort to implement.  I haven't
looked at this in a while, but the distributed nature
of the index implementation (one index object for every
type subsumed by the index type) is not going to make
this any easier.  Or maybe they share that part, I don't
recall.

> 
> -Marshall
> 
> 
>>   
>>> -Marshall
>>>
>>>
>>>     
>>>> --Thilo
>>>>
>>>>
>>>>
>>>>   
>>>>       
>>
>>   

Mime
View raw message