uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: [jira] Closed: (UIMA-1352) java.lang.ClassCastException using find() with a SET index
Date Mon, 27 Jul 2009 18:19:58 GMT


Thilo Goetz wrote:
> Marshall Schor wrote:
>   
>> Thilo Goetz wrote:
>>     
>>> Marshall Schor wrote:
>>>   
>>>       
>>>> Thilo Goetz wrote:
>>>>     
>>>>         
>>>>> See the Jira issue for the cause of the problem.  More
>>>>> comments below.
>>>>>
>>>>> Marshall Schor wrote:
>>>>>   
>>>>>       
>>>>>           
>>>>>> So, there may be 2 things to look at here - the actual error, described
>>>>>> above, and the more philosophical question on the behavior of moveTo -
>>>>>> this seems to require a sorting order if the item "moved to" is not
>>>>>> present in the index.  Perhaps this needs to be documented better.  And
>>>>>>     
>>>>>>         
>>>>>>             
>>>>> I'm not sure I understand your point about moveTo().  It requires the
>>>>> index to be sorted to make any sense (and the BagIndex moveTo() is broken,
>>>>> but that's a different issue
>>>>>       
>>>>>           
>>>> Will you be fixing this too?
>>>>     
>>>>         
>>> We enter the realm of philosophy again.  What's the right
>>> behavior for moveTo() when the underlying index isn't sorted?
>>> In particular, what should happen when no proper element
>>> is found?  The javadocs say:
>>>
>>> Note that any operation like find() or FSIterator.moveTo() will not produce
>>> useful results on bag indexes, since bag indexes do not honor comparators. Only
>>> use a bag index if you want very fast adding and will have to iterate over the
>>> whole index anyway.
>>>   
>>>       
>> I like systems where user errors are reported :-).  If find() and
>> moveTo() don't work on bag indexes, I would prefer they throw an
>> exception, perhaps like UnsupportedOperationException or our equivalent
>> in UIMA.
>>     
>
> Fine with me.
>
>   
>>>   
>>>       
>>>>> ).  moveTo(fs) will position the iterator such
>>>>> that any element "to the left" is smaller than fs, and all elements at the
>>>>> moved-to position and "to the right" of it are greater than or equal to
>>>>> fs.  It doesn't matter if the item "moved to" is in the index or not.
>>>>> Remember that equality here is defined with respect to the sort order of
>>>>> the index, it is not feature structure identity.  
>>>>>       
>>>>>           
>>>> Yes, this is something that is unexpected (to me), and I did forget this.
>>>>     
>>>>         
>>>>> All this is documented,
>>>>> but maybe not as clearly as it could be.
>>>>>
>>>>>   
>>>>>       
>>>>>           
>>>>>> what if no sorting order was defined for the set index?
>>>>>>     
>>>>>>         
>>>>>>             
>>>>> Every set index has a sort order.  
>>>>>       
>>>>>           
>>>> This is the part that seems confusing, because our docs say that set
>>>> indexes do not enforce ordering, and the common definition for Sets does
>>>>     
>>>>         
>>> Where did you find that? The javadocs say that set indexes are
>>> not guaranteed to be sorted.  That's different from saying there's
>>> no ordering relation on the members.  How else would we determine
>>> equality?
>>>   
>>>       
>> Just by testing the key values for equality, not for order.
>>     
>
> Equality here is a notion derived from the partial order
> defined on the index.  You could define equality separately,
> but that would mean introducing a new notion into the index
> definitions.  I don't think we want that, or at least I don't.
>   
I agree we don't want to introduce a new notion of equality for index
definitions at this point.
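
To make "equality derived from the ordering" concrete, here is a minimal
sketch using only java.util, not the UIMA API. The Item class and the
newSetIndex helper are hypothetical stand-ins for a feature structure and a
set index. The TreeSet treats two elements as duplicates whenever the
comparator returns 0, which is the analogue of key-based equality in a set
index; a comparator with no keys makes every element equal, so the set holds
at most one element.

```java
import java.util.Comparator;
import java.util.TreeSet;

public class OrderDerivedEquality {
    // Hypothetical stand-in for a feature structure; 'begin' plays the role of the index key.
    static final class Item {
        final int begin;
        final String label;
        Item(int begin, String label) { this.begin = begin; this.label = label; }
    }

    // Builds a set whose membership test is the comparator, not equals() or object identity.
    static TreeSet<Item> newSetIndex(Comparator<Item> keys) {
        return new TreeSet<>(keys);
    }

    public static void main(String[] args) {
        // One key: two distinct objects with the same 'begin' count as the same element.
        TreeSet<Item> oneKey = newSetIndex(Comparator.comparingInt((Item i) -> i.begin));
        System.out.println(oneKey.add(new Item(5, "a"))); // true
        System.out.println(oneKey.add(new Item(5, "b"))); // false: compares equal to "a"
        System.out.println(oneKey.size());                // 1

        // Zero keys: every pair compares equal, so at most one element is ever stored.
        TreeSet<Item> zeroKeys = newSetIndex((a, b) -> 0);
        zeroKeys.add(new Item(1, "x"));
        zeroKeys.add(new Item(2, "y"));
        System.out.println(zeroKeys.size());              // 1
    }
}
```

This only illustrates the notion of equality; it says nothing about how the
UIMA set index is actually implemented.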
>   
>>> Maybe we should remove this text, because at this time, set indexes
>>> are sorted, and that's not likely to change (I was thinking of hash
>>> based sets when I wrote that; still, you'll need a notion of equality,
>>> no matter how you implement your sets, yet they don't need to be
>>> sorted).
>>>
>>>   
>>>       
>>>> not have an ordering concept.  Yet our docs say that the sort order for
>>>> sets is used to determine "equality" among candidates in the set:  from
>>>> section 2.4.1.7:
>>>>
>>>> An index may define one or more /keys/. These keys determine the sort
>>>> order of the feature structures within a sorted index, and determine
>>>> equality for set indexes.
>>>>     
>>>>         
>>> That is incorrect.  It should say "0 or more keys".  Though whether we
>>> should alert users to this fact, when even UIMA developers have trouble
>>> with it, is doubtful.
>>>
>>>   
>>>       
>> I think some of our users could be better at remembering these details
>> than I am :-)  I think this should be fixed - it's just a typo IMHO.
>>     
>>>> Perhaps this should also say something about the use of the sort order
>>>> in "moveTo(fs)" for sets?
>>>>     
>>>>         
>>> In our current implementation, set indexes are sorted indexes
>>> without the duplicates (duplicates with respect to the ordering
>>> relation of that index, of course).  If we commit to this and
>>> stop waffling about how set indexes may not be sorted, then we
>>> can just say that sorted and set indexes behave the same way.
>>>   
>>>       
>> My preference is to keep the original definitions - leaving (perhaps
>> unrealistically small) room for alternative implementations in the future.
>>     
>
> Sure, but how do you propose we improve the documentation, then?
>
>   
I'll take a crack at doing this.

-Marshall
>>>   
>>>       
>>>>> If that sort order is empty, it means
>>>>> that all FSs are equal for that index.  That in turn means that this
>>>>> index will contain at most 1 FS at any time.  It also means that moveTo()
>>>>> will always position the iterator at that one element, if it exists.
>>>>>
>>>>> Did that help at all?
>>>>>   
>>>>>       
>>>>>           
>>>> Yes, thanks for the clarifications.
>>>>
>>>> -Marshall
>>>>     
>>>>         
>>>>> --Thilo
>>>>>
>>>>>
>>>>>
>>>>>   
>>>>>       
>>>>>           
>>>   
>>>       
>
>
>   
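
The moveTo(fs) positioning rule quoted above (everything to the left is
smaller than fs, everything at and to the right is greater than or equal to
fs) can be sketched as a lower-bound search over a sorted list. This is an
illustration in plain Java, not the UIMA implementation: the static moveTo
method and the class name are hypothetical, and a return value equal to the
list size stands in for "no such position exists".

```java
import java.util.Comparator;
import java.util.List;

public class MoveToSketch {
    // Returns the first position whose element is not smaller than probe
    // under the given order, or sorted.size() if every element is smaller.
    static <T> int moveTo(List<T> sorted, T probe, Comparator<? super T> order) {
        int lo = 0;
        int hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (order.compare(sorted.get(mid), probe) < 0) {
                lo = mid + 1;   // element at mid is smaller: the answer lies to the right
            } else {
                hi = mid;       // element at mid is >= probe: mid is still a candidate
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        List<Integer> index = List.of(10, 20, 20, 40);
        Comparator<Integer> order = Comparator.naturalOrder();
        System.out.println(moveTo(index, 20, order)); // 1: first element equal to the probe
        System.out.println(moveTo(index, 30, order)); // 3: probe absent, lands on 40
        System.out.println(moveTo(index, 50, order)); // 4: past the end, no valid position

        // Empty sort order: every comparison yields 0, so the single stored
        // element is always the one moved to.
        Comparator<Integer> noKeys = (a, b) -> 0;
        System.out.println(moveTo(List.of(7), 99, noKeys)); // 0
    }
}
```

Note that the probe never has to be in the list, and "equal" means "compares
as 0", not object identity, which matches the description in the thread.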
