uima-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Marshall Schor <...@schor.com>
Subject Re: Generics in 2.8.0 getAllIndexedFS
Date Wed, 08 Jul 2015 15:11:59 GMT
I think that similar arguments apply to other getters, such as

These are currently defined to be generic methods so they do type inferencing.


On 7/8/2015 9:24 AM, Marshall Schor wrote:
> More about the signatures and type inference.
> We have the following cases:
>   (maybe) not JCas, using CAS APIs: 
>       (maybe because a JCas user might get a CAS - not a JCas - in some routine)
>     (no arguments in getAllIndexedFS)
>       FSIterator<...> getAllIndexedFS();
>     (type argument in getAllIndexedFS) 
>       FSIterator<...> getAllIndexedFS(aType);
>   using JCas APIs:
>     (no arguments in getAllIndexedFS)
>       FSIterator<...> getAllIndexedFS();
>     (type argument in getAllIndexedFS) 
>       FSIterator<...> getAllIndexedFS(aType);
>       FSIterator<...> getAllIndexedFS(Class<Foo> clazz)
> For the getAllIndexedFS() (no argument) kinds of calls, I think there's agreement to
use the generic FeatureStructure for the CAS APIs, and TOP for the JCas APIs.
> When the getAllIndexedFS is given type arguments, the method returns an iterator over
that type and its subtypes.  Here it seems best to use the JCas type corresponding to the
type argument.  This is easy to do in the last case, above.  It can be "allowed" if the other
calls use generic method forms and pick up the type from the surrounding context.
> The "pro" for doing this is that it makes UIMA more coder-friendly, by not requiring
the coder to "cast" the result.
> The "con" for doing this is that it allows the coder to make a mistake (specifying the
wrong type).  This would only be caught at run time.
> If type inferencing from the surrounding context wasn't done, and the user needed to
cast the result, the user would be exposed to the same runtime error.  So, unless there's
some other pros/cons, it seems to me it would be best to allow generic type inferencing in
cases where there's a type specified (by any means) in the getAllIndexedFS method call.
> Did I miss something?
> -Marshall
> On 7/7/2015 4:19 PM, Richard Eckart de Castilho wrote:
>> On 07.07.2015, at 21:37, Marshall Schor <msa@schor.com> wrote:
>>> Burn reports a source incompatibility with the 2.8.0 signature for the
>>> getAllIndexedFS.  This is a somewhat complex issue.  Here's what I've found so
>>> 1) We have 3 sets of versions of getAllIndexedFS - one in the plain CAS index
>>> repo, one in the JCas version of the index repo, and one directly in the JCas
>>> itself.  The signatures are not consistent.
>>> 2) https://issues.apache.org/jira/browse/UIMA-4299 strongly suggests not
>>> returning values of <? extends XXX>.  We opted in the plain CAS version
to use
>>>   <T extends FeatureStructure> FSIterator<T> getAllIndexedFS(...)
>>> and in the JCas index version:
>>>                                FSIterator<FeatureStructure> getAllIndexedFS(...),
>>> and in the JCas itself:
>>>   <T extends TOP>              FSIterator<T> getAllIndexedFS(Class<T>
>> IMHO the signatures should be as follows:
>> FSIndexRepository:  FSIterator<FeatureStructure> getAllIndexedFS(...);
>> JFSIndexRepository: FSIterator<TOP> getAllIndexedFS(...)
>> JCas:               <T extends TOP> FSIterator<T> getAllIndexedFS(Class<T>
>> The first two signatures do not allow type inference. When I suggested to use
>> a signature that allowed type inference, Marshall pointed out it could lead to
>> undetected type errors. I find this is a valid objection and would therefore
>> prefer the versions that do not allow type inference.
>> The third signature allows for an inferred return type, but it is bound to the
>> clazz argument which allows a sound type checking. A method with this signature
>> could probably also be defined in the JFSIndexRepository.
>>> 3) At first I thought that it was quite possible that the getAllIndexedFS call
>>> will produce an iterator which returns a mixture of subtypes of the specified
>>> class, or instances of FeatureStructure, which are not a subtype of the
>>> specified class, and would produce a ClassCastException when assigned to a
>>> variable of the iterator type.  I thought that FeatureStructure will be produced
>>> if there is no JCas cover class, but I think this is incorrect.
>>> There are generators for each type that generate the corresponding Java class.

>>> As part of the JCas creation, the method instantiateJCas_Types is called and
>>> will replace the generators with JCas versions.  If no JCas cover class can be
>>> found for a particular type, the generator is still replaced with one for the
>>> first JCas cover type found in the supertype chain.  Since TOP is a built-in
>>> defined JCas type, even if no other types have JCas cover types, TOP will.
>> I have never seen the JCas produce a mix. TOP should be a safe assumption here.
>> The only case I think where I have seen a JCas return FeatureStructureImpl was
>> when it was not properly initialized - may happen after some forms of deserialization.
>> Calling jcas.getCas().getJCas() typically fixes this.
>>> 4) Current user code often has some JCas cover class as the top-most class of
>>> some set of Java cover objects being returned.  These are currently failing for
>>> some forms of getAllIndexedFS calls - the ones returning
>>> FSIterator<FeatureStructure>  These should be changed (back) to returning
>>> <T Extends TOP> or just TOP.
>> I'd vote for just TOP.
>>> 5) I'd like to use the <T extends TOP> form, but would like other Java
>>> to weigh in to see if that would cause issues...  To be precise, I'd like the
>>> signature to look like:
>>>   <T extends TOP>  FSIterator<T> getAllIndexedFS(...) for those calls
>>> don't specify the Java class in the argument.
>>> The pros for this is that the coder can specify a subtype of TOP if they know
>>> that will be returned.  The con for this is that the coder could make a mistake,
>>> and a runtime class-cast exception could happen, if the item returned was not
>>> subtype.
>> I'd still vote for just TOP in cases where no Java class is in the argument - 
>> and for introducing signatures that have a Java class in the argument ;)
>> Btw. uimaFIT uses TOP in cases where no Java class is specified and <T extends
>> if one is specified (and I hope it does it consistently).
>> When using CAS instead of JCas, uimaFIT typically offers a "plain" method, e.g.
>> CasUtil.selectAll() which returns Collection<AnnotationFS> and a "FS" method
>> CasUtil.selectAllFS() which returns Collection<FeatureStructure>. The assumption
>> here is that annotations are more common.
>> Cheers,
>> -- Richard

View raw message