uima-dev mailing list archives

From Peter Klügl <peter.klu...@averbis.com>
Subject Re: opinions, please, for a UV3 proposed change
Date Thu, 14 Sep 2017 15:15:04 GMT
+0

I do not have a strong opinion here. As far as I know, we do not use
(many) empty StringArrays (or others); we use null values for those
features instead.

Personally, I would prefer sharing, as I see no real need for 0-length
arrays as markers.
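A minimal sketch of the trade-off under discussion, in plain Java rather than against the real UIMA API. `HypotheticalCas`, `StringArrayStub`, `createStringArray` and `newMarker` are hypothetical names, not UIMA classes or methods. The factory hands out one shared 0-length instance per CAS, as proposed in the quoted Jira discussion, while non-empty arrays and explicit markers stay distinct. The first check also shows the edge case raised in the thread: code that relied on two empty arrays being different objects would now see them as ==.

```java
/**
 * Hypothetical sketch (not the UIMA API): a per-CAS factory that shares
 * one 0-length array instance, while markers stay unique.
 */
public class SharedEmptyArrayDemo {

    /** Hypothetical stand-in for a UIMA StringArray; only the length matters here. */
    static final class StringArrayStub {
        private final String[] values;

        StringArrayStub(int length) {
            this.values = new String[length];
        }

        int size() {
            return values.length;
        }
    }

    /** Hypothetical stand-in for a CAS: holds at most one shared empty array. */
    static final class HypotheticalCas {
        private StringArrayStub sharedEmpty; // created lazily on first request

        StringArrayStub createStringArray(int length) {
            if (length == 0) {
                // Proposed behavior: create the 0-length instance only once per CAS,
                // then keep returning that same object.
                if (sharedEmpty == null) {
                    sharedEmpty = new StringArrayStub(0);
                }
                return sharedEmpty;
            }
            // Non-empty arrays are mutable, so each call still gets a fresh object.
            return new StringArrayStub(length);
        }

        /** Unique marker object, playing the role of "new TOP(aCas)". */
        Object newMarker() {
            return new Object();
        }
    }

    public static void main(String[] args) {
        HypotheticalCas cas = new HypotheticalCas();
        StringArrayStub first = cas.createStringArray(0);
        StringArrayStub second = cas.createStringArray(0);

        System.out.println(first == second);                                      // true: shared
        System.out.println(cas.createStringArray(3) == cas.createStringArray(3)); // false
        System.out.println(cas.newMarker() == cas.newMarker());                   // false: unique

        // Sharing is per CAS, so another CAS gets its own empty instance.
        HypotheticalCas otherCas = new HypotheticalCas();
        System.out.println(first == otherCas.createStringArray(0));               // false
    }
}
```

Keeping a separate, explicit way to get a unique marker is what lets the empty-array factory share freely: identity is only promised where the caller asks for it.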


Peter

On 14.09.2017 at 16:36, Marshall Schor wrote:
> I was mistaken about Java in one detail: for things like Integer(17), there are
> two ways to create it: new Integer(17) or Integer.valueOf(17). The first call
> creates a fresh object that is not == to any other Integer object, while the
> second call reuses an existing Integer object for 17 (if one exists). The Javadocs
> encourage users to switch to Integer.valueOf(xxx) for efficiency.
>
> I'm now slightly leaning against making this change in UIMA, because of edge
> cases where users could have depended on 0-length arrays and lists being
> distinct objects (not ==).
>
> Users could achieve the same result "manually" by using the shared instances
> and (for XMI serialization) marking any features that contain these values as
> "multi-references-allowed", so that deserialization would share them. This could
> become a suggested "best practice" for those who use 0-length arrays and empty
> lists.
>
> Not doing this would mean closing two Jiras as "won't fix":
> https://issues.apache.org/jira/browse/UIMA-5564
> https://issues.apache.org/jira/browse/UIMA-5566
>
> What do others think?
>
> -Marshall
>
> On 9/13/2017 8:22 AM, Marshall Schor wrote:
>> I posted a Jira for a proposed change in how 0-length UIMA arrays and lists are
>> managed.  These are immutable objects, and (theoretically) one instance (per
>> CAS) could be shared.
>>
>> In the current implementation, the user manages this explicitly: they can
>> use a set of new APIs to get shared instances.
>>
>> I'm thinking a better way is to make this happen automatically, and remove the
>> new set of APIs (a smaller API set is always a good thing, for essentially the
>> same functionality, IMHO). The implementation would change so that calls that
>> create a "new" 0-length array or list would create one only if none already
>> existed; if one did exist, they would return it.
>>
>> This follows Java's general direction for immutable objects, like Strings and
>> Integer values, which can be shared.
>>
>> For cases where people want or need a CAS value "marker" that is tiny but
>> unique (like what you get with Java's new Object()), we would keep
>> "new TOP(aCas)" as a way to generate unique instances. What do others think?
>>
>> I've seen large-scale implementations of UIMA pipelines with lots of defaulted
>> 0-length arrays in them; for these, the change could improve space and time
>> performance by a reasonable amount.
>>
>> -Marshall
>>
>>
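The Java behavior described at the start of the quoted thread can be checked directly. This small, self-contained example (class and method names are illustrative only) compares freshly constructed `Integer` objects with ones obtained from `Integer.valueOf`. `Integer.valueOf` is documented to always cache values from -128 to 127, so two calls for 17 return the same object, while the `Integer(int)` constructor, deprecated since Java 9, always allocates a new one. Code that relies on `==` between such objects is the kind of identity-dependent code the thread worries about for 0-length arrays.

```java
/**
 * Illustrates the two ways of obtaining an Integer described in the thread.
 */
public class IntegerIdentityDemo {

    // Integer(int) has been deprecated since Java 9, hence the suppressed warning.
    @SuppressWarnings({"deprecation", "removal"})
    static boolean constructorGivesSameObject() {
        Integer x = new Integer(17); // always allocates a new object
        Integer y = new Integer(17);
        return x == y;               // == compares references, not values
    }

    static boolean valueOfGivesSameObject() {
        Integer x = Integer.valueOf(17); // values from -128 to 127 are always cached
        Integer y = Integer.valueOf(17);
        return x == y;
    }

    @SuppressWarnings({"deprecation", "removal"})
    static boolean constructedEqualsCached() {
        // equals() compares values, so identity does not matter here.
        return new Integer(17).equals(Integer.valueOf(17));
    }

    public static void main(String[] args) {
        System.out.println(constructorGivesSameObject()); // false
        System.out.println(valueOfGivesSameObject());     // true
        System.out.println(constructedEqualsCached());    // true
    }
}
```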

