uima-dev mailing list archives

From: Marshall Schor <...@schor.com>
Subject: Re: opinions, please, for a UV3 proposed change
Date: Thu, 14 Sep 2017 14:36:41 GMT
I was mistaken about Java in one detail: for things like Integer(17), there are
two ways to create it: new Integer(17) or Integer.valueOf(17). The first call
always creates a fresh object that is not == to any other Integer object, while
the second returns a cached Integer object for 17 (valueOf always caches values
from -128 to 127). The Javadocs encourage users to switch to
Integer.valueOf(xxx) for efficiency.
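A minimal, self-contained Java sketch of that identity difference follows; the class and method names are illustrative only. Note that the Integer(int) constructor has been deprecated since Java 9 (and marked for removal in later releases), but it still compiles.

```java
// Compares object identity (==) for the two ways of boxing the value 17.
public class IntegerIdentityDemo {

    // The constructor always allocates a new object, so two calls are never ==.
    @SuppressWarnings({"deprecation", "removal"})
    public static boolean constructorInstancesSame() {
        Integer a = new Integer(17);
        Integer b = new Integer(17);
        return a == b; // always false: two distinct objects
    }

    // valueOf is guaranteed to cache -128..127, so both calls return one object.
    public static boolean valueOfInstancesSame() {
        Integer a = Integer.valueOf(17);
        Integer b = Integer.valueOf(17);
        return a == b; // true for 17
    }

    public static void main(String[] args) {
        System.out.println("new Integer(17) == new Integer(17): "
                + constructorInstancesSame());
        System.out.println("Integer.valueOf(17) == Integer.valueOf(17): "
                + valueOfInstancesSame());
    }
}
```

In both cases a.equals(b) is true; only object identity differs.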

I'm now leaning slightly against making this change in UIMA, because of edge
cases where users may have depended on 0-length arrays and lists being distinct
(not ==) objects.
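To make that edge case concrete, here is a toy Java model, assuming hypothetical names (ToyIntArray, newIntArray) that are not UIMA classes or APIs. User code attaches labels to 0-length arrays by identity; once the factory shares empty instances, two "different" empty arrays collapse into one key.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Toy model (NOT the UIMA API): code relying on two empty arrays being
// distinct objects behaves differently once empty arrays are shared.
public class EmptyArrayIdentityEdgeCase {

    // Hypothetical stand-in for a 0-length CAS array.
    static final class ToyIntArray {
        final int[] data;
        ToyIntArray(int length) { data = new int[length]; }
    }

    // Stands in for the single shared empty instance a CAS would hand out.
    private static final ToyIntArray SHARED_EMPTY = new ToyIntArray(0);

    // Hypothetical factory; shareEmpty toggles the proposed behavior.
    static ToyIntArray newIntArray(int length, boolean shareEmpty) {
        if (length == 0 && shareEmpty) {
            return SHARED_EMPTY;
        }
        return new ToyIntArray(length);
    }

    // User code that labels each array by object identity.
    public static int distinctLabels(boolean shareEmpty) {
        Map<ToyIntArray, String> labels = new IdentityHashMap<>();
        labels.put(newIntArray(0, shareEmpty), "first");
        labels.put(newIntArray(0, shareEmpty), "second");
        return labels.size(); // 2 if empties are distinct, 1 if shared
    }

    public static void main(String[] args) {
        System.out.println("distinct empties: " + distinctLabels(false));
        System.out.println("shared empties:   " + distinctLabels(true));
    }
}
```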

Users could achieve the same result "manually" by using the shared instances
and, for XMI serialization, marking any features that hold these values as
"multi-references-allowed" so that deserialization shares them as well. This
could become a suggested "best practice" for those who use 0-length arrays and
empty lists.
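The "manual" approach can be sketched with a toy Java model, assuming a hypothetical ToyCas class (not the UIMA CAS) whose ordinary factory always allocates, plus opt-in accessors that return one lazily created empty instance per CAS. The XMI "multi-references-allowed" marking is not modeled here.

```java
// Toy model (NOT the UIMA API): user code explicitly asks for a shared
// empty instance; the ordinary factory keeps allocating fresh objects.
public class ManualEmptySharing {

    // Hypothetical stand-in for a CAS; each instance owns its shared empties.
    public static final class ToyCas {
        private int[] emptyInts;       // created lazily, then reused
        private String[] emptyStrings; // created lazily, then reused

        // Ordinary factory: always allocates a fresh array, even for length 0.
        public int[] newIntArray(int length) {
            return new int[length];
        }

        // Hypothetical opt-in accessors for the shared 0-length instances.
        public int[] emptyIntArray() {
            if (emptyInts == null) {
                emptyInts = new int[0];
            }
            return emptyInts;
        }

        public String[] emptyStringArray() {
            if (emptyStrings == null) {
                emptyStrings = new String[0];
            }
            return emptyStrings;
        }
    }

    public static void main(String[] args) {
        ToyCas cas = new ToyCas();
        // "Best practice": use the shared instance for defaulted empty values.
        System.out.println(cas.emptyIntArray() == cas.emptyIntArray());
        // The ordinary factory still yields distinct objects.
        System.out.println(cas.newIntArray(0) == cas.newIntArray(0));
        // Sharing is per CAS, not global.
        System.out.println(cas.emptyIntArray() == new ToyCas().emptyIntArray());
    }
}
```

Because sharing is opt-in here, code that relies on distinct empty objects keeps working unless it switches to the shared accessors.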

Not making this change would mean closing two Jiras as "won't fix":
https://issues.apache.org/jira/browse/UIMA-5564
https://issues.apache.org/jira/browse/UIMA-5566

What do others think?

-Marshall

On 9/13/2017 8:22 AM, Marshall Schor wrote:
> I posted a Jira for a proposed change in how 0-length UIMA arrays and lists are
> managed.  These are immutable objects, and (theoretically) one instance (per
> CAS) could be shared.
>
> In the current implementation, this is managed explicitly by the user - they can
> use a bunch of new APIs to get shared instances.
>
> I'm thinking a better way is to make this happen automatically, and remove the
> new bunch of APIs (a smaller API set is always a good thing, for essentially the
> same functionality, IMHO).  The implementation would change so that calls that
> create "new" 0-length arrays/lists would create one only if none already
> existed; if one already did, they would return that existing instance.
>
> This follows Java's general direction for immutable objects, like Strings and
> Integer values, which can be shared.
>
> For cases where people want or need a CAS value "marker" that is tiny but
> unique (like what you get with Java's new Object()), we would keep
> "new TOP(aCas)" as something that generates unique instances.  What do others
> think?
>
> I've seen large-scale UIMA pipeline implementations with lots of defaulted
> 0-length arrays in them; for these, this change could noticeably improve space
> and time performance.
>
> -Marshall
>
>
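The proposal in the quoted message (0-length arrays shared automatically per CAS, with "new TOP(aCas)" still producing unique markers) can be sketched with a toy Java model. ToyCas and ToyTop are hypothetical stand-ins, not UIMA classes, and plain int[] arrays stand in for CAS arrays.

```java
// Toy model (NOT the UIMA API) of the proposal: the factory hands out one
// shared 0-length instance per CAS, while a marker constructor still yields
// unique objects (analogous to new TOP(aCas) or new Object()).
public class AutoEmptySharing {

    public static final class ToyCas {
        private int[] sharedEmptyInts; // created on the first 0-length request

        // Proposed behavior: length 0 returns the existing instance, if any.
        public int[] newIntArray(int length) {
            if (length != 0) {
                return new int[length];
            }
            if (sharedEmptyInts == null) {
                sharedEmptyInts = new int[0];
            }
            return sharedEmptyInts;
        }
    }

    // Hypothetical stand-in for TOP: every construction is a distinct object.
    public static final class ToyTop {
        private final ToyCas cas;

        public ToyTop(ToyCas cas) { this.cas = cas; }

        public ToyCas getCas() { return cas; }
    }

    public static void main(String[] args) {
        ToyCas cas = new ToyCas();
        System.out.println(cas.newIntArray(0) == cas.newIntArray(0)); // shared
        System.out.println(cas.newIntArray(3) == cas.newIntArray(3)); // fresh
        System.out.println(new ToyTop(cas) == new ToyTop(cas));       // unique
    }
}
```

In this model, a CAS with N features defaulted to empty arrays holds one empty array instead of N; the cost is exactly the identity edge case raised in the reply.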

