uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: opinion on degree of backwards compatibility for Uima V3 experiment
Date Thu, 08 Sep 2016 13:27:41 GMT
It seems that some (but not all) users

* really like int "id"s that are stable and don't change due to loading/saving
* make use of them to get "direct access" to FSs
* want UIMA framework support for this

I state this based on multiple discussions of this topic, over time, on various
lists.

Up to now, these users have been using internal data in V2 (the "address" in the
low level representation), which is stable for some load/save operations but not
others.

Supporting this costs two things:
* space - in each FS, for the int "id" and
* space/time to hold and update a map from "id" to the FS for direct access. 
This map would likely have "weak references" (an additional Java Object overhead
per FS) to permit GC to work. (The use of weak refs could be an option, as well).
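
To make the cost concrete, here is a minimal sketch of such an "id" -> FS map
built on weak references. It is not UIMA code: the class WeakIdMap, its methods,
and the id numbering are all hypothetical, and a plain Object stands in for an
FS. Each entry costs one extra WeakReference object, and stale entries are
cleaned out lazily once GC has cleared them.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of an id -> FS lookup that does not prevent GC. */
final class WeakIdMap<T> {

    /** Weak reference that remembers its id, so its map entry can be dropped later. */
    private static final class IdRef<V> extends WeakReference<V> {
        final int id;

        IdRef(int id, V referent, ReferenceQueue<V> queue) {
            super(referent, queue);
            this.id = id;
        }
    }

    private final Map<Integer, IdRef<T>> map = new HashMap<>();
    private final ReferenceQueue<T> queue = new ReferenceQueue<>();
    private int nextId = 1; // assumption: ids are simply handed out sequentially

    /** Assigns a new id to the object and records it weakly. */
    synchronized int register(T fs) {
        expungeStale();
        int id = nextId++;
        map.put(id, new IdRef<>(id, fs, queue));
        return id;
    }

    /** Returns the object for this id, or null if unknown or already collected. */
    synchronized T get(int id) {
        expungeStale();
        IdRef<T> ref = map.get(id);
        return ref == null ? null : ref.get();
    }

    /** Number of entries currently held (after dropping cleared ones). */
    synchronized int size() {
        expungeStale();
        return map.size();
    }

    /** Removes entries whose referents the GC has already cleared. */
    private void expungeStale() {
        Reference<? extends T> r;
        while ((r = queue.poll()) != null) {
            IdRef<?> ref = (IdRef<?>) r;
            map.remove(ref.id, ref); // only remove if the id still maps to this ref
        }
    }
}
```

Swapping IdRef for a strong reference would give the non-weak option: cheaper
per entry, but every registered FS then stays reachable until it is removed
explicitly.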

We could support such a thing in V3 based on some pipeline setting (e.g. using
additionalParameters options); this would free the internal ids etc. to be used
purely internally.

Is this a reasonable description of this "use case"? Does it seem reasonable for
V3 to support such a thing?


On 9/2/2016 1:56 PM, Richard Eckart de Castilho wrote:
> See comment at end of mail.
> On 02.09.2016, at 15:18, Marshall Schor <msa@schor.com> wrote:
>> To go from an ID to an FS is not generally possible, because normally, the
>> framework doesn't keep this association.  There are exceptions though, the main
>> ones being:
>> a) If you use low level CAS APIs to create FSs, the API returns the ID, which
>> means that a GC that happens right after the API returns would garbage collect
>> the FS because at that point, nothing is "holding on" to any reference (it's not
>> in any index).  To prevent this, the low level create FS methods add the FS to a
>> map which goes from ID -> FS, and thus "holds onto" the FS, preventing Garbage
>> collection.
>> b) Another case where this happens is when PEARs are used; in this case the FSs
>> involved with PEAR "trampoline" FSs end up being in similar maps.
>> Both of these approaches of course disable a feature of V3 - namely, that
>> unreferenced FSs can be garbage collected.
>> ...
>> There is an API in the V3 CASImpl, getFsFromId(int)  and also
>> getFsFromId_checked(int), which retrieves the associated FS, given the ID, or
>> returns null (or throws an exception) if it isn't in the table.  Most FSs
>> created normally, won't be in the table.
> Can we do this? -> As soon as an FS has been added to an index or is being
> referenced from another FS, its ID should be resolvable to the respective FS.
> When an FS is in an index or is referred to by another FS, it cannot be garbage
> collected anyway. The CAS could maintain a lookup using weak references to
> provide a central place to look up such FSes via their IDs without preventing
> garbage collection.
> WebAnno remembers the ID of every FS rendered on screen. When the user performs
> an action, we load the CAS from disk and then look up the ID to retrieve the FS.
> We do not keep the CAS in memory all the time. If we had to scan the whole CAS
> for the FS with a given ID, it would probably have a serious performance impact.
> Cheers,
> -- Richard
