uima-dev mailing list archives

From Jens Grivolla <j+...@grivolla.net>
Subject Re: opinion on degree of backwards compatibility for Uima V3 experiment
Date Thu, 08 Sep 2016 20:29:38 GMT
Hi, at some point we wished for stable FS IDs so that we could reference
annotations, especially when working outside of UIMA (and possibly outside
Java). For that we would have liked something more geared towards users
than the IDs that appear e.g. in XMIs, with clear documentation of what
those IDs represent and how to deal with them, e.g. when generating or
modifying XMI outside of UIMA.

On the other hand, backwards compatibility with the V2 addresses is not a
concern for us at all.

Best,
Jens

On Thu, Sep 8, 2016 at 3:27 PM, Marshall Schor <msa@schor.com> wrote:

> It seems that some (but not all) users really like and make use of int
> "id"s. They
>
> * rely on the "id"s being stable, i.e. not changing due to loading/saving
> * use these "id"s to get "direct access" to FSs
> * want UIMA framework support for this
>
> I state this based on a history of multiple discussions about this topic
> on various lists.
>
> Up to now, these users have been using internal data in V2 (the "address"
> in the low level representation), which is stable for some load/save
> operations but not others.
>
> Supporting this costs two things:
> * space - in each FS, for the int "id" and
> * space/time to hold and update a map from "id" to the FS for direct
>   access. This map would likely have "weak references" (an additional Java
>   Object overhead per FS) to permit GC to work. (The use of weak refs could
>   be an option, as well).
>
> We could support such a thing in V3 based on some pipeline setting (e.g.
> using additionalParameters options); this would free the internal ids
> etc. to be used purely internally.
>
> Is this a reasonable description of this "use case"? Does it seem
> reasonable for V3 to support such a thing?
>
> -Marshall
>
>
> On 9/2/2016 1:56 PM, Richard Eckart de Castilho wrote:
> > See comment at end of mail.
> >
> > On 02.09.2016, at 15:18, Marshall Schor <msa@schor.com> wrote:
> >> To go from an ID to an FS is not generally possible, because normally,
> >> the framework doesn't keep this association.  There are exceptions
> >> though, the main ones being:
> >>
> >> a) If you use low level CAS APIs to create FSs, the API returns the ID,
> >> which means that a GC that happens right after the API returns would
> >> garbage collect the FS because at that point, nothing is "holding on" to
> >> any reference (it's not in any index).  To prevent this, the low level
> >> create FS methods add the FS to a map which goes from ID -> FS, and thus
> >> "holds onto" the FS, preventing garbage collection.
> >>
> >> b) Another case where this happens is when PEARs are used; in this case
> >> the FSs involved with PEAR "trampoline" FSs end up being in similar maps.
> >>
> >> Both of these approaches of course disable a feature of V3 - namely,
> >> that unreferenced FSs can be garbage collected.
> >>
> >> ...
> >>
> >> There is an API in the V3 CASImpl, getFsFromId(int)  and also
> >> getFsFromId_checked(int), which retrieves the associated FS, given the
> >> ID, or returns null (or throws an exception) if it isn't in the table.
> >> Most FSs created normally won't be in the table.
> > Can we do this? -> As soon as an FS has been added to an index or is
> > being referenced from another FS, its ID should be resolvable to the
> > respective FS.
> >
> > When an FS is in an index or being referred to by another FS, it cannot
> > be garbage collected anyway. The CAS could maintain a lookup using weak
> > references to provide a central place to look up such FSes via their IDs
> > without preventing garbage collection.
> >
> > WebAnno remembers the ID of every FS rendered on screen. When the user
> > performs an action, we load the CAS from disk and then look up the ID to
> > retrieve the FS. We do not keep the CAS in memory all the time. If we had
> > to scan the whole CAS for the FS with a given ID, it would probably have
> > a serious performance impact.
> >
> > Cheers,
> >
> > -- Richard
>
>

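For illustration, the id -> FS map with weak references discussed in the
quoted messages could look roughly like the plain-Java sketch below. This is
not UIMA code: the class WeakIdRegistry and its methods register, lookup and
lookupChecked are hypothetical names, and a generic type parameter stands in
for a feature structure. The two lookup variants only loosely mirror the
null-returning and exception-throwing behaviour described for the V3 API.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical sketch: maps int ids to objects without keeping them alive.
final class WeakIdRegistry<T> {
    private final Map<Integer, WeakReference<T>> byId = new HashMap<>();
    private int nextId = 1;

    // Assigns the next free id and records a weak reference to the object.
    // This is the per-object cost mentioned in the thread: one map entry
    // plus one WeakReference instance.
    int register(T fs) {
        int id = nextId++;
        byId.put(id, new WeakReference<>(fs));
        return id;
    }

    // Returns the object for the id, or null if the id is unknown or the
    // object has already been garbage collected.
    T lookup(int id) {
        WeakReference<T> ref = byId.get(id);
        if (ref == null) {
            return null;
        }
        T fs = ref.get();
        if (fs == null) {
            byId.remove(id); // drop the stale entry for a collected object
        }
        return fs;
    }

    // Like lookup, but throws instead of returning null.
    T lookupChecked(int id) {
        T fs = lookup(id);
        if (fs == null) {
            throw new NoSuchElementException("no live object for id " + id);
        }
        return fs;
    }
}
```

Because the map holds only weak references, an object that is neither
indexed nor referenced elsewhere can still be collected; a later lookup for
its id then simply finds nothing. Lookup by id is a single hash access, so
no scan over all objects is needed.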