uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: opinion on degree of backwards compatibility for Uima V3 experiment
Date Wed, 07 Sep 2016 13:45:18 GMT
Hi Jörn,

Thanks for your input.  Could you possibly expand with a few specifics on what
changes you think would make it easier to use with Hadoop etc.?

-Marshall


On 9/7/2016 7:46 AM, Joern Kottmann wrote:
> Hello all,
>
> at my workplace we use UIMA mostly with custom code to load data into a
> pipeline and store its results,
> therefore we don't depend at all on the UIMA serialization formats. And
> changing them, or adding new ones which
> are incompatible wouldn't be an issue at all. Also the existing code can be
> ported to work with UIMA 3.
>
> I really hope we can get UIMA 3 into a shape where it is easier to use with
> today's requirements (e.g. with Hadoop)
> and possibilities.
>
> I personally think that the effort to create the next overhauled version
> shouldn't be limited in any way by backward compatibility.
> For me it is a good solution if there is some help with migrating things to
> UIMA 3 (e.g. a guide which explains what to do)
> and maybe maintaining UIMA 2 for a while in parallel (e.g. fixes of very
> urgent/critical bugs).
>
> Jörn
>
> On Fri, Sep 2, 2016 at 7:56 PM, Richard Eckart de Castilho <rec@apache.org>
> wrote:
>
>> See comment at end of mail.
>>
>> On 02.09.2016, at 15:18, Marshall Schor <msa@schor.com> wrote:
>>> To go from an ID to an FS is not generally possible, because normally, the
>>> framework doesn't keep this association.  There are exceptions though, the
>>> main ones being:
>>>
>>> a) If you use low-level CAS APIs to create FSs, the API returns the ID, which
>>> means that a GC that happens right after the API returns would garbage collect
>>> the FS, because at that point nothing is "holding on" to any reference (it's
>>> not in any index).  To prevent this, the low-level create FS methods add the
>>> FS to a map which goes from ID -> FS, and thus "holds onto" the FS, preventing
>>> garbage collection.
>>>
>>> b) Another case where this happens is when PEARs are used; in this case the
>>> FSs involved with PEAR "trampoline" FSs end up being in similar maps.
>>>
>>> Both of these approaches of course disable a feature of V3 - namely, that
>>> unreferenced FSs can be garbage collected.
>>>
>>> ...
>>>
>>> There is an API in the V3 CASImpl, getFsFromId(int)  and also
>>> getFsFromId_checked(int), which retrieves the associated FS, given the ID, or
>>> returns null (or throws an exception) if it isn't in the table.  Most FSs
>>> created normally, won't be in the table.
>> Can we do this? -> As soon as an FS has been added to an index or is being
>> referenced from another FS, its ID should be resolvable to the respective
>> FS.
>>
>> When an FS is in an index or is referenced by another FS, it cannot be
>> garbage collected anyway. The CAS could maintain a lookup using weak
>> references to provide a central place to look up such FSes via their IDs
>> without preventing garbage collection.
>>
>> WebAnno remembers the ID of every FS rendered on screen. When the user
>> performs an action, we load the CAS from disk and then look up the ID to
>> retrieve the FS. We do not keep the CAS in memory all the time. If we had
>> to scan the whole CAS for the FS with a given ID, it would probably have a
>> serious performance impact.
>>
>> Cheers,
>>
>> -- Richard
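
The weak-reference lookup suggested in the quoted mail above can be sketched in
plain Java roughly as follows. This is only an illustration of the idea, not
UIMA code: the class WeakIdLookup and its methods register, get and getChecked
are hypothetical names, and the null-versus-exception split merely mirrors the
behaviour described for getFsFromId(int) and getFsFromId_checked(int).

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical ID -> object table; an assumption for illustration, not a UIMA class. */
final class WeakIdLookup<T> {
    // The map holds only weak references, so it never keeps an object alive by itself.
    private final Map<Integer, WeakReference<T>> byId = new HashMap<>();

    /** Record an object under its ID, e.g. when it is indexed or referenced. */
    void register(int id, T fs) {
        byId.put(id, new WeakReference<>(fs));
    }

    /** Returns the object for the ID, or null if it is unknown or was collected. */
    T get(int id) {
        WeakReference<T> ref = byId.get(id);
        if (ref == null) {
            return null;
        }
        T fs = ref.get();
        if (fs == null) {
            byId.remove(id); // drop the stale entry left behind by the collector
        }
        return fs;
    }

    /** Like get, but throws instead of returning null. */
    T getChecked(int id) {
        T fs = get(id);
        if (fs == null) {
            throw new IllegalArgumentException("No live object for ID " + id);
        }
        return fs;
    }
}
```

As long as something else (an index or another object) still references the
registered object, get returns it in a single map lookup; once nothing else
references it, the entry no longer prevents garbage collection.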

