uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: opinion on degree of backwards compatibility for Uima V3 experiment
Date Fri, 02 Sep 2016 15:55:52 GMT

In v3, FSs are represented completely by Java class instances.  The indexes hold
FSs directly as their values.  FSs which reference other FSs have direct
references to them, and don't use IDs.

The IDs are kept for backwards compatibility - to support the CAS Low Level
APIs. This is an official UIMA public API (the LowLevelCAS interface), and projects
that were trying to get higher performance sometimes made use of it.  With this
API, you could implement annotators that didn't create any Java objects, for
example.  This used to be important, 15 years ago, when Java was "new".

In V3, using the low level APIs is supported, but is actually less efficient
than using the normal Java APIs. The low level APIs refer to FSs using their
IDs, which are "ints".  To make that work, FSs which are created with the low
level APIs are put into a map from ID to FS.

Re: Testing v3 -

Remember, the v3 branch is not currently in a good state, because I'm in the
middle of merging in the recent flurry of changes that were made to get v2.9.0
out.  Right now, for instance, one of the new test cases has uncovered a
missing part of the binary deserialization (delta) implementation in v3, and I'm
working on figuring out how to fix this.

There is an API in the V3 CASImpl, getFsFromId(int), and also
getFsFromId_checked(int), which retrieve the associated FS, given the ID, or
return null (or throw an exception) if it isn't in the table.  Most FSs, being
created normally, won't be in the table.
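
To make the table behaviour concrete, here is a minimal, self-contained sketch
of the idea. It is not the UIMA implementation: SketchFs, SketchCas, lookup and
lookupChecked are hypothetical names made up for illustration, and the sketch
assumes that the plain lookup returns null while the checked one throws.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a v3 feature structure: a plain Java object
// carrying a final int id.
final class SketchFs {
    private final int id;
    SketchFs(int id) { this.id = id; }
    int id() { return id; }   // FS -> ID is just a field read
}

// Hypothetical stand-in for the CAS; this is NOT the UIMA CASImpl.
final class SketchCas {
    private int nextId = 1;
    private final Map<Integer, SketchFs> idToFs = new HashMap<>();

    // Normal creation: the caller holds the reference, nothing is recorded.
    SketchFs create() {
        return new SketchFs(nextId++);
    }

    // Low-level-style creation: the caller only gets the int, so the table
    // has to hold the object to keep it reachable.
    int lowLevelCreate() {
        SketchFs fs = new SketchFs(nextId++);
        idToFs.put(fs.id(), fs);
        return fs.id();
    }

    // Assumption: the unchecked lookup returns null for unknown ids.
    SketchFs lookup(int id) {
        return idToFs.get(id);
    }

    // Assumption: the checked lookup throws for unknown ids.
    SketchFs lookupChecked(int id) {
        SketchFs fs = idToFs.get(id);
        if (fs == null) {
            throw new IllegalArgumentException("no FS registered for id " + id);
        }
        return fs;
    }
}

final class IdTableDemo {
    public static void main(String[] args) {
        SketchCas cas = new SketchCas();
        SketchFs normal = cas.create();
        int lowLevelId = cas.lowLevelCreate();

        System.out.println(cas.lookup(lowLevelId) != null);   // true
        System.out.println(cas.lookup(normal.id()) == null);  // true
    }
}
```

The FS created the normal way has an ID but no table entry, so only the
reference held by the caller keeps it alive.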

The recommended way to deal with this is to use (in Java) actual references to
the FSs, in place of the IDs, which is what the v3 framework does.
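
As a rough illustration of that recommendation, the sketch below links objects
by reference and keys a side table by the object itself instead of by an int
ID. LinkedFs and DirectRefDemo are hypothetical names, not UIMA classes, and
the identity-keyed map is just one possible choice for such a side table.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical FS-like class: it links to its successor by reference,
// not by an int ID.
final class LinkedFs {
    final String text;
    LinkedFs next;   // direct reference to another FS-like object
    LinkedFs(String text) { this.text = text; }
}

final class DirectRefDemo {
    // Follows the references; no ID -> object lookup is involved.
    static String join(LinkedFs head) {
        StringBuilder sb = new StringBuilder();
        for (LinkedFs fs = head; fs != null; fs = fs.next) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(fs.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        LinkedFs a = new LinkedFs("low");
        LinkedFs b = new LinkedFs("level");
        a.next = b;

        // Side table keyed by the object itself rather than by an int ID.
        Map<LinkedFs, Integer> visits = new IdentityHashMap<>();
        visits.merge(a, 1, Integer::sum);
        visits.merge(a, 1, Integer::sum);

        System.out.println(join(a));        // low level
        System.out.println(visits.get(a));  // 2
    }
}
```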

Hope that answers your question; if not, ask more :-)


On 9/2/2016 9:31 AM, Peter Klügl wrote:
> What does this mean?
> ID -> FS is not possible in v3, or only with low level API?
> Testing v3 and taking a closer look is still on my todo list, but I
> have not found the time yet.
> Best,
> Peter
> Am 02.09.2016 um 15:18 schrieb Marshall Schor:
>> In v3, there are fast lookups FS -> ID :
>>    myFs._id()  // compiles to a fetch of a final int field in the FS object
>> To go from an ID to an FS is not generally possible, because normally, the
>> framework doesn't keep this association.  There are exceptions though, the main
>> ones being:
>> a) If you use the low level CAS APIs to create FSs, the API returns only the ID.
>> Without further measures, a GC that happens right after the API returns could
>> collect the FS, because at that point nothing is "holding on" to a reference to
>> it (it's not in any index).  To prevent this, the low level create-FS methods
>> add the FS to a map which goes from ID -> FS, and thus "holds onto" the FS,
>> preventing garbage collection.
>> b) Another case where this happens is when PEARs are used; in this case the FSs
>> involved with PEAR "trampoline" FSs end up being in similar maps.
>> Both of these approaches of course disable a feature of V3 - namely, that
>> unreferenced FSs can be garbage collected.
>> -Marshall
>> On 9/2/2016 8:47 AM, Richard Eckart de Castilho wrote:
>>> Fast lookups ID -> FS and FS -> ID would also be very much appreciated
>>> Cheers,
>>> -- Richard
>>>> On 02.09.2016, at 14:17, Burn Lewis <burnlewis@gmail.com> wrote:
>>>> Could the id assigned in V3 be the same as the V2 address, as if it were the
>>>> offset in a heap?  Unique and monotonically increasing.
>>>> Burn
>>>> On Fri, Sep 2, 2016 at 5:36 AM, Peter Klügl <peter.kluegl@averbis.com>
>>>> wrote:
>>>>> Same here.
>>>>> It looks like we are now also starting to use the address, and I am
>>>>> also thinking of using it more in Ruta (internal indexing).
>>>>> Btw, I did some simple experiments lately concerning the stability of
>>>>> the addresses when using CasIOUtils. Can it happen that the addresses
>>>>> change if you just deserialize the same CAS twice without serializing
>>>>> in between?
