uima-dev mailing list archives

From Burn Lewis <burnle...@gmail.com>
Subject Re: opinion on degree of backwards compatibility for Uima V3 experiment
Date Fri, 02 Sep 2016 12:17:18 GMT
Could the id assigned in V3 be the same as the V2 address, as if it were
the offset in a heap?  It would be unique and monotonically increasing.

Burn

On Fri, Sep 2, 2016 at 5:36 AM, Peter Klügl <peter.kluegl@averbis.com>
wrote:

> Same here.
>
>
> It looks like we are now also starting to use the address, and I am
> also thinking of using it more in Ruta (internal indexing).
>
>
> Btw, I did some simple experiments lately concerning the stability of
> the addresses when using CasIOUtils. Can it happen that the addresses
> change if you just deserialize the same CAS twice without serializing it
> in between?
>
>
> Best,
>
>
> Peter
>
>
>
> On 01.09.2016 at 19:29, Richard Eckart de Castilho wrote:
> > FS IDs are IMHO a very useful thing. Providing out-of-band (i.e.
> > out-of-type-system) unique identifiers for feature structures
> > facilitates handling them, e.g. in editors. We use that quite a bit in
> > WebAnno.
> >
> > In WebAnno, we do not rely on any heap arithmetic - an ID is just
> > expected to be a unique identifier. However, I could imagine cases
> > where people might rely on the ID to increment monotonically for new
> > FSes.
> >
> > Most binary formats do not preserve the ID across a save/load cycle.
> > However, SERIALIZED and SERIALIZED_TSI *do* preserve the ID, and
> > WebAnno makes use of that. It allows keeping references to FSes
> > without having to keep the CAS in memory all the time.
> >
> > There should continue to be a V3 serialization format which preserves
> > IDs across a load/save cycle.
> >
> > I do not presently see a case where a strong similarity between V2 and
> > V3 IDs would be important. It would be nice if deserializing a V2
> > SERIALIZED or SERIALIZED_TSI into V3 would restore the V2 IDs - I
> > expect it to be an easy thing to do.
> >
> > Cheers,
> >
> > -- Richard
> >
> >> On 01.09.2016, at 16:09, Marshall Schor <msa@schor.com> wrote:
> >>
> >> The UIMA V3 implementation includes, in many places, extra code (which
> >> costs time / space) whose goal is to make things look closer to
> >> version 2.  Some of this is for interoperability with version 2
> >> artifacts, like serialized forms.
> >>
> >> An example: in v2, many serialization forms include "references" to
> >> other Feature Structures (FSs), and for those, the encoding is the
> >> "address" in the heap of the FS.
> >>
> >> In v3, there is no heap, but the FSs have "ids", which are (at the
> >> moment) an int which increments by 1.  This mismatches the "address"
> >> in v2, so many parts of the serialization code build a map at
> >> serialization time from the v3 ids to v2 "addresses", and use the
> >> latter in the serialization form.
> >>
> >> Currently, this is done for various binary serializations, so that
> >> these can be read back in by v2 code.
> >>
> >> Currently, it's not done for JSON or XMI (and maybe XCAS - haven't
> >> checked).  So the serialized forms for these differ between v2 and
> >> v3, in that the numbers used to represent references to other FSs are
> >> different.
> >>
> >> The deserialization code for XMI and JSON doesn't depend on these
> >> numbers being anything other than unique per FS, so there's no issue
> >> in deserializing.  But the UIMA community may have built other things
> >> that depend on these identifiers not changing.
> >>
> >> What's your opinion: should the XMI and JSON etc. serialization in V3
> >> be changed to reproduce (approximately) the same reference numbers as
> >> v2?  I say approximately, because other factors might affect these,
> >> such as the ordering for things not in "ordered" indexes, etc.,
> >> between v2 and v3.
> >>
> >> -Marshall
> >>
>
>
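
As a rough illustration of the id-to-address map described in the quoted
message, the following Java sketch assigns v2-style "addresses" to FSs in
serialization order. It is not UIMA code: the class IdToAddressSketch, the
Fs holder and buildIdToAddress are hypothetical names, and the layout used
here (first address 1, one heap slot for the type plus one per feature) is
a simplifying assumption, not a statement of how the real v2 heap handles
every type.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class IdToAddressSketch {

    /** Hypothetical stand-in for an FS: a v3-style id and its feature count. */
    static final class Fs {
        final int id;
        final int featureCount;

        Fs(int id, int featureCount) {
            this.id = id;
            this.featureCount = featureCount;
        }
    }

    /**
     * Walks the FSs in serialization order and gives each one the address it
     * would have if laid out back to back in a v2-like heap.
     */
    static Map<Integer, Integer> buildIdToAddress(List<Fs> fssInOrder) {
        Map<Integer, Integer> idToAddress = new LinkedHashMap<>();
        int nextAddress = 1; // assumption: address 0 is reserved for "null"
        for (Fs fs : fssInOrder) {
            idToAddress.put(fs.id, nextAddress);
            // assumption: one slot for the type code plus one slot per feature
            nextAddress += 1 + fs.featureCount;
        }
        return idToAddress;
    }

    public static void main(String[] args) {
        List<Fs> fss = List.of(new Fs(1, 2), new Fs(2, 0), new Fs(3, 4));
        // Print each v3 id next to the address assigned to it.
        buildIdToAddress(fss).forEach((id, address) ->
                System.out.println("id " + id + " -> address " + address));
    }
}
```

Under these assumptions the addresses are unique and increase
monotonically with creation order, but they are not consecutive: the gaps
depend on how many features each FS has.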
