uima-dev mailing list archives

From "Richard Eckart de Castilho (JIRA)" <...@uima.apache.org>
Subject [jira] [Commented] (UIMA-5106) uv3 constant "id" for FSs (Proposed new Feature for uv3)
Date Sat, 17 Sep 2016 07:17:20 GMT

    [ https://issues.apache.org/jira/browse/UIMA-5106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15498418#comment-15498418 ]

Richard Eckart de Castilho commented on UIMA-5106:

I understand, thanks.

Regarding the preservation of IDs across serialization: this is very useful. 

Maybe it should not always be mandatory. A user may intentionally want to "garbage collect"
the ID space. E.g. right now with v2, I use a variant of SERIALIZED if I want to preserve
IDs and COMPRESSED_FILTERED if I want to garbage-collect IDs (and FSes). I could imagine
that with v3, the preservation of IDs could become a parameter to some serialization/deserialization

> uv3 constant "id" for FSs (Proposed new Feature for uv3)
> --------------------------------------------------------
>                 Key: UIMA-5106
>                 URL: https://issues.apache.org/jira/browse/UIMA-5106
>             Project: UIMA
>          Issue Type: New Feature
>          Components: Core Java Framework
>            Reporter: Marshall Schor
>            Priority: Minor
>             Fix For: 3.0.0SDKexp
> Add constant ID for FSs. This would be an incrementing, long value. It would be constant
> through serialization/deserialization cycles. There would be a lazily created map from longs
> to FSs (via weak links) to allow direct access from the ID to the FS.  Lazy intent is to not
> have a cost for this (space/time) other than the cost for 1 long / FS, if it is not used.
> We could make this feature optional, as well, to avoid the 8 bytes per FS overhead, but
> in V3, I think that's not a good tradeoff (space savings vs complexity).
> Issues: 
> * Current design allows parallelism of services, with returned results "stacked" into
>   receiving CAS; would need to change (some of) the IDs coming back.
> CAS would need to have the high-water-mark value as part of serializations.
> Backwards compatibility:
> * loading V2 CASs: generate new IDs upon loading.
> * serializing to V2: (for connecting to V2 services): drop the IDs.
> This is a proposed new V3 feature; comments appreciated.
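
For illustration, here is a minimal Java sketch of the idea in the quoted description: one incrementing long per FS, plus an ID-to-FS map that is only built on the first lookup and holds its entries through weak references. This is not the UIMA v3 implementation. The names Fs, FsIdRegistry, create, lookup, isMapBuilt and restoreHighWaterMark are hypothetical, and the `reachable` argument is an assumed stand-in for however a real CAS would enumerate its live FSs.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a feature structure; not the UIMA class.
final class Fs {
    final long id; // the one long per FS that is always paid for

    Fs(long id) {
        this.id = id;
    }
}

// Hypothetical per-CAS registry; all names here are illustrative assumptions.
final class FsIdRegistry {
    private long highWaterMark = 0;            // last ID handed out
    private Map<Long, WeakReference<Fs>> byId; // stays null until the first lookup

    // Creates an FS with the next ID. The map is only touched if it already exists.
    Fs create() {
        Fs fs = new Fs(++highWaterMark);
        if (byId != null) {
            byId.put(fs.id, new WeakReference<>(fs));
        }
        return fs;
    }

    // Returns the FS with the given ID, or null if it is unknown or already collected.
    // On the first call the map is built from the FSs that are currently reachable.
    Fs lookup(long id, Iterable<Fs> reachable) {
        if (byId == null) {
            byId = new HashMap<>();
            for (Fs fs : reachable) {
                byId.put(fs.id, new WeakReference<>(fs));
            }
        }
        WeakReference<Fs> ref = byId.get(id);
        return ref == null ? null : ref.get();
    }

    boolean isMapBuilt() {
        return byId != null;
    }

    // Value that would have to travel with a serialized CAS.
    long highWaterMark() {
        return highWaterMark;
    }

    // After deserialization, continue numbering above the restored value.
    void restoreHighWaterMark(long restored) {
        highWaterMark = Math.max(highWaterMark, restored);
    }
}
```

As long as lookup is never called, the only per-FS cost in this sketch is the long field itself, which matches the "lazy intent" stated in the description. Because the map holds weak references, it does not keep otherwise unreachable FSs alive.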

