uima-dev mailing list archives

From "Marshall Schor (JIRA)" <...@uima.apache.org>
Subject [jira] [Closed] (UIMA-5168) uv3 vs backporting most things to uv2?
Date Tue, 15 Nov 2016 22:19:58 GMT

     [ https://issues.apache.org/jira/browse/UIMA-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Marshall Schor closed UIMA-5168.
    Resolution: Later

> uv3 vs backporting most things to uv2?
> --------------------------------------
>                 Key: UIMA-5168
>                 URL: https://issues.apache.org/jira/browse/UIMA-5168
>             Project: UIMA
>          Issue Type: Question
>          Components: Core Java Framework
>            Reporter: Marshall Schor
> The uv3 docs overview has a summary of the "features" / benefits of uv3.  I was surprised to realize, looking at these, that most of them could be back-ported into version 2.
> Because of this, there is a choice in moving forward: either stick with the current v2 data representation models ("sticking"), or switch to the new v3 ones (for Java).  In the subsequent discussion, "sticking" refers to a currently non-existent v2 into which the v3 improvements (except for changing how Feature Structures are stored) are backported.
> The two benefits lost in sticking are: 
> * garbage collection of unreferenced Feature Structures.
> * larger limits on the number of Feature Structures per CAS (approximately an order of magnitude).  This is because in v2 all of the slots for all Feature Structures, and for int and float arrays, are kept in one int array, which has a limit of approximately 2 billion words.
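
To make the single-int-array point concrete, here is a minimal sketch of that kind of layout. It is an illustration only: the class IntHeapSketch and its methods are hypothetical names, not UIMA APIs, and the real v2 heap is more involved. It assumes each structure occupies one word for a type code followed by its slots, and that a structure is identified by its int address in the array.

```java
import java.util.Arrays;

// Toy illustration only; the names here are hypothetical, not UIMA APIs.
final class IntHeapSketch {
    private int[] heap = new int[16];
    private int next = 1; // address 0 is reserved to mean "no structure"

    /** Allocates one type-code word plus slotCount slots and returns the int handle. */
    int allocate(int typeCode, int slotCount) {
        // addExact throws ArithmeticException once addresses no longer fit in an int,
        // which is where the "about 2 billion words" ceiling comes from.
        int end = Math.addExact(next, Math.addExact(1, slotCount));
        if (end > heap.length) {
            heap = Arrays.copyOf(heap, Math.max(end, heap.length * 2));
        }
        int handle = next;
        heap[handle] = typeCode;
        next = end;
        return handle;
    }

    int typeOf(int handle) { return heap[handle]; }

    void setSlot(int handle, int slot, int value) { heap[handle + 1 + slot] = value; }

    int getSlot(int handle, int slot) { return heap[handle + 1 + slot]; }

    /** Next unused address; everything below it stays allocated in this toy. */
    int nextFreeAddress() { return next; }
}
```

In this toy, nothing is ever reclaimed individually: the array only grows, and a Java int array cannot hold more than about 2^31 elements, so every structure and array in it shares that one budget.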
> Benefits in sticking include:
> * (perhaps) better backwards compatibility
> * a smaller memory footprint if JCas is not being used (imagine UIMA running on a smartphone)
> * (maybe) better performance in some cases, including serialization
> Regarding performance differences:  v3 may be more performant in many cases because it does not need to switch from low-level int handles to JCas object references.  But it may be less performant in some operations involving serialization, because of the overhead of emulating/modeling the way v2 does serialization.  New native-to-v3 serialization forms that are not backward compatible could be added to v3 to overcome this.
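
The handle-versus-reference cost mentioned above can be pictured with a small sketch. Everything in it is hypothetical (HandleVsReferenceSketch, Cover, Ref and their members are invented for illustration and are not the JCas API); it only assumes that a handle-based store keeps "next" links as ints and must map each int back to an object before returning it, while a reference-based store just follows a field.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration only; the names here are hypothetical, not UIMA APIs.
final class HandleVsReferenceSketch {
    /** Object wrapper around an int handle. */
    static final class Cover {
        final int handle;
        Cover(int handle) { this.handle = handle; }
    }

    /** Reference-style node: the link is an ordinary Java field. */
    static final class Ref {
        Ref next;
    }

    // Handle style: nextHandle[h] holds the handle of h's successor, 0 means none.
    private final int[] nextHandle = new int[8];
    private final Map<Integer, Cover> coverCache = new HashMap<>();
    int conversions; // counts handle-to-object conversions

    void link(int from, int to) { nextHandle[from] = to; }

    /** Converts an int handle to its wrapper object, creating it on first use. */
    Cover coverFor(int handle) {
        conversions++;
        return coverCache.computeIfAbsent(handle, Cover::new);
    }

    /** Following a link costs an array read plus a handle-to-object conversion. */
    Cover nextViaHandle(Cover c) {
        int h = nextHandle[c.handle];
        return h == 0 ? null : coverFor(h);
    }

    /** Following a link is a single field read. */
    static Ref nextViaReference(Ref r) { return r.next; }
}
```

The sketch says nothing about absolute speed; it only shows where the extra step sits in a handle-based design.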
> The things that could be backported to v2 include:
> * redesigning the JCas cover classes for higher performance (eliminating the xxx_Type classes, putting an extra field in the xxx cover class instead).
> ** note: a JCas class migration would be needed for this, similar to the one for v3.
> * redesigning much of the supporting infrastructure to improve performance by increasing locality of reference.
> * supporting arbitrary Java Objects, and backporting the implementation of FSArrayList and IntegerArrayList
> * integrating with Java 8 - including the new select framework
> * eliminating problems with ConcurrentModificationException while iterating over UIMA
> * reusing Type Systems
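
For readers unfamiliar with the ConcurrentModificationException item in the list above, the following sketch reproduces the general problem with a plain java.util.ArrayList and shows one common workaround, iterating over a snapshot. This is generic Java, not UIMA code; the message does not say how v3 addresses the problem, and the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

// Generic Java illustration; the names are hypothetical and unrelated to UIMA APIs.
final class SnapshotIterationSketch {
    /** Removes even values while iterating the live list; reports whether that failed. */
    static boolean failsWhenModifiedDuringIteration(List<Integer> items) {
        try {
            for (Integer i : items) {
                if (i % 2 == 0) {
                    items.remove(i); // remove(Object): structural change mid-iteration
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Iterates over a copy, so removals from the live list do not disturb the loop. */
    static void removeEvensUsingSnapshot(List<Integer> items) {
        for (Integer i : new ArrayList<>(items)) {
            if (i % 2 == 0) {
                items.remove(i);
            }
        }
    }
}
```

The snapshot costs one copy per iteration, which is the usual trade-off of this approach.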
> Comparing v3 versus v2+backport, what do people think of the balance between pros and cons?  Should we focus on a v2+backport direction instead of v3?

This message was sent by Atlassian JIRA