uima-dev mailing list archives

From "Marshall Schor (JIRA)" <...@uima.apache.org>
Subject [jira] [Created] (UIMA-5168) uv3 vs backporting most things to uv2?
Date Thu, 03 Nov 2016 14:05:00 GMT
Marshall Schor created UIMA-5168:
------------------------------------

             Summary: uv3 vs backporting most things to uv2?
                 Key: UIMA-5168
                 URL: https://issues.apache.org/jira/browse/UIMA-5168
             Project: UIMA
          Issue Type: Question
          Components: Core Java Framework
            Reporter: Marshall Schor


The uv3 overview documentation summarizes the "features" / benefits of uv3. I was surprised
to realize, looking at these, that most of them could be back-ported into version 2.

Because of this, there is a choice going forward: either stick with the current v2 data
representation models ("sticking"), or switch to the new v3 ones (for Java). In the discussion
below, "sticking" refers to a currently non-existent v2 to which the v3 improvements (except
for the change in how Feature Structures are stored) have been backported.

The two benefits lost in sticking are: 
* garbage collection of unreferenced Feature Structures.
* larger limits on the number of Feature Structures per CAS (approximately an order of magnitude).
 This is because in v2, all of the slots for all Feature Structures, as well as for int and
float arrays, are kept in one int array, which is limited to approximately 2 billion words.
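To make the v2 storage limit concrete, here is a minimal Java sketch of a single shared int heap. The class and method names (IntHeapSketch, allocate, setIntFeature, ...) are hypothetical and greatly simplified; they are not UIMA's actual CAS internals. It shows why the limit applies to the CAS as a whole rather than per type, and why the JVM garbage collector cannot reclaim an individual unreferenced Feature Structure: in this model an FS is only a range of slots inside one array.

```java
import java.util.Arrays;

// Hypothetical, simplified model of a v2-style CAS heap (not UIMA's real
// classes): the slots of every Feature Structure live in ONE shared int[],
// and an FS is identified by the int index ("address") of its first slot.
class IntHeapSketch {
    // Many JVMs cannot allocate arrays of exactly Integer.MAX_VALUE elements,
    // so cap slightly below it (java.util.ArrayList uses a similar margin).
    static final int MAX_SLOTS = Integer.MAX_VALUE - 8;

    private int[] heap = new int[16];
    private int next = 1; // address 0 is reserved to mean "null"

    /** Allocates a type-code slot plus numFeatures feature slots; returns the FS address. */
    int allocate(int typeCode, int numFeatures) {
        int size = 1 + numFeatures;
        // Every FS (and, in v2, every int/float array) shares this single
        // int-indexed array, so the whole CAS is bounded by ~2 billion slots.
        if ((long) next + size > MAX_SLOTS) {
            throw new IllegalStateException("CAS heap exhausted");
        }
        ensureCapacity(next + size);
        int addr = next;
        heap[addr] = typeCode;
        next += size;
        return addr;
    }

    void setIntFeature(int addr, int featureIndex, int value) {
        heap[addr + 1 + featureIndex] = value;
    }

    int getIntFeature(int addr, int featureIndex) {
        return heap[addr + 1 + featureIndex];
    }

    int typeOf(int addr) {
        return heap[addr];
    }

    /** Slots consumed so far (including reserved slot 0); never shrinks, even if FSs become unreferenced. */
    int usedSlots() {
        return next;
    }

    private void ensureCapacity(int needed) {
        if (needed <= heap.length) {
            return;
        }
        long doubled = (long) heap.length * 2;
        int newLength = (int) Math.min(Math.max(doubled, needed), MAX_SLOTS);
        heap = Arrays.copyOf(heap, newLength);
    }

    public static void main(String[] args) {
        IntHeapSketch cas = new IntHeapSketch();
        int token = cas.allocate(7, 2);  // hypothetical type code 7 with begin/end features
        cas.setIntFeature(token, 0, 10); // begin
        cas.setIntFeature(token, 1, 15); // end
        System.out.println(cas.typeOf(token) + " " + cas.getIntFeature(token, 0)
                + " " + cas.getIntFeature(token, 1)); // prints "7 10 15"
    }
}
```

v3 avoids both constraints by making each Feature Structure an ordinary Java object, so the number of FSs is no longer tied to the length of one array and unreferenced FSs can be collected by the JVM.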

Benefits in sticking include:
* (perhaps) better backwards compatibility
* a smaller memory footprint if JCas is not being used (imagine UIMA running on a smartphone)
* (maybe) better performance in some cases, including serialization

Regarding performance differences: v3 may perform better in many cases because it does not
need to convert between low-level int handles and JCas object references. But it may perform
worse in some serialization operations, because of the overhead of emulating/modeling the way
v2 does serialization. New Native-to-v3 serialization forms, which would not be backward
compatible, could be added to v3 to overcome this.
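The handle-versus-reference difference can be sketched as follows. This is a hypothetical illustration, not UIMA code: the v2-like path reads data through an int address into a shared heap and needs a map lookup (or object creation) to turn an address into a Java cover object, while the v3-like path keeps feature values directly in the Java object's fields. The names HandleVsObjectSketch, TokenCover, coverFor and Token are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical contrast between int-handle access and direct object access.
class HandleVsObjectSketch {

    // v2-like: data lives in a shared int heap; Java objects are only covers.
    static final int[] HEAP = new int[1024];
    static final Map<Integer, TokenCover> COVERS = new HashMap<>();

    static final class TokenCover {
        final int addr; // int handle into HEAP
        TokenCover(int addr) { this.addr = addr; }
        int getBegin() { return HEAP[addr + 1]; } // every read indexes the heap
    }

    // Turning an int handle into a Java reference needs a lookup (or creation).
    static TokenCover coverFor(int addr) {
        return COVERS.computeIfAbsent(addr, TokenCover::new);
    }

    // v3-like: the Java object IS the Feature Structure; fields hold the data.
    static final class Token {
        int begin;
        int end;
        Token(int begin, int end) { this.begin = begin; this.end = end; }
    }

    public static void main(String[] args) {
        int addr = 1;
        HEAP[addr] = 7;      // type code slot
        HEAP[addr + 1] = 10; // begin
        TokenCover c = coverFor(addr);
        System.out.println(c.getBegin() + " same cover: " + (c == coverFor(addr)));

        Token t = new Token(10, 15); // no handle, no lookup
        System.out.println(t.begin);
    }
}
```

The HashMap here stands in for whatever table maps addresses to cover objects; the point is only that the v2-like path pays an extra indirection on each handle-to-object conversion, which the v3-like path does not.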

The things that could be backported to v2 include:
* redesigning the JCas cover classes for higher performance (eliminating the xxx_Type classes,
putting an extra field in the xxx cover class instead).
** note: a JCas class migration would be needed for this, similar to the one for v3.
* redesigning much of the supporting infrastructure to improve performance by increasing locality
of reference.
* supporting arbitrary Java Objects, and backporting the implementation of FSArrayList and
IntegerArrayList
* integrating with Java 8 - including the new select framework
* eliminating problems with ConcurrentModificationException while iterating over UIMA indexes
* reusing Type Systems
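The ConcurrentModificationException item refers to modifying an index (adding or removing Feature Structures) while an iterator over it is still in use. The sketch below reproduces the general problem with plain java.util collections and shows one general remedy, copy-on-write, where an iterator keeps working over a stable snapshot. It uses the JDK's CopyOnWriteArrayList only as an analogy; it is not UIMA's index implementation, and the helper name addWhileIterating is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class IterateWhileModifyingSketch {

    // Iterates over `index` and, for each original element, adds a derived one.
    // Returns true if the loop finished, false if the iterator threw CME.
    static boolean addWhileIterating(List<String> index) {
        try {
            for (String fs : index) {
                if (!fs.endsWith("'")) {
                    index.add(fs + "'");
                }
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> plain = new ArrayList<>(Arrays.asList("a", "b"));
        List<String> cow = new CopyOnWriteArrayList<>(Arrays.asList("a", "b"));
        // ArrayList's fail-fast iterator detects the add and throws on the next step.
        System.out.println("ArrayList completed: " + addWhileIterating(plain));
        // The copy-on-write iterator walks its snapshot [a, b]; the adds go to a new copy.
        System.out.println("CopyOnWriteArrayList completed: " + addWhileIterating(cow) + " -> " + cow);
    }
}
```

Copy-on-write trades extra copying on modification for iterators that never fail; whether that trade-off suits UIMA indexes depends on how often they are updated during iteration.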

Comparing v3 versus v2+backport, what do people think of the balance of pros and cons? Should
we focus on a v2+backport direction instead of v3?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
