uima-dev mailing list archives

From "Richard Eckart de Castilho (JIRA)" <...@uima.apache.org>
Subject [jira] [Updated] (UIMA-1089) Space/Time tradeoffs in the CAS
Date Fri, 09 Sep 2016 17:49:22 GMT

     [ https://issues.apache.org/jira/browse/UIMA-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Eckart de Castilho updated UIMA-1089:
    Labels: Stale  (was: )

This issue is marked as "stale" due to inactivity for 5 years or longer. If no further activity
is detected on this issue, it is scheduled to be closed as 'unresolved' in 3 months' time from
now (Dec 2016).

> Space/Time tradeoffs in the CAS
> -------------------------------
>                 Key: UIMA-1089
>                 URL: https://issues.apache.org/jira/browse/UIMA-1089
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>    Affects Versions: 2.2.2, 2.3
>            Reporter: Marshall Schor
>            Priority: Minor
>              Labels: Stale
> Investigate and implement optimizations that trade time for space, where the user controls
> when the optimizations run. One such optimization could be sharing strings. Sharing requires
> additional computation and (temporary) storage to detect the sharing opportunities, but
> results in space savings. For instance, a common annotation might assign short strings
> like "noun" to a "part-of-speech" feature. If you are processing a large document, there
> may be a large number of these string-valued features, picked from a small pool of
> allowable values. The CAS's string storage could be optimized to share the string
> references in this case, at the cost of temporarily creating a hash table of the unique
> strings and using it to identify sharing possibilities. A new API call to do this
> optimization would confine its time and space overhead to just those users and
> times where it makes sense.
>
> An alternative would be to automatically figure this out for some selected kinds of
> optimizations, but I'm not sure that could be done without impacting finely-tuned systems
> negatively.
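
The string-sharing idea in the description above can be sketched roughly as follows. This is
only an illustration under stated assumptions: the class StringSharing and the method
shareStrings are hypothetical names, not part of the UIMA API, and a plain String[] stands in
for the CAS's string storage, whose real layout the issue does not describe.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the UIMA API.
final class StringSharing {

    private StringSharing() {}

    /**
     * Replaces equal strings in {@code values} with a single shared instance.
     * The array is an assumed stand-in for the CAS's string storage.
     *
     * @return the number of slots redirected to an already-seen instance
     */
    static int shareStrings(String[] values) {
        // Temporary table of unique strings; it becomes garbage once this
        // method returns, so the extra storage is only paid during the call.
        Map<String, String> canonical = new HashMap<>();
        int redirected = 0;
        for (int i = 0; i < values.length; i++) {
            String s = values[i];
            if (s == null) {
                continue; // unset feature values stay unset
            }
            // putIfAbsent returns the earlier instance if an equal string was seen.
            String first = canonical.putIfAbsent(s, s);
            if (first != null && first != s) {
                values[i] = first; // point this slot at the shared instance
                redirected++;
            }
        }
        return redirected;
    }
}
```

Because the pass runs only when shareStrings is called explicitly, its hashing cost and the
temporary table are incurred only by callers who opt in, which matches the "new API call"
variant the issue proposes rather than the automatic alternative.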

This message was sent by Atlassian JIRA
