uima-dev mailing list archives

From "Thilo Goetz (JIRA)" <uima-...@incubator.apache.org>
Subject [jira] Closed: (UIMA-483) JCas method like getSofaDataString that doesn't copy the chars from the StringHeap
Date Wed, 13 Aug 2008 08:40:44 GMT

     [ https://issues.apache.org/jira/browse/UIMA-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thilo Goetz closed UIMA-483.

       Resolution: Fixed
    Fix Version/s: 2.3

This issue has been fixed as a side effect of the 2.2.2 memory hotfix.

> JCas method like getSofaDataString that doesn't copy the chars from the StringHeap
> ----------------------------------------------------------------------------------
>                 Key: UIMA-483
>                 URL: https://issues.apache.org/jira/browse/UIMA-483
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>    Affects Versions: 2.1, 2.2
>            Reporter: Greg Holmberg
>             Fix For: 2.3
> I process large documents: the String I pass to JCas.setSofaDataString may be as large
> as 100 MB (50,000,000 chars).  This is causing the JVM to run out of memory when we have many
> concurrent AnalysisEngines running.
> I traced JCas.getSofaDataString(), and it eventually calls StringHeap.getStringForCode(),
> which does a "new String" from its private char[] (which makes a copy).
> This would happen for each annotator.  We have five, so the original 100 MB plus five
> copies has now become 600 MB.  Multiply by 10 concurrent AnalysisEngines, and that's 6,000 MB.
> Perhaps there could be a variation on getSofaDataString that returns one of the other
> classes (besides String) that implement CharSequence.  A CharBuffer perhaps, or even a new
> class that implements the CharSequence interface but is read-only (just four methods).  Or
> even just return a char[], or a char[] and begin/end offsets into the StringHeap.
> If nothing else, perhaps the document text should be treated differently from all the little
> strings in the StringHeap and be stored separately, so that calls to getSofaDataString() simply
> return a reference to an existing String object, without copying.
> I'm open to possibilities; I just need the copying to end.
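
One of the options suggested in the description, a read-only CharSequence over an existing char[] plus begin/end offsets, could look roughly like the sketch below. The class name CharArrayView and its constructor are hypothetical and are not part of the UIMA API; nothing here reflects how the 2.2.2 hotfix was actually implemented. The sketch only illustrates that length, charAt, and subSequence need not copy the backing array, while toString still does. Within the JDK, CharBuffer.wrap(chars, offset, length).asReadOnlyBuffer() offers a similar non-copying view.

```java
/**
 * Hypothetical read-only view over a slice of a shared char[].
 * Not a UIMA class; illustration only.
 */
final class CharArrayView implements CharSequence {
    private final char[] chars; // shared backing array, never copied by the view
    private final int start;    // inclusive offset into chars
    private final int end;      // exclusive offset into chars

    CharArrayView(char[] chars, int start, int end) {
        if (start < 0 || end > chars.length || start > end) {
            throw new IndexOutOfBoundsException("start=" + start + ", end=" + end);
        }
        this.chars = chars;
        this.start = start;
        this.end = end;
    }

    @Override
    public int length() {
        return end - start;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index=" + index);
        }
        return chars[start + index];
    }

    @Override
    public CharSequence subSequence(int from, int to) {
        if (from < 0 || to > length() || from > to) {
            throw new IndexOutOfBoundsException("from=" + from + ", to=" + to);
        }
        // A narrower view over the same array: no characters are copied.
        return new CharArrayView(chars, start + from, start + to);
    }

    @Override
    public String toString() {
        // This is the one place a copy is made, and only when a String is requested.
        return new String(chars, start, end - start);
    }

    public static void main(String[] args) {
        char[] heap = "xxHello worldyy".toCharArray();
        CharSequence view = new CharArrayView(heap, 2, 13);
        System.out.println(view.length());
        System.out.println(view.subSequence(6, 11));
    }
}
```

An annotator that only scans the text through charAt or subSequence would then share one backing array with every other annotator, instead of each holding its own String copy. Note that the view is read-only only in the sense that it exposes no mutators; whoever owns the char[] can still change it.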

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
