uima-dev mailing list archives

From Thilo Goetz <twgo...@gmx.de>
Subject Re: Fwd: [jira] Closed: (UIMA-1067) Remove char heap/ref heap in StringHeap of the CAS
Date Mon, 09 Jun 2008 16:05:46 GMT
Thanks Eddie, that's great!

--Thilo

Eddie Epstein wrote:
> Thilo,
> 
> Just tested this change with the JNI interface to uimacpp and it works fine.
> 
> Eddie
> 
> 
> ---------- Forwarded message ----------
> From: Thilo Goetz (JIRA) <uima-dev@incubator.apache.org>
> Date: Fri, Jun 6, 2008 at 10:21 AM
> Subject: [jira] Closed: (UIMA-1067) Remove char heap/ref heap in StringHeap of the CAS
> To: uima-dev@incubator.apache.org
> 
> 
> 
>     [ https://issues.apache.org/jira/browse/UIMA-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> 
> Thilo Goetz closed UIMA-1067.
> -----------------------------
> 
>    Resolution: Fixed
> 
> Fixed, all unit tests pass.  Please test this change if you use (binary)
> serialization.  It should work the same as before, I haven't changed the
> serialization format in any way.
> 
>> Remove char heap/ref heap in StringHeap of the CAS
>> --------------------------------------------------
>>
>>                 Key: UIMA-1067
>>                 URL: https://issues.apache.org/jira/browse/UIMA-1067
>>             Project: UIMA
>>          Issue Type: Improvement
>>          Components: Core Java Framework
>>    Affects Versions: 2.2.2
>>            Reporter: Thilo Goetz
>>            Assignee: Thilo Goetz
>>             Fix For: 2.3
>>
>>
>> The StringHeap class provides two ways to store strings: either as Java
>> strings, or by copying characters onto a character heap.  The second option
>> is only used for deserialization from a binary CAS.  However, even if not
>> used, this capability means a very significant memory overhead.  To
>> demonstrate this, I ran the following experiment.  As analysis engine, I
>> used our sandbox POS tagger.  It sets just one string feature on each token.
>> As text, I used a 2.4MB input file (2x moby.txt).  To run this in IBM Java
>> 1.5.0_7 (which happens to be the JVM I'm interested in) you need to specify
>> -Xmx135M.  I checked 5MB increments.  Then I patched the StringHeap
>> implementation to work without the additional bookkeeping overhead and ran
>> the experiment again.  I was then able to run with -Xmx115M.  This
>> represents a very significant gain, particularly given the fact that I ran
>> so little analysis (only tokens and sentences are produced, and only a
>> single string-valued feature set).  The new code also ran a tiny bit faster,
>> but not much.  One might see more improvement for analysis that is not as
>> compute intensive as the Tagger.
>>
>> The challenge is to make sure that the serialization code still works
>> after this change.
> 
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
> 
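
For readers unfamiliar with the two storage strategies named in the issue description, here is a minimal sketch of the general idea. It is not the actual UIMA StringHeap code: the class names (StringHeapSketch, StringListHeap, CharRefHeap), the method names, the initial array sizes, and the convention of reserving code 0 are all assumptions made for illustration. The first class keeps plain Java strings; the second copies characters onto a shared character heap and records an (offset, length) pair per string in a reference heap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: hypothetical names, not the UIMA implementation.
public class StringHeapSketch {

    /** Strategy 1: keep every value as an ordinary Java String. */
    static final class StringListHeap {
        private final List<String> strings = new ArrayList<String>();

        StringListHeap() {
            strings.add(null); // assumption: code 0 means "no string"
        }

        /** Stores s and returns the integer code that refers to it. */
        int add(String s) {
            strings.add(s);
            return strings.size() - 1;
        }

        String get(int code) {
            return strings.get(code);
        }
    }

    /** Strategy 2: copy characters onto a char heap, index them via a ref heap. */
    static final class CharRefHeap {
        private char[] charHeap = new char[16]; // all characters, back to back
        private int charPos = 0;                // next free slot in charHeap
        private int[] refHeap = new int[8];     // (offset, length) pairs
        private int refCount = 1;               // assumption: code 0 is reserved

        /** Copies the characters of s and returns the integer code for it. */
        int add(String s) {
            int len = s.length();
            if (charPos + len > charHeap.length) {
                int grown = Math.max(charHeap.length * 2, charPos + len);
                charHeap = Arrays.copyOf(charHeap, grown);
            }
            s.getChars(0, len, charHeap, charPos);
            if (2 * (refCount + 1) > refHeap.length) {
                refHeap = Arrays.copyOf(refHeap, refHeap.length * 2);
            }
            refHeap[2 * refCount] = charPos;
            refHeap[2 * refCount + 1] = len;
            charPos += len;
            return refCount++;
        }

        /** Rebuilds a String from the stored offset and length. */
        String get(int code) {
            return new String(charHeap, refHeap[2 * code], refHeap[2 * code + 1]);
        }
    }
}
```

In this sketch the second strategy carries extra arrays and per-string index entries, which gives a rough feel for the kind of bookkeeping the issue proposes to remove; the real memory figures are the ones reported in the experiment above.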
