uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: small memory footprint tradeoff configuration
Date Fri, 20 Feb 2009 16:21:35 GMT
Thilo Goetz wrote:
> Marshall Schor wrote:
>   
>> One of the ideas for GC was to change the basic heap design to use java
>> objects for feature structures.  I'm thinking of some kind of explicit
>> GC, called by the user, at a point where they know a bunch of objects is
>> no longer needed (because they've just deleted things out of the index,
>> for instance).  The use case is one where some set of annotators might
>> generate many "alternatives", and then a subsequent annotator "picks"
>> one, and removes the others from the index.
>>
>> I'm thinking that the implementation might be based on the deep CAS copy
>> code we already have, modified in an attempt to avoid needing extra space. 
>>
>> I think this would avoid many of the other issues mentioned in the
>> previous thread http://markmail.org/thread/aolbz4nrvmgjhuyb.  If there
>> are issues/concerns with this kind of approach, please post/discuss.
>>     
>
> It would change the internal IDs of FSs, which was always a
> big no-no for some people.
>   
True.  For things like delta-cas, or parallel processing flows in
UIMA-AS, which use a high-water mark of some kind, I'm thinking (hoping)
we could make this work.

For other cases, I don't know how much of an issue this is.  In any case,
by making it not automatic, but rather user-invoked via some explicit
call (e.g., myCas.reclaimSpace()), I'm hoping (again) that only users
who do not need the internal IDs to stay the same would call it.
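
To make the ID concern concrete, here is a minimal sketch of a user-invoked
compaction over a toy heap. Everything in it (ToyHeap, create, removeFromIndex,
compact) is a hypothetical name for illustration and not part of the UIMA API.
It assumes feature structures are addressed by their slot position, that only
indexed entries are reachable, and that entries do not reference each other.
Unlike the approach hoped for above, it uses a temporary second list rather
than compacting in place.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a CAS heap (hypothetical): slot position doubles as the internal ID. */
final class ToyHeap {
    private final List<String> slots = new ArrayList<>();   // slot i holds the entry with ID i
    private final List<Integer> index = new ArrayList<>();  // IDs that are still indexed

    /** Allocates a new entry, adds it to the index, and returns its ID. */
    int create(String payload) {
        slots.add(payload);
        int id = slots.size() - 1;
        index.add(id);
        return id;
    }

    /** Removes an entry from the index; its slot stays allocated until compact() runs. */
    void removeFromIndex(int id) {
        index.remove(Integer.valueOf(id));
    }

    int size() {
        return slots.size();
    }

    String get(int id) {
        return slots.get(id);
    }

    /**
     * Explicit, user-invoked compaction: copies only the indexed entries into a
     * fresh list and returns a map from old ID to new ID. Any ID held by the
     * caller (including a high-water mark) is stale afterwards unless it is
     * translated through this map.
     */
    Map<Integer, Integer> compact() {
        List<String> fresh = new ArrayList<>(index.size());
        Map<Integer, Integer> remap = new HashMap<>();
        for (int i = 0; i < index.size(); i++) {
            int oldId = index.get(i);
            int newId = fresh.size();
            fresh.add(slots.get(oldId));
            remap.put(oldId, newId);
            index.set(i, newId);          // keep the index pointing at the moved entry
        }
        slots.clear();
        slots.addAll(fresh);
        return remap;
    }
}
```

In the "alternatives" use case, an annotator would create several entries,
remove all but the chosen one from the index, and then call compact(). The
surviving entry moves to a lower slot, which is exactly the ID change that
callers relying on stable IDs would have to avoid or handle via the returned
map.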

This was helpful - please post refs to other areas to look into before
proceeding.

-Marshall
>   
>> -Marshall
>>
>> Thilo Goetz wrote:
>>     
>>> Marshall Schor wrote:
>>>   
>>>       
>>>> Some users are beginning to ask for the ability to shift the internal
>>>> tradeoffs UIMA takes toward having a smaller memory footprint, at some
>>>> cost in performance.
>>>>
>>>> Several areas in particular have come up: 
>>>>   1) "interning" string objects, so that only one copy exists
>>>>   2) having some way to "compact" or garbage-collect the CAS
>>>>     
>>>>         
>>> My suggestions for garbage collection in the CAS met with strong
>>> resistance on this list in the past.  I'll be interested to see
>>> what you'll come up with to overcome that resistance.
>>>
>>>   
>>>       
>>>> Are there other things that should be considered for trade-off here?
>>>>
>>>> -Marshall
>>>>     
>>>>         
>>>   
>>>       
>
>
>
>   
