uima-dev mailing list archives

From Thilo Goetz <twgo...@gmx.de>
Subject Re: small memory footprint tradeoff configuration
Date Mon, 23 Feb 2009 17:10:16 GMT
Marshall Schor wrote:
> Thilo Goetz wrote:
>> Marshall Schor wrote:
>>   
>>> One of the ideas for GC was to change the basic heap design to use Java
>>> objects for feature structures.  I'm thinking of some kind of explicit
>>> GC, called by the user, at a point where they know a bunch of objects is
>>> no longer needed (because they've just deleted things out of the index,
>>> for instance).  The use case is one where some set of annotators might
>>> generate many "alternatives", and then a subsequent annotator "picks"
>>> one, and removes the others from the index.
>>>
>>> I'm thinking that the implementation might be based on the deep CAS copy
>>> code we already have, modified in an attempt to avoid needing extra space. 
>>>
>>> I think this would avoid many of the other issues mentioned in the
>>> previous thread http://markmail.org/thread/aolbz4nrvmgjhuyb.  If there
>>> are issues/concerns with this kind of approach, please post/discuss.
>>>     
>> It would change the internal IDs of FSs, which was always a
>> big no-no for some people.
>>   
> True.  For things like delta-cas, or parallel processing flows in
> UIMA-AS, which use a high-water mark of some kind, I'm thinking (hoping)
> we could make this work.
> 
> For other cases, I don't know how much of an issue this is.  In any case,
> by making it not automatic, but rather user-invoked via some explicit
> call (e.g., myCas.reclaim-space()), I'm hoping (again) that only users
> who do not need the internal IDs to stay the same would call this.

So the users are supposed to figure out if they need internal
IDs?  I don't think that's a good idea.  Either we make guarantees
about things like references into the CAS surviving calls to
process(), or we don't.
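
To make the concern concrete, here is a minimal sketch of why a compacting collector invalidates references held outside the heap. It is a toy model, not UIMA code: the class ToyHeap and its methods add, get, and compact are hypothetical names invented for illustration, and the only assumption taken from the thread is that an internal ID is tied to a structure's position in the heap.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model only (not the UIMA CAS): an entry's internal ID is its slot index. */
public class ToyHeap {
    private final List<String> slots = new ArrayList<>();

    /** Stores a value and returns its internal ID, which is its slot index. */
    public int add(String value) {
        slots.add(value);
        return slots.size() - 1;
    }

    /** Looks up a value by internal ID; throws if the ID is out of range. */
    public String get(int id) {
        return slots.get(id);
    }

    /** Returns the number of occupied slots. */
    public int size() {
        return slots.size();
    }

    /**
     * Keeps only the entries whose IDs are in liveIds and closes the gaps.
     * Survivors are renumbered, so IDs handed out earlier may no longer
     * point at the same entry, or at any entry at all.
     */
    public void compact(Set<Integer> liveIds) {
        List<String> kept = new ArrayList<>();
        for (int id = 0; id < slots.size(); id++) {
            if (liveIds.contains(id)) {
                kept.add(slots.get(id));
            }
        }
        slots.clear();
        slots.addAll(kept);
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap();
        heap.add("alternative-1");
        heap.add("alternative-2");
        int chosen = heap.add("chosen"); // ID 2 before compaction

        Set<Integer> live = new HashSet<>();
        live.add(chosen);
        heap.compact(live);

        // The surviving entry has moved to slot 0; the old ID 2 is now stale.
        System.out.println("old id = " + chosen + ", heap size = " + heap.size());
        System.out.println("slot 0 = " + heap.get(0));
    }
}
```

In this sketch, any caller that stored the old ID before compact() ran is left with a dangling reference. That is the guarantee in question: either references into the CAS are promised to stay valid, or they are not.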

> 
> This was helpful - please post refs to other areas to look into before
> proceeding.
> 
> -Marshall
>>   
>>> -Marshall
>>>
>>> Thilo Goetz wrote:
>>>     
>>>> Marshall Schor wrote:
>>>>   
>>>>       
>>>>> Some users are beginning to ask for the ability to shift the internal
>>>>> tradeoffs UIMA takes toward having a smaller memory footprint, at some
>>>>> cost in performance.
>>>>>
>>>>> Several areas in particular have come up: 
>>>>>   1) "interning" string objects, so that only one copy exists
>>>>>   2) having some way to "compact" or garbage-collect the CAS
>>>>>     
>>>>>         
>>>> My suggestions for garbage collection in the CAS met with strong
>>>> resistance on this list in the past.  I'll be interested to see
>>>> what you'll come up with to overcome that resistance.
>>>>
>>>>   
>>>>       
>>>>> Are there other things that should be considered for trade-off here?
>>>>>
>>>>> -Marshall

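
The first item in the quoted list, "interning" string objects so that only one copy exists, can be sketched as follows. This is an illustration of the general technique and not how UIMA stores strings: StringPool and its intern method are hypothetical names, and the pool is a plain HashMap rather than anything CAS-specific.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical pool that keeps one canonical instance per distinct string value. */
public class StringPool {
    private final Map<String, String> pool = new HashMap<>();

    /** Returns the canonical instance equal to s, registering s if it is new. */
    public String intern(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    /** Returns the number of distinct string values held. */
    public int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        StringPool pool = new StringPool();
        // new String(...) forces two separate objects with equal contents.
        String first = pool.intern(new String("Token"));
        String second = pool.intern(new String("Token"));

        System.out.println(first == second); // prints true: both refer to one object
        System.out.println(pool.size());     // prints 1
    }
}
```

The tradeoff matches the one described in the thread: each intern call costs a hash lookup, in exchange for holding a single copy of every repeated value.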