uima-dev mailing list archives

From Thilo Goetz <twgo...@gmx.de>
Subject Re: small memory footprint tradeoff configuration
Date Thu, 12 Mar 2009 16:37:48 GMT
Adam Lally wrote:
> On Wed, Mar 11, 2009 at 8:53 AM, Marshall Schor <msa@schor.com> wrote:
>> I agree in general about not making things more complicated at least to
>> the user.  I can imagine education working for
>>  1) things like string interning
>>  2) things like deleting features from type systems where they're not
>> being used, and where the annotator producing them will respect this.
>>
>> What this approach seems to miss are the following kinds of things:
>>
>> 1) cases where some set of annotators produce feature structures, which,
>> after some point, are no longer needed, and are "deleted" but
>> nevertheless continue to consume space.
>>
>> 2) cases where some set of annotators produce feature structures having
>> lots of fields, where, after some point, the fields are no longer needed.
>>
>> If these are not significant use-cases in practice, then I'm happy to
>> think-about / work-on other things :-).
>>
> 
> 
> I'd like to propose discussing the different ideas here one at a time.
>  We had enough trouble coming to any agreement on GC the last time
> that we discussed it, without also throwing string interning and
> feature deleting into the mix.
> 
> So focusing on GC first (unless you think one of the others is more important):
> 
> My inclination is to ensure that GC deletes only garbage, and that
> there's no possibility that anything GC'ed could have been referenced
> by anybody.  The other proposals that don't have this guarantee are
> scary to me.
> 
> A way to accomplish this guarantee would be that when the process
> method of an AnalysisEngine (could be either primitive or aggregate)
> completes, we can mark as garbage any FS's that were created since the
> beginning of that process method, but which are not referenced
> directly or indirectly from anything in the indexes.  Does this
> concept seem reasonable?

+1. I like the idea because it is fairly local, yet it
still allows one to delete FSs from indexes later in the
processing and have them garbage collected (on exiting
the containing aggregate).
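
To make the proposal concrete, here is a minimal sketch of the
reachability rule Adam describes. It is a toy model in Python, not
UIMA code: the FS class, its refs list, and find_garbage are all
hypothetical names, and a real CAS stores FSs on a heap rather than
as objects. Only FSs created since the start of the process call are
candidates; anything reachable from an indexed FS survives.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based hashing, so FSs can live in a set
class FS:
    """Toy feature structure (hypothetical): a name plus FS-valued features."""
    name: str
    refs: list = field(default_factory=list)


def find_garbage(created_since_start, indexed):
    """Return FSs created during process() that no indexed FS can reach."""
    reachable = set()
    stack = list(indexed)
    while stack:
        fs = stack.pop()
        if fs in reachable:
            continue  # already visited; also guards against reference cycles
        reachable.add(fs)
        stack.extend(fs.refs)
    # FSs created before process() started are never candidates.
    return [fs for fs in created_since_start if fs not in reachable]


token = FS("token")
sentence = FS("sentence", refs=[token])  # indexed, and it references token
scratch = FS("scratch")                  # created but never indexed
garbage = find_garbage([token, sentence, scratch], indexed=[sentence])
print([fs.name for fs in garbage])
```

In this sketch token survives because it is referenced indirectly
from the indexes, while scratch is the only FS marked as garbage.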

> 
> The next question is under what conditions would a GC execute.
> Requiring an explicit call seems counter to what other garbage
> collecting runtime environments do, and like Thilo I'm confused about
> who would call this and when.  I think it would be better to define
> the parameters that control GC in the PerformanceTuningSettings that
> we already have, and make them dependent on how much CAS heap space is
> used relative to a GC threshold that the user has set in the
> PerformanceTuningSettings.

+1, and the default could be "no GC", so it would be
perfectly backwards compatible.  I'm thinking of the
kinds of scenarios that I often work with, where
basically all the annotations are later written to
an index, and any attempt at GC would be futile and
just consume time to no benefit.
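
A rough sketch of the trigger, again in Python and purely
illustrative: gc_threshold is a hypothetical setting, not an existing
PerformanceTuningSettings key, and heap_used stands in for whatever
measure of CAS heap usage the framework would expose. Leaving the
threshold unset means GC never runs, which is the backwards
compatible default.

```python
from typing import Optional


def should_run_gc(heap_used: int, gc_threshold: Optional[int] = None) -> bool:
    """Decide whether to collect when an AnalysisEngine's process() returns.

    gc_threshold is a hypothetical user setting; None means "no GC".
    """
    if gc_threshold is None:
        return False  # default: never collect, same behavior as today
    return heap_used >= gc_threshold
```

With this shape, a pipeline that indexes nearly everything simply
leaves the threshold unset and pays no GC cost.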

> 
>  -Adam

