uima-dev mailing list archives

From Adam Lally <ala...@alum.rpi.edu>
Subject Re: small memory footprint tradeoff configuration
Date Thu, 12 Mar 2009 16:14:13 GMT
On Wed, Mar 11, 2009 at 8:53 AM, Marshall Schor <msa@schor.com> wrote:
> I agree in general about not making things more complicated at least to
> the user.  I can imagine education working for
>  1) things like string interning
>  2) things like deleting features from type systems where they're not
> being used, and where the annotator producing them will respect this.
> What this approach seems to miss are the following kinds of things:
> 1) cases where some set of annotators produce feature structures, which,
> after some point, are no longer needed, and are "deleted" but
> nevertheless continue to consume space.
> 2) cases where some set of annotators produce feature structures having
> lots of fields, where, after some point, the fields are no longer needed.
> If these are not significant use-cases in practice, then I'm happy to
> think-about / work-on other things :-).

I'd like to propose discussing the different ideas here one at a time.
We had enough trouble coming to any agreement on GC the last time
we discussed it, without also throwing string interning and
feature deletion into the mix.

So focusing on GC first (unless you think one of the others is more important):

My inclination is to ensure that GC deletes only garbage, and that
there's no possibility that anything GC'ed could have been referenced
by anybody.  The other proposals that don't have this guarantee are
scary to me.

One way to provide this guarantee: when the process method of an
AnalysisEngine (either primitive or aggregate) completes, we mark as
garbage any FSs that were created since the beginning of that process
method but are not referenced, directly or indirectly, from anything
in the indexes.  Does this concept seem reasonable?
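
To make the marking rule concrete, here is a rough sketch of what I
have in mind.  It does not use the real CAS API: the Fs class, the
findGarbage method, and the two input collections are all hypothetical
stand-ins, and I'm assuming the framework can tell us which FSs were
created during the current process call.

```java
import java.util.*;

public class CasGcSketch {
    /** Hypothetical stand-in for a feature structure: an id plus outgoing references. */
    static final class Fs {
        final int id;
        final List<Fs> refs = new ArrayList<>();
        Fs(int id) { this.id = id; }
    }

    /**
     * Returns the FSs created during the process call that are not reachable,
     * directly or indirectly, from any indexed FS. FSs that existed before the
     * call are never candidates, because they are not in createdDuringProcess.
     */
    static Set<Fs> findGarbage(Collection<Fs> indexed, Collection<Fs> createdDuringProcess) {
        // Mark phase: walk references starting from everything in the indexes.
        // Fs does not override equals/hashCode, so this set is identity-based.
        Set<Fs> reachable = new HashSet<>();
        Deque<Fs> work = new ArrayDeque<>(indexed);
        while (!work.isEmpty()) {
            Fs fs = work.pop();
            if (reachable.add(fs)) {      // first visit only, so cycles terminate
                work.addAll(fs.refs);
            }
        }
        // Only FSs created since the start of process() may be marked as garbage.
        Set<Fs> garbage = new HashSet<>();
        for (Fs fs : createdDuringProcess) {
            if (!reachable.contains(fs)) {
                garbage.add(fs);
            }
        }
        return garbage;
    }

    public static void main(String[] args) {
        Fs indexed = new Fs(1);   // added to an index
        Fs child = new Fs(2);     // referenced from the indexed FS
        Fs orphan = new Fs(3);    // created but never indexed or referenced
        indexed.refs.add(child);

        Set<Fs> garbage = findGarbage(
                Arrays.asList(indexed),
                Arrays.asList(indexed, child, orphan));
        for (Fs fs : garbage) {
            System.out.println("garbage FS id: " + fs.id);
        }
    }
}
```

The point of restricting candidates to FSs created during the call is
that nothing outside that call can hold a reference to them unless
they are reachable from the indexes, which is what gives the
"only garbage" guarantee.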

The next question is under what conditions a GC would execute.
Requiring an explicit call seems counter to what other garbage
collecting runtime environments do, and, like Thilo, I'm confused about
who would call this and when.  I think it would be better to define
the parameters that control GC in the PerformanceTuningSettings that
we already have, and make them dependent on how much CAS heap space is
used relative to a GC threshold that the user has set in the

