lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Breck Baldwin <>
Subject Re: experiences with lingpipe
Date Fri, 03 Nov 2006 14:59:23 GMT

Martin Braun wrote:
> Hi Breck,
> i have tried your tutorial and built (hopefully) a successful
> SpellCheck.model File with
> 49M.
> My Lucene Index directory is 2,4G. When I try to read the Model with the
> readmodel function,
> i get an "Exception in thread "main" java.lang.OutOfMemoryError: Java
> heap space", though I started java with -Xms1024m -Xmx1024m.
> How many RAM will I need for the Model (I only have 2 GB of physical
> RAM, and lucene's also using some memory).

You need to increase the memory for java. I think 32-bit jave is limited 
to a 1.3 gig heap but could be wrong. No heuristics at the tip of my 

To make the spell checker smaller you can prune the tokens using the
pruneLM method in the TrainSpellChecker. Pruning the 1 counts should 
make a big difference and not hurt spelling too much (depends on how 
things are paramterized). Probably up to 5 counts won't matter.

Also look at my tuning tutorial that is in very rough shape but will
get you going on tuning at:

cvs co 

I will try to get another pass at it over the weekend.

b reck

> Is there a "rule of thumb" to calculate the needed amount of memory of
> the model?
> thanks in advance,
> martin
>>>>Tuning params dominate the performance space. A small beam (16 active
>>>>hypotheses) will be quite snappy (I have 200 queries/sec with a 32 beam.
>>>>over a 80 gig text collection that with some pruning was 5 gig in memory
>>>>running an 8 gram model)
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message