lucene-java-user mailing list archives

From Toke Eskildsen ...@statsbiblioteket.dk>
Subject RE: Memory consumption on lucene 2.4
Date Fri, 21 Nov 2014 21:03:47 GMT
Philippe Kernévez [pkernevez@octo.com] wrote:
> We use Lucene 2.4 (provided by Alfresco).

Lucene 2.4 is 6 years old. The obvious advice is to upgrade, but I guess you have your reasons
not to.

> We looked at a memory dump with Eclipse Memory Analyser, and we were quite
> surprised to see that most of that memory is kept by enormous String[] that
> are yet mostly empty.

I am guessing you have a lot of documents in your index and that you are sorting on at least
one String field?

http://www.lhelper.org/dev/lucene-2.4.0/docs/api/org/apache/lucene/search/Sort.html
states that sorting on String in Lucene means that all Strings for that field are kept in
memory. There has to be one entry in the String array(s) for each document, even if the document
does not have a value for that field.
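To make the "mostly empty String[]" observation concrete, here is a minimal plain-Java sketch of a per-document array. It does not use Lucene; the class name, method names, and the 4-bytes-per-reference figure are assumptions for illustration only (actual reference size depends on the JVM and its settings).

```java
import java.util.HashMap;
import java.util.Map;

public class SparseSortArraySketch {

    // Rough size of the reference slots alone: one slot per document,
    // whether or not the document has a value. Array header and the
    // String objects themselves are not counted.
    static long referenceArrayBytes(int maxDoc, int bytesPerReference) {
        return (long) maxDoc * bytesPerReference;
    }

    // Builds an array indexed by document number. Documents without a
    // value for the field keep a null slot, which still occupies space.
    static String[] buildPerDocumentArray(int maxDoc, Map<Integer, String> valuesByDoc) {
        String[] perDoc = new String[maxDoc];
        for (Map.Entry<Integer, String> e : valuesByDoc.entrySet()) {
            perDoc[e.getKey()] = e.getValue();
        }
        return perDoc;
    }

    // Counts the empty slots, i.e. documents with no value for the field.
    static int countNulls(String[] perDoc) {
        int nulls = 0;
        for (String s : perDoc) {
            if (s == null) {
                nulls++;
            }
        }
        return nulls;
    }

    public static void main(String[] args) {
        Map<Integer, String> values = new HashMap<>();
        values.put(3, "alpha");
        values.put(7, "beta");
        String[] perDoc = buildPerDocumentArray(1000, values);
        System.out.println("empty slots: " + countNulls(perDoc));
        System.out.println("slot bytes for 10M docs: " + referenceArrayBytes(10_000_000, 4));
    }
}
```

Each String sort field gets its own array of this kind, so the fixed cost grows with both the document count and the number of sort fields.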

If my guess is correct, the solution is to reduce the number of String sort fields, ideally
to 0. Maybe you can use an integer field instead by doing some mapping?
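One possible form of such a mapping, sketched in plain Java without any Lucene calls: assign each distinct string its rank in sorted order and index that rank as an integer, so that sorting on the integer gives the same order as sorting on the string. The class and method names are hypothetical, and the sketch assumes the full set of values is known up front; values added later would require recomputing the ranks and reindexing.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

public class SortKeyMappingSketch {

    // Maps each distinct value to its position in natural String order.
    // Duplicates collapse to a single ordinal.
    static Map<String, Integer> buildOrdinals(Collection<String> values) {
        TreeSet<String> sorted = new TreeSet<>(values);
        Map<String, Integer> ordinals = new HashMap<>();
        int next = 0;
        for (String value : sorted) {
            ordinals.put(value, next++);
        }
        return ordinals;
    }

    public static void main(String[] args) {
        Map<String, Integer> ordinals =
                buildOrdinals(Arrays.asList("pear", "apple", "fig", "apple"));
        // The integer stored per document would be ordinals.get(fieldValue).
        for (Map.Entry<String, Integer> e : ordinals.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Natural String order here is by UTF-16 code unit, which may differ from a locale-aware sort; whether that matters depends on how the field is sorted today.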

> In our case we need to have some very short words indexed, so we deactivated
> 'stop words'. If we want to have the list of Terms ordered by their index size,
> what is a good tool to do that (Luke?) and how can we do such a request?

Luke has term statistics built in. I don't remember the details, but I recall that it was
straightforward.

- Toke Eskildsen
