lucene-java-user mailing list archives

Subject Re: Use multiple lucene indices
Date Tue, 06 Dec 2011 06:11:19 GMT

>> would the memory usage go through the roof?

Yup ....

In my past experience, this got me into a pickle...

with regards

On Mon, Dec 5, 2011 at 11:28 PM, Rui Wang wrote:

> Hi All,
> We are planning to use Lucene in our project, but are not entirely sure about
> some of the design decisions we have made. Below are the details; any
> comments/suggestions are more than welcome.
> The requirements of the project are below:
> 1. We have tens of thousands of files, their sizes ranging from 500 MB to a
> few terabytes, and the majority of the contents in these files will not be
> accessed frequently.
> 2. We are planning to keep less frequently accessed contents outside of our
> database and store them on the file system.
> 3. We also have code to get the binary position of these contents in the
> files. Using these binary positions, we can quickly retrieve the contents
> and convert them into our domain objects.
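
To make requirement 3 concrete, here is a minimal sketch of fetching one piece of content by its binary position, using only the JDK. The class name ByteRangeReader is hypothetical, and treating the stop position as exclusive is an assumption; the original message does not say which convention is used.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper, not part of Lucene: reads the bytes in [start, stop)
// from a large file without loading the rest of the file.
final class ByteRangeReader {

    // Assumption: 'stop' is exclusive, so the range length is stop - start.
    static byte[] read(String path, long start, long stop) throws IOException {
        if (start < 0 || stop < start) {
            throw new IllegalArgumentException("invalid range: " + start + ".." + stop);
        }
        long length = stop - start;
        if (length > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("range too large for a single byte array");
        }
        byte[] buffer = new byte[(int) length];
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(start);        // jump straight to the stored start position
            file.readFully(buffer);  // throws EOFException if the file ends before 'stop'
        }
        return buffer;
    }
}
```

The returned bytes would then be handed to whatever code converts them into domain objects. Ranges larger than 2 GB would need a streaming variant rather than a single array.
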
> We think Lucene provides a scalable solution for storing and indexing
> these binary positions, so the idea is that each piece of content in
> the files will be a document. Each document will have at least an ID field
> to identify the content and a binary position field containing the start
> and stop positions of the content. Having done some performance testing, it
> seems to us that Lucene is well capable of doing this.
> At the moment, we are planning to create one Lucene index per file, so if
> we have new files to be added to the system, we can simply generate a new
> index. The problem is to do with searching: this approach means that we need
> to create a new IndexSearcher every time a file is accessed through our
> web service. We know that it is rather expensive to open a new
> IndexSearcher, and are thinking of using some kind of pooling mechanism.
> Our questions are:
> 1. Is this one-index-per-file approach a viable solution? What do you
> think about pooling IndexSearchers?
> 2. If we have many IndexSearchers open at the same time, would the
> memory usage go through the roof? I couldn't find any documentation on how
> Lucene allocates memory.
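
One way to keep the number of open searchers (and so the memory they hold) bounded is a small least-recently-used pool. The sketch below is only an illustration under assumptions: SearcherPool, its maxOpen limit, and the opener/closer callbacks are hypothetical names, and the searcher type is left generic so the example runs with the JDK alone. In real use the opener would open an IndexSearcher for the given file's index and the closer would release it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical bounded pool: keeps at most maxOpen searchers open and
// closes the least recently used one when the limit is exceeded.
final class SearcherPool<K, S> {
    private final Function<K, S> opener;   // opens a searcher for a key (e.g. a file name)
    private final LinkedHashMap<K, S> cache;

    SearcherPool(int maxOpen, Function<K, S> opener, Consumer<S> closer) {
        if (maxOpen < 1) {
            throw new IllegalArgumentException("maxOpen must be at least 1");
        }
        this.opener = opener;
        // accessOrder = true: iteration order runs from least to most recently used.
        this.cache = new LinkedHashMap<K, S>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, S> eldest) {
                if (size() > maxOpen) {
                    closer.accept(eldest.getValue()); // release the evicted searcher
                    return true;
                }
                return false;
            }
        };
    }

    // Returns the cached searcher for the key, opening one on a miss.
    synchronized S acquire(K key) {
        S searcher = cache.get(key);
        if (searcher == null) {
            searcher = opener.apply(key);
            cache.put(key, searcher); // may evict and close the eldest entry
        }
        return searcher;
    }

    synchronized int openCount() {
        return cache.size();
    }
}
```

Note that this sketch closes an evicted searcher immediately, even if another thread is still using it. A production version would need reference counting or a similar scheme so that a searcher is only closed once nobody holds it.
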
> Thank you very much for your help.
> Many thanks,
> Rui Wang

