lucene-java-user mailing list archives

From Michael McCandless
Subject Re: Lucene Indexing out of memory
Date Wed, 03 Mar 2010 16:00:32 GMT
The worst-case RAM usage for Lucene is a single doc with many unique
terms.  Lucene allocates ~60 bytes per unique term (plus space to hold
that term's characters = 2 bytes per char).  And, Lucene cannot flush
within one document -- it must flush after the doc has been fully
indexed.
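
As a rough illustration of the arithmetic above, here is a minimal sketch
that estimates the term-related RAM one document would need before Lucene
can flush. It uses only the two figures quoted in this message (~60 bytes
per unique term, 2 bytes per char). The class name, the method names, and
the whitespace tokenization are assumptions made for the example; a real
analyzer will produce different terms, and the result is an approximation,
not what Lucene itself reports.

```java
import java.util.HashSet;
import java.util.Set;

public class TermRamEstimate {
    // Figures quoted in the message; both are approximate.
    static final long BYTES_PER_UNIQUE_TERM = 60;
    static final long BYTES_PER_CHAR = 2;

    // Estimate from a count of unique terms and their total character count.
    static long estimateBytes(long uniqueTerms, long totalChars) {
        return uniqueTerms * BYTES_PER_UNIQUE_TERM + totalChars * BYTES_PER_CHAR;
    }

    // Estimate for one document's text. Assumption: terms are split on
    // whitespace, which only approximates a real analyzer.
    static long estimateBytes(String text) {
        Set<String> unique = new HashSet<>();
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty()) {
                unique.add(token);
            }
        }
        long totalChars = 0;
        for (String term : unique) {
            totalChars += term.length();
        }
        return estimateBytes(unique.size(), totalChars);
    }

    public static void main(String[] args) {
        // Hypothetical document: one million unique terms of 8 chars each.
        long bytes = estimateBytes(1_000_000L, 8_000_000L);
        System.out.println("Estimated bytes before flush is possible: " + bytes);
    }
}
```

Because the whole document must be held until it is fully indexed, an
estimate like this has to fit in the heap regardless of where the index
itself is stored.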

This past thread (also from Paul) delves into some of the details:

But it's not clear whether that is the issue affecting Ajay -- I think
more details about the docs, or some code fragments, could help shed
light.


On Tue, Mar 2, 2010 at 8:47 AM, Murdoch, Paul wrote:
> Ajay,
> Here is another thread I started on the same issue.
> n-indexing-large-files
> Paul
> -----Original Message-----
> On Behalf Of ajay_gupta
> Sent: Tuesday, March 02, 2010 8:28 AM
> Subject: Lucene Indexing out of memory
> Hi,
> It might be a general question, but I couldn't find the answer yet. I
> have around 90k documents, around 350 MB in size. Each document contains
> a record which has some text content. For each word in this text I want
> to store the context for that word and index it, so I am reading each
> document and, for each word in that document, appending a fixed number
> of surrounding words. To do that, I first search the existing indices to
> see whether this word already exists; if it does, I get the content,
> append the new context, and update the document. If no context exists, I
> create a document with the fields "word" and "context" and set their
> values to the word and the context.
> I tried this in RAM, but after a certain number of docs it gave an out
> of memory error, so I thought to use the FSDirectory method, but
> surprisingly after 70k documents it also gave an OOM error. I have
> enough disk space but I am still getting this error. I am not sure why
> even disk-based indexing gives this error. I thought disk-based indexing
> would be slow, but at least it would be scalable.
> Could someone suggest what the issue could be?
> Thanks
> Ajay