lucene-solr-user mailing list archives

From "Fuad Efendi" <f...@efendi.ca>
Subject RE: Lucene FieldCache memory requirements
Date Tue, 03 Nov 2009 15:34:14 GMT
Sorry Mike, Mark, I am confused again...

Yes, obviously I need some more memory for processing ("while the FieldCache
is being loaded"), but that was not the main subject...

With StringIndexCache, I have 10 arrays (the cardinality of this field is 10)
storing (int) Lucene document IDs.

> Except: as Mark said, you'll also need transient memory = pointer (4
> or 8 bytes) * (1+maxdoc), while the FieldCache is being loaded.

Ok, I see it:
      final int[] retArray = new int[reader.maxDoc()];
      String[] mterms = new String[reader.maxDoc()+1];

I can't trace it right now (short on time), but I think mterms is a local
variable and will size down to 0...



So the correct formula is... a weird one... if you don't want unexpected OOM
or an overloaded GC (WeakHashMaps...):

      [some heap] + [Non-Tokenized_Field_Count] x [maxdoc] x [4 bytes + 8 bytes]

(for 64-bit)
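
To make the arithmetic concrete, here is a minimal sketch of that estimate in Java. The class and method names are hypothetical (not part of Lucene), the [some heap] term is left out, and the pointer size is a parameter so the same code covers the 4-byte (32-bit) and 8-byte (64-bit) cases. It only multiplies the terms of the formula; it does not measure a real FieldCache.

```java
// Hypothetical helper, not a Lucene API: evaluates
// [field count] x [maxdoc] x [4 bytes + pointer bytes].
class FieldCacheEstimate {

    // fieldCount:   number of non-tokenized fields loaded into the cache
    // maxDoc:       value of reader.maxDoc()
    // pointerBytes: 4 on a 32-bit JVM, 8 on a 64-bit JVM (assumption from the thread)
    static long estimateBytes(int fieldCount, int maxDoc, int pointerBytes) {
        // Widen to long first so large indexes do not overflow int arithmetic.
        return (long) fieldCount * maxDoc * (4L + pointerBytes);
    }

    public static void main(String[] args) {
        // Example figures only: one field, 100 million documents, 64-bit pointers.
        long bytes = estimateBytes(1, 100_000_000, 8);
        System.out.println(bytes + " bytes");
    }
}
```

With these example figures the result is about 1.2 GB for a single field, which is why the transient String[] matters when sizing the heap.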


-Fuad


> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: November-03-09 5:00 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Lucene FieldCache memory requirements
> 
> On Mon, Nov 2, 2009 at 9:27 PM, Fuad Efendi <fuad@efendi.ca> wrote:
> > I believe this is correct estimate:
> >
> >> C. [maxdoc] x [4 bytes ~ (int) Lucene Document ID]
> >>
> >>   same as
> >> [String1_Document_Count + ... + String10_Document_Count + ...]
> >> x [4 bytes per DocumentID]
> 
> That's right.
> 
> Except: as Mark said, you'll also need transient memory = pointer (4
> or 8 bytes) * (1+maxdoc), while the FieldCache is being loaded.  After
> it's done being loaded, this sizes down to the number of unique terms.
> 
> But, if Lucene did the basic int packing, which really we should do,
> since you only have 10 unique values, with a naive 4 bits per doc
> encoding, you'd only need 1/8th the memory usage.  We could do a bit
> better by encoding more than one document at a time...
> 
> Mike
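
The "naive 4 bits per doc" idea Mike describes can be sketched as below. This is only an illustration under the assumption that every per-document value fits in 4 bits (at most 16 distinct ordinals, which covers a field with 10 unique values). NibblePackedOrds is a hypothetical class, not something Lucene ships, and it does not attempt the further "more than one document at a time" encoding he mentions.

```java
// Hypothetical illustration, not a Lucene class: stores one 4-bit
// ordinal per document, two documents per byte.
class NibblePackedOrds {
    private final byte[] packed;

    NibblePackedOrds(int maxDoc) {
        // Round up so an odd maxDoc still gets a byte for its last document.
        packed = new byte[(maxDoc + 1) / 2];
    }

    void set(int doc, int ord) {
        if (ord < 0 || ord > 15) {
            throw new IllegalArgumentException("ord must fit in 4 bits");
        }
        int i = doc >>> 1;
        if ((doc & 1) == 0) {
            // Even doc IDs use the low nibble; keep the high nibble intact.
            packed[i] = (byte) ((packed[i] & 0xF0) | ord);
        } else {
            // Odd doc IDs use the high nibble; keep the low nibble intact.
            packed[i] = (byte) ((packed[i] & 0x0F) | (ord << 4));
        }
    }

    int get(int doc) {
        int b = packed[doc >>> 1] & 0xFF; // treat the byte as unsigned
        return (doc & 1) == 0 ? (b & 0x0F) : (b >>> 4);
    }

    long bytesUsed() {
        return packed.length;
    }
}
```

For maxDoc documents this uses about maxDoc / 2 bytes instead of the 4 x maxDoc bytes of an int[], which is the 1/8th figure quoted above.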


