lucene-solr-user mailing list archives

From "Fuad Efendi" <f...@efendi.ca>
Subject RE: Lucene FieldCache memory requirements
Date Tue, 03 Nov 2009 02:31:49 GMT
Hi Mark,

Yes, I understand it now. However, how will StringIndexCache size down in a
production system that facets by Country on its homepage? This is
SOLR-specific...


Lucene-specific: Lucene doesn't read from disk if it can retrieve the field
value for a specific document ID from the cache. How will it size down in a
purely Lucene-based, heavily loaded production system? Especially if this
cache is used for query optimizations.



> -----Original Message-----
> From: Mark Miller [mailto:markrmiller@gmail.com]
> Sent: November-02-09 8:53 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Lucene FieldCache memory requirements
> 
>  static final class StringIndexCache extends Cache {
>     StringIndexCache(FieldCache wrapper) {
>       super(wrapper);
>     }
> 
>     @Override
>     protected Object createValue(IndexReader reader, Entry entryKey)
>         throws IOException {
>       String field = StringHelper.intern(entryKey.field);
>       final int[] retArray = new int[reader.maxDoc()];
>       String[] mterms = new String[reader.maxDoc()+1];
>       TermDocs termDocs = reader.termDocs();
>       TermEnum termEnum = reader.terms (new Term (field));
>       int t = 0;  // current term number
> 
>       // an entry for documents that have no terms in this field
>       // should a document with no terms be at top or bottom?
>       // this puts them at the top - if it is changed, FieldDocSortedHitQueue
>       // needs to change as well.
>       mterms[t++] = null;
> 
>       try {
>         do {
>           Term term = termEnum.term();
>           if (term==null || term.field() != field) break;
> 
>           // store term text
>           // we expect that there is at most one term per document
>           if (t >= mterms.length) throw new RuntimeException ("there are more terms than " +
>                   "documents in field \"" + field + "\", but it's impossible to sort on " +
>                   "tokenized fields");
>           mterms[t] = term.text();
> 
>           termDocs.seek (termEnum);
>           while (termDocs.next()) {
>             retArray[termDocs.doc()] = t;
>           }
> 
>           t++;
>         } while (termEnum.next());
>       } finally {
>         termDocs.close();
>         termEnum.close();
>       }
> 
>       if (t == 0) {
>         // if there are no terms, make the term array
>         // have a single null entry
>         mterms = new String[1];
>       } else if (t < mterms.length) {
>         // if there are less terms than documents,
>         // trim off the dead array space
>         String[] terms = new String[t];
>         System.arraycopy (mterms, 0, terms, 0, t);
>         mterms = terms;
>       }
> 
>       StringIndex value = new StringIndex (retArray, mterms);
>       return value;
>     }
>   };
> 
> The formula for a String Index fieldcache is essentially the String
> array of unique terms (which does indeed "size down" at the bottom) and
> the int array indexing into the String array.
> 
> 
> Fuad Efendi wrote:
> > To be correct, I analyzed FieldCache a while ago and I believed it never
> > "sizes down"...
> >
> > /**
> >  * Expert: The default cache implementation, storing all values in memory.
> >  * A WeakHashMap is used for storage.
> >  *
> >  * <p>Created: May 19, 2004 4:40:36 PM
> >  *
> >  * @since   lucene 1.4
> >  */
> >
> >
> > Will it size down? Only if we are not faceting (as in SOLR v.1.3)...
> >
> > And I am still unsure, Document ID vs. Object Pointer.
> >
> >
> >
> >
> >
> >> I don't understand this:
> >>
> >>> so with a ton of docs and a few uniques, you get a temp boost in the RAM
> >>> reqs until it sizes it down.
> >>>
> >> Sizes down??? Why is it called Cache indeed? And how SOLR uses it if it is
> >> not cache?
> >>
> >>
> >
> >
> >
> 
> 
> --
> - Mark
> 
> http://www.lucidimagination.com
> 
> 
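
To make the two arrays in the quoted createValue() concrete, here is a minimal
sketch that needs no Lucene on the classpath. StringIndexSketch, build() and
arrayBytes() are hypothetical names made up for illustration, not Lucene API.
It assumes at most one term per document and takes the per-document values as
a plain String[] instead of walking a TermEnum. It shows the same shape as the
quoted code: a temporary String[maxDoc + 1] that is trimmed to the number of
unique terms, and an int[maxDoc] that is never trimmed.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for the StringIndex built by the quoted code; not a Lucene class. */
final class StringIndexSketch {
    final int[] order;      // one slot per document: index into lookup (0 = no term)
    final String[] lookup;  // slot 0 is null, followed by the unique terms in sorted order

    private StringIndexSketch(int[] order, String[] lookup) {
        this.order = order;
        this.lookup = lookup;
    }

    /** docValues[i] is the single term of document i, or null if the document has none. */
    static StringIndexSketch build(String[] docValues) {
        int maxDoc = docValues.length;
        int[] order = new int[maxDoc];
        String[] terms = new String[maxDoc + 1]; // temporary worst-case allocation
        int t = 0;
        terms[t++] = null; // documents without a term point at slot 0

        // Collect the unique terms in sorted order, as a term enumeration would return them.
        TreeMap<String, Integer> ords = new TreeMap<>();
        for (String v : docValues) {
            if (v != null) ords.put(v, 0);
        }
        // Assign each unique term its slot number.
        for (Map.Entry<String, Integer> e : ords.entrySet()) {
            terms[t] = e.getKey();
            e.setValue(t);
            t++;
        }
        // Point every document at the slot of its term.
        for (int doc = 0; doc < maxDoc; doc++) {
            if (docValues[doc] != null) order[doc] = ords.get(docValues[doc]);
        }
        // Trim the dead space; this is the only part that "sizes down".
        return new StringIndexSketch(order, Arrays.copyOf(terms, t));
    }

    /**
     * Rough size of the two arrays only. bytesPerReference is an assumption the
     * caller supplies (it depends on the JVM); String contents and object
     * headers are deliberately left out.
     */
    long arrayBytes(int bytesPerReference) {
        return 4L * order.length + (long) bytesPerReference * lookup.length;
    }
}
```

For a field like Country, lookup ends up with one entry per distinct country
plus the leading null, while order keeps one int per document no matter how
few distinct values there are.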



