lucene-solr-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Lucene FieldCache memory requirements
Date Mon, 02 Nov 2009 22:59:58 GMT
OK I think someone who knows how Solr uses the fieldCache for this
type of field will have to pipe up.

For Lucene directly, simple strings would consume a pointer (4 or 8
bytes depending on whether your JRE is 64bit) per doc, and the string
index would consume an int (4 bytes) per doc.  (Each also consumes
negligible (for your case) memory to hold the actual string values.)
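
As a rough illustration of the per-document arithmetic above, here is a
minimal sketch in Java. The class and method names are hypothetical (they
are not part of the Lucene API), and the estimate deliberately ignores
the memory for the unique string values themselves, array headers, and
any per-segment overhead.

```java
// Hypothetical helper, not Lucene API: back-of-the-envelope estimates only.
final class FieldCacheEstimate {

    // getStrings-style cache: one object reference per document.
    // refBytes is 4 or 8, depending on the JRE (assumption taken from the text above).
    static long stringsBytes(long numDocs, int refBytes) {
        return numDocs * refBytes;
    }

    // getStringIndex-style cache: one int ordinal (4 bytes) per document.
    static long stringIndexBytes(long numDocs) {
        return numDocs * 4L;
    }

    public static void main(String[] args) {
        long docs = 100_000_000L; // the 100 million docs from the original question
        System.out.println(stringsBytes(docs, 4));   // 400000000
        System.out.println(stringsBytes(docs, 8));   // 800000000
        System.out.println(stringIndexBytes(docs));  // 400000000
    }
}
```

Under these assumptions the per-doc arrays alone come to roughly 400 MB
to 800 MB for 100 million docs, regardless of how many distinct values
the field has.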

Note that for your use case, this is exceptionally wasteful.  If
Lucene had simple bit-packed ints (I've opened LUCENE-1990 for this)
then it'd take far fewer bits to reference the values, since you have
only 10 unique string values.
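
To make the bit-packing idea concrete, here is a minimal sketch in Java.
It is only an illustration of the general technique and is not the
implementation proposed in LUCENE-1990; the class PackedOrds and all of
its methods are hypothetical names. It assumes each document stores an
ordinal in the range 0..numUniqueValues-1 and does no range checking.

```java
// Hypothetical sketch of bit-packed ordinals; not Lucene code.
final class PackedOrds {
    private final long[] blocks;
    private final int bitsPerValue;
    private final long mask;

    PackedOrds(int size, int numUniqueValues) {
        this.bitsPerValue = bitsRequired(numUniqueValues);
        this.mask = (1L << bitsPerValue) - 1;
        long totalBits = (long) size * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) / 64)];
    }

    // Bits needed to store ordinals 0..numUniqueValues-1 (at least 1).
    static int bitsRequired(int numUniqueValues) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(numUniqueValues - 1));
    }

    // Estimated bytes of packed storage, rounded up to whole 64-bit blocks.
    static long estimateBytes(long numDocs, int numUniqueValues) {
        long totalBits = numDocs * bitsRequired(numUniqueValues);
        return ((totalBits + 63) / 64) * 8L;
    }

    void set(int index, int ord) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = ord & mask;
        blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) { // value straddles two 64-bit blocks
            long highMask = (1L << spill) - 1;
            blocks[block + 1] = (blocks[block + 1] & ~highMask)
                    | (value >>> (bitsPerValue - spill));
        }
    }

    int get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) { // pick up the high bits from the next block
            value |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return (int) (value & mask);
    }

    public static void main(String[] args) {
        // 10 unique values need 4 bits each; 100 million docs:
        System.out.println(estimateBytes(100_000_000L, 10)); // 50000000
    }
}
```

With 10 unique values each ordinal fits in 4 bits, so under these
assumptions 100 million docs need about 50 MB of packed storage instead
of the 400 MB taken by one int per doc.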

Mike

On Mon, Nov 2, 2009 at 3:57 PM, Fuad Efendi <fuad@efendi.ca> wrote:
> I am not using Lucene API directly; I am using SOLR which uses Lucene
> FieldCache for faceting on non-tokenized fields...
> I think this cache will be lazily loaded, until user executes sorted (by
> this field) SOLR query for all documents *:* - in this case it will be fully
> populated...
>
>
>> Subject: Re: Lucene FieldCache memory requirements
>>
>> Which FieldCache API are you using?  getStrings?  or getStringIndex
>> (which is used, under the hood, if you sort by this field).
>>
>> Mike
>>
>> On Mon, Nov 2, 2009 at 2:27 PM, Fuad Efendi <fuad@efendi.ca> wrote:
>> > Any thoughts regarding the subject? I hope FieldCache doesn't use more than
>> > 6 bytes per document-field instance... I am too lazy to research Lucene
>> > source code, I hope someone can provide exact answer... Thanks
>> >
>> >
>> >> Subject: Lucene FieldCache memory requirements
>> >>
>> >> Hi,
>> >>
>> >>
>> >> Can anyone confirm Lucene FieldCache memory requirements? I have 100
>> >> millions docs with non-tokenized field "country" (10 different countries); I
>> >> expect it requires array of ("int", "long"), size of array 100,000,000,
>> >> without any impact of "country" field length;
>> >>
>> >> it requires 600,000,000 bytes: "int" is pointer to document (Lucene document
>> >> ID),  and "long" is pointer to String value...
>> >>
>> >> Am I right, is it 600Mb just for this "country" (indexed, non-tokenized,
>> >> non-boolean) field and 100 millions docs? I need to calculate exact minimum RAM
>> >> requirements...
>> >>
>> >> I believe it shouldn't depend on cardinality (distribution) of field...
>> >>
>> >> Thanks,
>> >> Fuad
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>> >
>> >
>
>
>
