lucene-java-user mailing list archives

From Gili Nachum <GI...@il.ibm.com>
Subject Re: MMapDirectory performance - Are searchable field values contiguously stored in FS block?
Date Thu, 31 Jan 2013 12:07:53 GMT

Hi Mike,

So, when loading the results I want to return (say 10 documents), if not
all of the docs fit in RAM, I would incur up to 10 individual disk seeks,
which would kill my performance. Is that correct?
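
To put a rough number on that worry, here is a back-of-the-envelope sketch. The 10 ms seek time and the cache hit rate are assumptions chosen for illustration (typical of a spinning disk, not measured on any real index), and the class and method names are hypothetical.

```java
final class SeekCostSketch {

    /**
     * Expected time spent seeking when loading `docs` stored documents,
     * assuming each uncached document costs exactly one random seek.
     * cacheHitRate is the assumed fraction of documents already in the OS cache.
     */
    static double expectedSeekMillis(int docs, double cacheHitRate, double seekMillis) {
        return docs * (1.0 - cacheHitRate) * seekMillis;
    }

    public static void main(String[] args) {
        // Assumed figures: 10 result documents, ~10 ms per random seek.
        System.out.println("cold cache: " + expectedSeekMillis(10, 0.0, 10.0) + " ms");
        System.out.println("half cached: " + expectedSeekMillis(10, 0.5, 10.0) + " ms");
    }
}
```

Under these assumptions a fully cold result page of 10 documents spends about 100 ms in seeks alone, so the concern scales directly with page size and cache miss rate.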

Considering my alternatives:
1. Create a separate, lean index that would fit in RAM.
2. Keep stored fields to a minimum, and store infrequently accessed stored
fields outside of Lucene.

In this particular use case, it would really help if I could tell Lucene
which stored fields to read eagerly when loading a document, and which to
load lazily from elsewhere on disk. The frequently needed stored fields
would then fit in memory.
I guess my use case is too specific?
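
A minimal sketch of the hot/cold split described above, done at the application level rather than inside Lucene. Everything here is hypothetical and for illustration only: the class name, the idea of keying by an application document ID, and the loader callback that stands in for whatever external store (file, database) holds the rarely needed fields. It uses only the Java standard library and no Lucene API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

final class HotColdFieldStore {
    // Frequently needed values, kept small enough to stay RAM resident.
    private final Map<Integer, String> hot = new HashMap<>();
    // Stand-in for an external store holding the rarely needed values.
    private final IntFunction<String> coldLoader;
    private int coldLoads = 0;

    HotColdFieldStore(IntFunction<String> coldLoader) {
        this.coldLoader = coldLoader;
    }

    void putHot(int docId, String value) {
        hot.put(docId, value);
    }

    /** Served from memory; never touches the external store. */
    String hot(int docId) {
        return hot.get(docId);
    }

    /** Fetched on demand; each call may cost a disk seek in a real setup. */
    String cold(int docId) {
        coldLoads++;
        return coldLoader.apply(docId);
    }

    int coldLoads() {
        return coldLoads;
    }

    public static void main(String[] args) {
        HotColdFieldStore store = new HotColdFieldStore(id -> "body-" + id);
        store.putHot(7, "title-7");
        System.out.println(store.hot(7));   // memory only
        System.out.println(store.cold(7));  // goes to the (simulated) external store
        System.out.println("cold loads: " + store.coldLoads());
    }
}
```

The point of the split is that rendering a result page touches only the hot map, while the cold loader is hit only when a user actually opens a document.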

Gili.


On Wed, Jan 23, 2013 at 8:59 AM, Michael McCandless
<lucene@mikemccandless.com> wrote:

Are the additional rarely used 48 fields used for searching? Or, for
looking up stored fields?


If it's for searching then you should see good locality (efficient use of
the OS's IO cache) from the posting lists: each field's postings are stored
in a single chunk of the files, then the next field's postings, etc. That is,
the storage is "column stride" (if columns are fields and rows are documents).


But for stored fields, or term vectors, which are "row stride", you won't
see efficient use of the OS's IO cache.
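
The difference between the two layouts can be illustrated with a toy model. This is a sketch under strong assumptions, not Lucene's actual file format: it pretends every field value has the same fixed size, is stored uncompressed, and that the file is read in fixed-size blocks. The class name, method names, and all parameters are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

final class StrideLayoutSketch {

    /** Row stride: all fields of doc 0, then all fields of doc 1, and so on. */
    static long rowStrideOffset(int doc, int field, int numFields, int valueBytes) {
        return ((long) doc * numFields + field) * valueBytes;
    }

    /** Column stride: field 0 for every doc, then field 1 for every doc, and so on. */
    static long columnStrideOffset(int doc, int field, int numDocs, int valueBytes) {
        return ((long) field * numDocs + doc) * valueBytes;
    }

    /** Counts the distinct blocks touched when reading one field for every document. */
    static int blocksTouched(boolean columnStride, int field, int numDocs,
                             int numFields, int valueBytes, int blockBytes) {
        Set<Long> blocks = new HashSet<>();
        for (int doc = 0; doc < numDocs; doc++) {
            long start = columnStride
                    ? columnStrideOffset(doc, field, numDocs, valueBytes)
                    : rowStrideOffset(doc, field, numFields, valueBytes);
            long end = start + valueBytes - 1;
            // A value may straddle a block boundary, so record every block it covers.
            for (long b = start / blockBytes; b <= end / blockBytes; b++) {
                blocks.add(b);
            }
        }
        return blocks.size();
    }

    public static void main(String[] args) {
        // Assumed toy parameters: 1000 docs, 50 fields, 64-byte values, 4 KB blocks.
        int docs = 1000, fields = 50, valueBytes = 64, blockBytes = 4096;
        System.out.println("column stride: "
                + blocksTouched(true, 0, docs, fields, valueBytes, blockBytes));
        System.out.println("row stride:    "
                + blocksTouched(false, 0, docs, fields, valueBytes, blockBytes));
    }
}
```

With these toy parameters, reading one field across all documents touches 16 blocks in the column-stride layout and 781 in the row-stride layout, which is why the hot working set stays small in the first case and spans nearly the whole file in the second.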


Mike McCandless


http://blog.mikemccandless.com


On Wed, Jan 23, 2013 at 7:59 AM, Gili Nachum <GIL...@il.ibm.com> wrote:


Hi,


I have a search workload that focuses on two fields in my 1GB index. I get
very good performance when loading the index via MMapDirectory. I attribute
this performance to the operating system's file system (OS FS) cache, which
keeps the most recently used FS blocks RAM resident.


I would like to add 50 more fields to the index, increasing its size to
~50GB. A key factor is that these additional fields will be queried very
rarely. Given this increase in index size, should I expect a lower
queries/sec rate for the original search workload (which doesn't use the new
fields)?


I would assume that if the values of each searchable field are stored in a
different set of FS blocks, then the 50 additional fields would make no
difference for the OS FS cache, as it would continue to behave like before,
keeping in RAM those most used FS blocks. On the other hand, if values from
different fields share the same FS blocks, then the two hot fields' values
will be scattered across the FS, rendering the OS cache useless and degrading
performance back to I/O bound.


Which is the case with Lucene 3.6?


Thanks. Gili Nachum.

