lucene-java-user mailing list archives

From Chris Lamprecht
Subject Re: Ideal Index Fragmentation
Date Wed, 31 Aug 2005 00:01:08 GMT
Splitting the index and then searching the pieces on the same machine
probably won't help performance unless you search the indexes in
parallel (on a multiprocessor or multi-core machine).  Even then, the
disk is often the bottleneck, which essentially prevents the searches
from really running in parallel.  If your index is in the filesystem
cache, though, parallel search may be fast.
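
A minimal sketch of the difference between searching sub-indexes one
after another and searching them in parallel, using only the JDK.  The
names ShardFanOut, ShardSearcher and Hit are hypothetical stand-ins and
are not Lucene classes; in a real setup each ShardSearcher would wrap
whatever searches one sub-index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-ins for illustration; none of these types come from Lucene.
class ShardFanOut {

    /** One scored hit from one sub-index. */
    static final class Hit {
        final String docId;
        final double score;
        Hit(String docId, double score) { this.docId = docId; this.score = score; }
    }

    /** Whatever searches a single sub-index (an assumption of this sketch). */
    interface ShardSearcher {
        List<Hit> search(String query, int topK) throws Exception;
    }

    /** Search the shards one after another on the calling thread, then merge. */
    static List<Hit> searchSequential(List<ShardSearcher> shards, String query, int topK)
            throws Exception {
        List<Hit> all = new ArrayList<>();
        for (ShardSearcher shard : shards) {
            all.addAll(shard.search(query, topK));
        }
        return mergeTopK(all, topK);
    }

    /** Submit one task per shard to the pool, wait for all of them, then merge. */
    static List<Hit> searchParallel(List<ShardSearcher> shards, String query, int topK,
                                    ExecutorService pool) throws Exception {
        List<Callable<List<Hit>>> tasks = new ArrayList<>();
        for (ShardSearcher shard : shards) {
            tasks.add(() -> shard.search(query, topK));
        }
        List<Hit> all = new ArrayList<>();
        for (Future<List<Hit>> result : pool.invokeAll(tasks)) {
            all.addAll(result.get());
        }
        return mergeTopK(all, topK);
    }

    /** Keep the topK highest-scoring hits across all shards. */
    static List<Hit> mergeTopK(List<Hit> hits, int topK) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(topK, sorted.size())));
    }

    public static void main(String[] args) throws Exception {
        // Two fake shards with canned results, standing in for real sub-indexes.
        List<ShardSearcher> shards = new ArrayList<>();
        shards.add((q, k) -> Arrays.asList(new Hit("a1", 0.9), new Hit("a2", 0.4)));
        shards.add((q, k) -> Arrays.asList(new Hit("b1", 0.7)));

        ExecutorService pool = Executors.newFixedThreadPool(shards.size());
        try {
            for (Hit h : searchParallel(shards, "example query", 2, pool)) {
                System.out.println(h.docId + " " + h.score);
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

The parallel version only pays off if the per-shard searches can
actually overlap.  If every shard is waiting on the same disk, the pool
threads mostly queue up behind each other.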

Notice I said "probably", "often", and "may be" in the paragraph above:
you just have to performance-test it and measure.  Start with a simple,
one-index setup and see if that works.  A 2GB index isn't itself a
problem (under Linux at least); your queries and any pre/post-query
processing then become the bigger factors.  I haven't run into any
file-size limits under Linux with indexes up to 12GB, and others here
run even larger indexes.  The one exception is Lucene 1.9's
MMapDirectory, which is limited to (I believe) 4GB on a 32-bit
platform.
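
One way to do that measuring, as a rough sketch in plain JDK Java: run
the query a few times to warm up the JVM and the filesystem cache, then
take the median of several timed runs.  QueryTimer and Task are
hypothetical names, and the sleeping task in main is only a placeholder
for a real search call.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Lucene.
class QueryTimer {

    /** The operation being timed, e.g. one search against one index layout. */
    interface Task {
        void run() throws Exception;
    }

    /**
     * Runs the task `warmups` times untimed, then `runs` times timed, and
     * returns the median wall-clock time in nanoseconds (the upper middle
     * sample when `runs` is even).
     */
    static long medianNanos(Task task, int warmups, int runs) throws Exception {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) throws Exception {
        // Placeholder workload; swap in the real query against each setup.
        Task fakeQuery = () -> Thread.sleep(5);
        long median = medianNanos(fakeQuery, 3, 11);
        System.out.println("median ms: " + median / 1_000_000.0);
    }
}
```

Timing the same queries against the one-index setup and against the
split setup, on the target hardware, gives numbers that can be compared
directly.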

On 8/30/05, Friedland, Zachary (EDS - Strategy) wrote:
> Does anyone have experience using lots of indexes simultaneously with
> the MultiSearcher?  I'm looking to index 15 distinct objects for
> searching, and was thinking of creating 15 distinct indexes for better
> manageability & performance (for certain searches when I know which
> index to search).
> Certain indexes will be very large (2-3 million documents), but most
> will be 50,000-500,000 documents.  Each document will contain a good
> number of fields (20-50).
> I know there are issues with file handles, but what happens to search
> performance (parallel vs. sequential) over lots of indexes?  Am I better
> off combining them into one huge index (2GB file limit may be an issue),
> or with some fragmentation in between?
> Any advice that you have had with a similar project would be greatly
> appreciated.
> Thanks,
> Zach
