lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shelly_Singh <>
Subject Scaling Lucene to 1bln docs
Date Tue, 10 Aug 2010 06:54:44 GMT

I am developing an application which uses Lucene for indexing and searching 1 bln documents.
(the document size is very small though. Each document has a single field of 5-10 words; so
I believe that my data size is within the tested limits).

I am using the following configuration:
1.      1.5 gig RAM to the jvm
2.      100GB disk space.
3.      Index creation tuning factors:
a.      mergeFactor = 10
b.      maxFieldLength = 10
c.      maxMergeDocs = 5000000 (if I try with a larger value, I get an out-of-memory)

With these settings, I am able to create an index of 100 million docs (10 pow 8)  in 15 mins
consuming a disk space of 2.5gb. Which is quite satisfactory for me, but nevertheless, I want
to know what else can be done to tune it further. Please help.
Also, with these settings, can I expect the time and size to grow linearly for 1bln (10 pow
9) documents?

Thanks and Regards,

Shelly Singh
Center For KNowledge Driven Information Systems, Infosys
Phone: (M) 91 992 369 7200, (VoIP)2022978622

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message