lucene-general mailing list archives

From sarfaraz masood <>
Subject Indexing large amount of data
Date Mon, 12 Jul 2010 18:14:04 GMT
I have a large amount of data (120 GB) to index, so I want to improve the indexing performance. I went through the documentation on the Lucene website, which mentions various ways the performance can be improved.

I am working on Debian Linux on amd64, so very large file sizes are supported. The Java version is 1.6.

I tried many of the points mentioned in that documentation but got unusual results.

1) Reusing Field and Document objects to reduce GC overhead, via the field.setValue() method. By doing this, instead of speeding up, indexing slowed down drastically. I know this is unusual, but that's what happened.
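For reference, the reuse pattern I tried looks roughly like this (a sketch against the Lucene 2.x/3.0-era API; "writer", the field name "contents", and the "myTexts" source are placeholders, not my actual code):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Sketch of the Document/Field reuse pattern from the indexing-speed
// docs (Lucene 2.x/3.0-era API). "writer" is assumed to be an
// already-opened org.apache.lucene.index.IndexWriter.
Document doc = new Document();
Field contents = new Field("contents", "",
                           Field.Store.NO, Field.Index.ANALYZED);
doc.add(contents);

for (String text : myTexts) {   // myTexts: whatever feeds the index
    contents.setValue(text);    // swap in new text, same objects
    writer.addDocument(doc);    // no new Document/Field for the GC
}
```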

2) Tuning parameters via setMergeFactor() and setMaxBufferedDocs().
The default value for both is 10; I increased both to 1000. By doing so, the number of .cfs files in the index folder increased many fold, and I got: Too many open files.
    If I choose the default value of 10 for both parameters, this error is avoided, but then the .fdt file in the index becomes really large.
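Concretely, the tuning I tried was along these lines (again a sketch of the 2.x/3.0-era IndexWriter API; these setters were later moved to IndexWriterConfig/LogMergePolicy, and "writer" is an assumed, already-opened IndexWriter):

```java
// 2.x/3.0-era IndexWriter tuning, values as described above.
writer.setMergeFactor(1000);      // allow up to 1000 segments per level
writer.setMaxBufferedDocs(1000);  // flush a new segment every 1000 docs

// Each flush writes a new segment (a .cfs file when the compound
// format is on), and with mergeFactor this high, merges almost
// never run -- hence the file-count explosion I am seeing.
```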

So where am I going wrong? How can I overcome these problems and speed up my indexing process?
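One thing I have not yet ruled out is the shell's file-descriptor limit, which caps how many index files a single process can hold open (standard Linux shell builtins; the 8192 value is only illustrative):

```shell
# Soft limit on open file descriptors for this shell/process:
ulimit -n

# Hard limit (the ceiling a non-root user can raise the soft limit to):
ulimit -Hn

# To raise the soft limit for this session (value is illustrative;
# a permanent change needs /etc/security/limits.conf):
#   ulimit -n 8192
```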
