lucene-java-user mailing list archives

From Marc Dumontier <dumont...@mshri.on.ca>
Subject significant performance issues
Date Tue, 07 Jan 2003 20:50:30 GMT
Hi all,

I just started trying to use Lucene to index approximately 13,000 XML
documents representing biological data. Each document is roughly 20-30 KB.

I adapted some code from Cocoon components to parse the documents with SAX
and build Lucene Documents. That step is very quick.
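
For context, the conversion boils down to something along these lines
(heavily simplified; the field names and the single-field layout here are
placeholders, not the real Cocoon-derived code):

import java.io.StringReader;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Collects all character data from one XML record into a single
// Lucene Document. Field names ("id", "contents") are illustrative.
public class RecordHandler extends DefaultHandler {

    private final StringBuffer text = new StringBuffer();

    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    public Document toDocument(String id) {
        Document doc = new Document();
        doc.add(Field.Keyword("id", id));                  // stored, not tokenized
        doc.add(Field.Text("contents", text.toString()));  // tokenized and indexed
        return doc;
    }

    public static Document parse(String xml, String id) throws Exception {
        RecordHandler handler = new RecordHandler();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.toDocument(id);
    }
}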

The following code is where I started when writing the index to disk:

// fsd is the index directory, analyzer is the StandardAnalyzer (see below)
writer = new IndexWriter(fsd, analyzer, true);

Iterator myit = docList.iterator();
while (myit.hasNext()) {
    writer.addDocument((Document) myit.next());
    System.out.println(++counter);
}
writer.close();

This is taking much more time than expected. I'm using the
StandardAnalyzer, and my XML data is about 20-30 KB per file. Indexing
takes approximately 2-3 seconds per document, and as the index grows it
gets significantly slower. I'm running this on a 2.4 GHz Linux machine
with 1 GB of RAM.

I tried a few different strategies, but I end up with "too many open
files" exceptions.

I don't think it should progressively slow down in proportion to the 
size of the index. Is this assumption wrong?

Am I doing something wrong? Is there a way to use memory more and the
filesystem less, and only dump the index to disk periodically?
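
To make the question concrete, here is roughly what I have in mind:
buffer documents in a RAMDirectory and merge them into the on-disk index
in batches. This is only a sketch of the idea, I haven't verified that
addIndexes() is meant to be used this way, and the batch size is
arbitrary:

import java.util.Iterator;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

public class BatchIndexer {

    // Buffer documents in a RAMDirectory and merge into the on-disk
    // index every batchSize documents.
    public static void indexInBatches(List docList, FSDirectory fsd, int batchSize)
            throws Exception {
        StandardAnalyzer analyzer = new StandardAnalyzer();
        IndexWriter fsWriter = new IndexWriter(fsd, analyzer, true);

        RAMDirectory ramDir = new RAMDirectory();
        IndexWriter ramWriter = new IndexWriter(ramDir, analyzer, true);

        int count = 0;
        for (Iterator it = docList.iterator(); it.hasNext();) {
            ramWriter.addDocument((Document) it.next());
            if (++count % batchSize == 0) {
                // flush the in-memory segment to disk, start a fresh buffer
                ramWriter.close();
                fsWriter.addIndexes(new Directory[] { ramDir });
                ramDir = new RAMDirectory();
                ramWriter = new IndexWriter(ramDir, analyzer, true);
            }
        }
        ramWriter.close();
        fsWriter.addIndexes(new Directory[] { ramDir }); // flush the remainder
        fsWriter.close();
    }
}

I also see that IndexWriter has a public mergeFactor field; if I read the
javadoc correctly, raising it should reduce how often segments are merged
on disk, presumably at the cost of more open files, which may be where my
"too many open files" exceptions came from.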

Any help would be appreciated. Thanks,

Marc Dumontier    
Intermediate Developer
Blueprint Initiative
Mount Sinai Hospital
http://www.bind.ca




