lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Antony Bowesman <...@teamware.com>
Subject Re: Why is lucene so slow indexing in nfs file system ?
Date Wed, 09 Jan 2008 22:04:40 GMT
Ariel wrote:

> The problem I have is that my application spends a lot of time to index all
> the documents, the delay to index 10 gb of pdf documents is about 2 days (to
> convert pdf to text I am using pdfbox) that is of course a lot of time,
> others applications based in lucene, for instance ibm omnifind only takes 5
> hours to index the same amount of pdfs documents. I would like to find out

If you are using log4j, make sure you have the pdfbox log4j categories set to 
info or higher, otherwise this really slows it down (factor of 10) or make sure 
you are using the non log4j version.  See 
http://sourceforge.net/forum/message.php?msg_id=3947448

Antony



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message