lucene-java-user mailing list archives

From Ben Litchfield <>
Subject Re: Full disk space during indexing process with 120 gb of free disk space
Date Mon, 04 Dec 2006 14:44:43 GMT
PDFBox version 0.6 is quite old and there have been many improvements since.
You should look at moving to the newest version, 0.7.3, although from the
description of your problem it probably would not resolve it.

If there are a large number of temp files with "pdfbox" in the name, then
you are most likely not calling close() on the PDDocument object. How
are you adding the documents to the index? There is a simple helper
class called org.pdfbox.searchengine.lucene.LucenePDFDocument that you
may find useful.
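
To illustrate why a missing close() can fill the disk, here is a minimal
sketch. It does not use PDFBox itself: ScratchDocument is a hypothetical
stand-in for any document object that buffers data in a temp file and only
deletes that file in close(). The class name, file size, and the process()
helper are assumptions made for the illustration. With PDFBox the same
try/finally shape would wrap PDDocument.load(...) and document.close().

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TempFileLeakDemo {

    /** Hypothetical stand-in for a parsed document backed by a scratch file. */
    static final class ScratchDocument implements Closeable {
        private final Path scratch;

        ScratchDocument(Path tempDir) throws IOException {
            // The "pdfbox" prefix only mimics the temp file names mentioned above.
            scratch = Files.createTempFile(tempDir, "pdfbox", ".tmp");
            Files.write(scratch, new byte[1024]);
        }

        String getText() {
            return "extracted text";
        }

        @Override
        public void close() throws IOException {
            // The scratch file is released only when close() is called.
            Files.deleteIfExists(scratch);
        }
    }

    /** Processes count documents and returns how many scratch files were left behind. */
    static int process(int count, boolean closeEach) throws IOException {
        Path tempDir = Files.createTempDirectory("leak-demo");
        for (int i = 0; i < count; i++) {
            ScratchDocument doc = new ScratchDocument(tempDir);
            try {
                doc.getText(); // this is where the text would be added to the index
            } finally {
                if (closeEach) {
                    doc.close(); // always close, even if text extraction fails
                }
            }
        }
        File[] leftovers = tempDir.toFile().listFiles();
        int left = leftovers == null ? 0 : leftovers.length;
        // Clean up after the demo so it does not leak files itself.
        if (leftovers != null) {
            for (File f : leftovers) {
                f.delete();
            }
        }
        tempDir.toFile().delete();
        return left;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("left behind without close(): " + process(5, false));
        System.out.println("left behind with close():    " + process(5, true));
    }
}
```

Without close(), one scratch file remains per document processed, so the
leftover count grows with the size of the collection. With close() in a
finally block, nothing is left behind.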


Ariel Isaac Romero Cartaya wrote:
> Hi everybody,
>   I am having a problem during the indexing process. I am indexing large
> amounts of text, most of it in PDF format, using PDFBox version 0.6.
> Before the indexing process begins, there is around 120 GB of free hard
> disk space, but incredibly, even before my Lucene index reaches 300 MB,
> the disk has no free space left. Even more incredible, when I stop the
> indexing process, the free disk space quickly rises back to 120 GB. How
> could this happen if I don't copy the documents to the disk? I have a
> Linux machine for the indexing process. I have been thinking that it
> could be the temporary files of something, maybe PDFBox?
> Could you help me, please?
> Greetings
