lucene-java-user mailing list archives

From Ben Litchfield
Subject [ANN] PDFBox 0.6.0
Date Wed, 05 Mar 2003 23:51:24 GMT
I would like to announce the next release of PDFBox.  PDFBox lets you
index PDF documents with Lucene through a simple interface.
Please take a look at org.pdfbox.searchengine.lucene.LucenePDFDocument,
which will extract all text and PDF document summary properties as Lucene

You can obtain the latest release from

Please send all bug reports to me and attach the PDF document when

-Massive improvements to memory footprint.
-Must call close() on the COSDocument (LucenePDFDocument does this for you)
-Really fixed the bug where small documents were not being indexed.
-Fixed bug where no whitespace existed between obj and start of object.
    Exception in thread "main" expected='obj'
-Fixed issue with spacing where textLineMatrix was not being copied
-Fixed 'bug' where parsing would fail with some PDFs with double endobj
-Added PDF document summary fields to the Lucene document
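
For anyone working with the COSDocument directly rather than through
LucenePDFDocument, the close() requirement above can be met with a plain
try/finally. The following is only a minimal sketch of that pattern: the
StandInDocument class and the extractAndClose method are hypothetical
stand-ins written for this illustration and are not part of the PDFBox API.
With the real library, the object being closed would be the COSDocument.

```java
import java.io.Closeable;
import java.io.IOException;

public class CloseSketch {

    /** Hypothetical stand-in for a parsed document; NOT a PDFBox class. */
    static final class StandInDocument implements Closeable {
        private boolean open = true;

        /** Pretends to extract text; fails if the document was already closed. */
        String extractText() throws IOException {
            if (!open) {
                throw new IOException("document already closed");
            }
            return "sample text";
        }

        boolean isOpen() {
            return open;
        }

        @Override
        public void close() {
            open = false;
        }
    }

    /** Extracts text and closes the document even if extraction throws. */
    static String extractAndClose(StandInDocument doc) throws IOException {
        try {
            return doc.extractText();
        } finally {
            doc.close();
        }
    }

    public static void main(String[] args) throws IOException {
        StandInDocument doc = new StandInDocument();
        String text = extractAndClose(doc);
        System.out.println(text + " / still open: " + doc.isOpen());
    }
}
```

Putting close() in the finally block means the document is released
whether or not extraction succeeds.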

Thank you,
Ben Litchfield
