lucene-java-user mailing list archives

From Eric Anderson <>
Subject Re: [ANN] PDFBox 0.6.0
Date Thu, 06 Mar 2003 08:52:56 GMT
While trying out PDFBox 0.6.0, I received the following error when
scanning a reasonably sized PDF repository.

Any thoughts?

 caught a class
 with message: Unexpected end of ZLIB input stream
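
For context, "Unexpected end of ZLIB input stream" is the message that the JDK's java.util.zip.InflaterInputStream raises (as a java.io.EOFException) when compressed data stops before the deflate stream is complete. For a PDF, that may point to a truncated or damaged Flate-compressed stream in one of the files, though that is an assumption and not something the error alone confirms. A minimal sketch that reproduces the message with the JDK only; the class and method names are made up for illustration and this is not PDFBox code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class TruncatedZlibDemo {

    /** Compresses the given bytes into an in-memory zlib stream. */
    static byte[] deflate(byte[] data) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(buffer)) {
            out.write(data);
        }
        return buffer.toByteArray();
    }

    /**
     * Inflates a zlib stream to the end. Returns null if it decodes
     * cleanly, or the exception message if the input ends too early.
     */
    static String inflateError(byte[] compressed) throws IOException {
        try (InputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[256];
            while (in.read(chunk) != -1) {
                // Discard the decoded bytes; only success or failure matters here.
            }
            return null;
        } catch (EOFException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] full = deflate(new byte[4096]);              // intact stream
        byte[] cut = Arrays.copyOf(full, full.length / 2);  // simulate truncation
        System.out.println("intact:    " + inflateError(full));
        System.out.println("truncated: " + inflateError(cut));
    }
}
```

If the cause is similar here, catching the exception per file during the scan and logging the file name would help identify which PDF in the repository triggers it.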

Eric Anderson
LanRx Network Solutions

Quoting Ben Litchfield <>:

> I would like to announce the next release of PDFBox.  PDFBox allows for
> PDF documents to be indexed using lucene through a simple interface.
> Please take a look at org.pdfbox.searchengine.lucene.LucenePDFDocument,
> which will extract all text and PDF document summary properties as lucene
> fields.
> You can obtain the latest release from
> Please send all bug reports to me and attach the PDF document when
> possible.
> RELEASE 0.6.0
> -Massive improvements to memory footprint.
> -Must call close() on the COSDocument (LucenePDFDocument does this for you)
> -Really fixed the bug where small documents were not being indexed.
> -Fixed bug where no whitespace existed between obj and start of object.
>     Exception in thread "main" expected='obj'
>     actual='obj<</Pro
> -Fixed issue with spacing where textLineMatrix was not being copied
>  properly
> -Fixed 'bug' where parsing would fail with some pdfs with double endobj
>  definitions
> -Added PDF document summary fields to the lucene document
> Thank you,
> Ben Litchfield
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

LanRx Network Solutions, Inc.
Providing Enterprise Level Solutions...On A Small Business Budget
