tika-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] Created: (TIKA-100) Structured PDF parsing
Date Mon, 12 Nov 2007 22:05:50 GMT
Structured PDF parsing

                 Key: TIKA-100
                 URL: https://issues.apache.org/jira/browse/TIKA-100
             Project: Tika
          Issue Type: Improvement
          Components: parser
            Reporter: Jukka Zitting
            Assignee: Jukka Zitting
            Priority: Minor

The PDF parser currently extracts and outputs document content as a single string. PDFBox
could be used to structure the output at least down to the page and paragraph level, though
it is unclear how accurate that structuring would be.
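
As a rough illustration of what page- and paragraph-level output might look like, here is a
minimal sketch in plain Java. It assumes the text of each page has already been extracted
separately (for example by running PDFBox's PDFTextStripper once per page, which is not shown
here). The class StructuredPdfText, its method names, the blank-line paragraph heuristic and
the div/p markup are all hypothetical choices made for this example, not existing Tika or
PDFBox behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Tika or PDFBox.
final class StructuredPdfText {

    // Splits one page's text into paragraphs wherever a blank line occurs.
    // Assumption: blank lines mark paragraph boundaries, which real PDFs may not honour.
    static List<String> paragraphs(String pageText) {
        List<String> result = new ArrayList<>();
        for (String block : pageText.split("\\R\\s*\\R")) {
            // Collapse line wraps and repeated spaces inside a paragraph.
            String paragraph = block.trim().replaceAll("\\s+", " ");
            if (!paragraph.isEmpty()) {
                result.add(paragraph);
            }
        }
        return result;
    }

    // Wraps each page in a div and each paragraph in a p element.
    static String toXhtml(List<String> pages) {
        StringBuilder out = new StringBuilder();
        for (String page : pages) {
            out.append("<div class=\"page\">");
            for (String paragraph : paragraphs(page)) {
                out.append("<p>").append(escape(paragraph)).append("</p>");
            }
            out.append("</div>");
        }
        return out.toString();
    }

    // Escapes the characters that would otherwise break the markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Calling StructuredPdfText.toXhtml with a list of per-page strings yields one div per page with
one p per detected paragraph. How well the blank-line heuristic matches real paragraph
boundaries depends on what the extractor emits, which is the accuracy question raised above.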

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
