tika-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-612) Specify PDFBox options via ParseContext
Date Fri, 02 Sep 2011 20:13:09 GMT

    [ https://issues.apache.org/jira/browse/TIKA-612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096266#comment-13096266 ]

Jukka Zitting commented on TIKA-612:

+1 looks good to me.

A possible design improvement could be to make PDFParseOptions an interface like the following:

public interface PDFParseOptions {
    void apply(PDFTextStripper stripper);
}

The proposed bean class would implement that interface like this:

    public void apply(PDFTextStripper stripper) {
        ...
    }

This would make it easy for client applications to also apply other PDF parsing settings that
Tika does not currently know about.
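
A minimal sketch of how this design might look, to make the idea concrete. Everything beyond
the PDFParseOptions interface itself is an assumption: the nested PDFTextStripper is only a
stand-in so the example compiles without PDFBox on the classpath, and BeanPDFParseOptions and
its two properties are hypothetical names, not the contents of the attached patch.

```java
public class PdfOptionsSketch {

    /** Stand-in for PDFBox's PDFTextStripper (assumption: only two settings modelled). */
    static class PDFTextStripper {
        private boolean sortByPosition;
        private boolean suppressDuplicateOverlappingText;

        void setSortByPosition(boolean value) { sortByPosition = value; }
        void setSuppressDuplicateOverlappingText(boolean value) {
            suppressDuplicateOverlappingText = value;
        }
        boolean getSortByPosition() { return sortByPosition; }
        boolean getSuppressDuplicateOverlappingText() {
            return suppressDuplicateOverlappingText;
        }
    }

    /** The interface proposed in the comment above. */
    interface PDFParseOptions {
        void apply(PDFTextStripper stripper);
    }

    /** Hypothetical bean: holds known settings and copies them onto the stripper. */
    static class BeanPDFParseOptions implements PDFParseOptions {
        private boolean sortByPosition;
        private boolean suppressDuplicateOverlappingText;

        void setSortByPosition(boolean value) { sortByPosition = value; }
        void setSuppressDuplicateOverlappingText(boolean value) {
            suppressDuplicateOverlappingText = value;
        }

        @Override
        public void apply(PDFTextStripper stripper) {
            stripper.setSortByPosition(sortByPosition);
            stripper.setSuppressDuplicateOverlappingText(suppressDuplicateOverlappingText);
        }
    }

    public static void main(String[] args) {
        // Bean-style options, as a parser could apply them before extracting text.
        BeanPDFParseOptions bean = new BeanPDFParseOptions();
        bean.setSortByPosition(true);
        PDFTextStripper first = new PDFTextStripper();
        bean.apply(first);

        // A client-supplied implementation can touch any stripper setting directly,
        // without the bean having to expose a matching property.
        PDFParseOptions custom = s -> s.setSuppressDuplicateOverlappingText(true);
        PDFTextStripper second = new PDFTextStripper();
        custom.apply(second);

        System.out.println("bean sortByPosition: " + first.getSortByPosition());
        System.out.println("custom suppressDuplicates: "
                + second.getSuppressDuplicateOverlappingText());
    }
}
```

With this shape the parser only ever calls apply(), so adding a new setting needs no change to
the parser itself; the trade-off is that client code then depends on the stripper class directly.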

> Specify PDFBox options via ParseContext 
> ----------------------------------------
>                 Key: TIKA-612
>                 URL: https://issues.apache.org/jira/browse/TIKA-612
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 0.9
>            Reporter: Julien Nioche
>            Assignee: Julien Nioche
>            Priority: Minor
>         Attachments: TIKA-612-testcase.patch, Tika-612.patch, testPDFTwoColumns.pdf
> See https://issues.apache.org/jira/browse/TIKA-611. The options used by PDFBox are currently
> hard-coded in the PDFParser code; we will allow them to be specified via the ParseContext.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

