tika-dev mailing list archives

From "Konstantin Gribov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1489) PDF Text extraction without permission
Date Mon, 01 Dec 2014 14:39:13 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229850#comment-14229850 ]

Konstantin Gribov commented on TIKA-1489:

I think a metadata field should be sufficient for writing well-behaved software that respects
these access permissions, while also avoiding breaking existing software.

So, my +1 for [~gagravarr]'s variant.
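A minimal Java sketch of the metadata-field approach suggested above, assuming a plain `Map<String, String>` as a stand-in for the parser's metadata object. The key `pdf:extractContentAllowed` and all class and method names are hypothetical, not Tika's actual API. The parser still returns the text but records the document's extraction permission, and a well-behaved caller decides whether to use it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the parser records the extraction permission in
// metadata instead of refusing to extract; the caller decides what to do.
final class ExtractPermissionSketch {
    // Hypothetical metadata key; not an actual Tika constant.
    static final String EXTRACT_ALLOWED_KEY = "pdf:extractContentAllowed";

    private ExtractPermissionSketch() {}

    // Parser side (stand-in): store the permission alongside the extracted text.
    static Map<String, String> describe(boolean extractAllowed) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(EXTRACT_ALLOWED_KEY, Boolean.toString(extractAllowed));
        return metadata;
    }

    // Caller side: use the text only if the permission field allows it.
    // A missing field is treated as "allowed", so callers and parsers that
    // never set or read the field behave exactly as before.
    static String textToIndex(String extractedText, Map<String, String> metadata) {
        String allowed = metadata.get(EXTRACT_ALLOWED_KEY);
        if (allowed == null || Boolean.parseBoolean(allowed)) {
            return extractedText;
        }
        return "";
    }

    public static void main(String[] args) {
        System.out.println(textToIndex("hello", describe(true)));             // hello
        System.out.println(textToIndex("hello", describe(false)).isEmpty());  // true
        System.out.println(textToIndex("hello", new HashMap<>()));            // hello
    }
}
```

Treating an absent field as "allowed" is one way to address the compatibility concern: software that ignores the new field keeps getting the same text it got before, and only callers that opt in enforce the restriction.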

> PDF Text extraction without permission
> --------------------------------------
>                 Key: TIKA-1489
>                 URL: https://issues.apache.org/jira/browse/TIKA-1489
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.7
>            Reporter: Tilman Hausherr
> In TIKA-1442, text extraction works for files such as 717226.pdf that don't have text extraction permission. The permissions in PDF files are enforced only by the application (i.e. PDFBox); the text itself isn't stored separately in encrypted form.
> The PDFBox ExtractText command-line tool does throw an exception.
> So I wonder why TIKA is able to extract text. Either TIKA or the PDFBox call it uses bypasses the permission check.
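The point that permissions are enforced only by the reading application can be illustrated by how they are stored: the restrictions are bit flags in the `/P` integer of the PDF encryption dictionary, and nothing in the file prevents a reader from ignoring them. A minimal Java sketch, assuming the `/P` value has already been read from the document (obtaining it via PDFBox or any other library is left out); the class and method names are hypothetical. Bit positions follow the PDF specification's user access permission table, where bit 1 is the least significant bit.

```java
// Hypothetical sketch: decoding the /P permission flags of an encrypted PDF.
// Bit numbering is 1-based, as in the PDF specification.
final class PdfPermissionFlags {
    // Bit 5: copy or otherwise extract text and graphics.
    static final int EXTRACT_CONTENT = 1 << 4;
    // Bit 10: extract text and graphics for accessibility purposes.
    static final int EXTRACT_FOR_ACCESSIBILITY = 1 << 9;

    private PdfPermissionFlags() {}

    static boolean canExtractContent(int p) {
        return (p & EXTRACT_CONTENT) != 0;
    }

    static boolean canExtractForAccessibility(int p) {
        return (p & EXTRACT_FOR_ACCESSIBILITY) != 0;
    }

    public static void main(String[] args) {
        int allowAll = -4;                           // all bits set except bits 1-2, which must be 0
        int noExtract = allowAll & ~EXTRACT_CONTENT; // same, but with bit 5 cleared
        System.out.println(canExtractContent(allowAll));            // true
        System.out.println(canExtractContent(noExtract));           // false
        System.out.println(canExtractForAccessibility(noExtract));  // true
    }
}
```

Checking these flags is a decision each reader makes: a tool like ExtractText that tests bit 5 and throws is honoring the flag, while a code path that never tests it can still decrypt and extract the text.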

This message was sent by Atlassian JIRA
