tika-dev mailing list archives

From "Nick Burch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-876) Signed pdf parsing
Date Fri, 27 Apr 2012 23:34:50 GMT

    [ https://issues.apache.org/jira/browse/TIKA-876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264112#comment-13264112 ]

Nick Burch commented on TIKA-876:

We still can't help you very much without a (small) sample file. Any chance you could upload
one?

If your PDFs really are wrapped in PKCS7, then we'll need something that unpacks the PKCS7
wrapper and, for signed files, triggers the recursing parser for the contents. (Encrypted
files will have to wait, as there's no way to supply the private key yet.) I think
BouncyCastle might help with this; it's worth a look to start with.
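
To make the unwrapping step concrete, here is a rough JDK-only sketch of pulling the embedded
content out of a signed PKCS7 file. It is an illustration under assumptions, not Tika code:
the class and method names (Pkcs7Unwrapper, unwrap) are made up, it only handles
definite-length DER with the content in a single OCTET STRING, and it does not verify the
signature. Real files often use BER indefinite lengths, which is one reason a library such as
BouncyCastle is likely the better route.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Tika: extracts the content of a PKCS#7 signed-data blob.
final class Pkcs7Unwrapper {
    // Content bytes of the OID 1.2.840.113549.1.7.2 (signedData).
    private static final byte[] SIGNED_DATA_OID = {
        0x2A, (byte) 0x86, 0x48, (byte) 0x86, (byte) 0xF7, 0x0D, 0x01, 0x07, 0x02
    };

    private final byte[] buf;
    private int pos = 0;

    private Pkcs7Unwrapper(byte[] buf) { this.buf = buf; }

    // Checks the tag at pos, decodes a definite length, and leaves pos at the content start.
    private int enter(int expectedTag) {
        if (pos + 2 > buf.length) throw new IllegalArgumentException("truncated input");
        int tag = buf[pos++] & 0xFF;
        if (tag != expectedTag) throw new IllegalArgumentException("unexpected tag " + tag);
        int len = buf[pos++] & 0xFF;
        if (len == 0x80) throw new IllegalArgumentException("indefinite length not supported");
        if (len > 0x80) {
            int n = len & 0x7F; // number of length bytes that follow
            if (n > 4 || pos + n > buf.length) throw new IllegalArgumentException("bad length");
            len = 0;
            for (int i = 0; i < n; i++) len = (len << 8) | (buf[pos++] & 0xFF);
        }
        if (len < 0 || len > buf.length - pos) throw new IllegalArgumentException("bad length");
        return len;
    }

    // Reads one whole element with the given tag and returns its content bytes.
    private byte[] read(int expectedTag) {
        int len = enter(expectedTag);
        byte[] value = Arrays.copyOfRange(buf, pos, pos + len);
        pos += len;
        return value;
    }

    /** Returns the signed content, or null for a detached signature with no content. */
    static byte[] unwrap(byte[] der) {
        Pkcs7Unwrapper r = new Pkcs7Unwrapper(der);
        r.enter(0x30);                                  // ContentInfo SEQUENCE
        if (!Arrays.equals(r.read(0x06), SIGNED_DATA_OID)) {
            throw new IllegalArgumentException("not PKCS#7 signed-data");
        }
        r.enter(0xA0);                                  // [0] EXPLICIT wrapper
        r.enter(0x30);                                  // SignedData SEQUENCE
        r.read(0x02);                                   // version
        r.read(0x31);                                   // digestAlgorithms SET
        int innerEnd = r.enter(0x30) + r.pos;           // inner ContentInfo
        r.read(0x06);                                   // inner content type OID
        if (r.pos >= innerEnd) return null;             // detached: nothing embedded
        r.enter(0xA0);                                  // [0] EXPLICIT wrapper
        return r.read(0x04);                            // OCTET STRING holding the document
    }
}
```

The returned bytes would then be handed to whatever recursing parser is in use, so that the
embedded PDF is detected and parsed as a normal document.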

In r1331634 I've added some mime magic for pkcs7 files. I'm not sure if it's quite right or
not, but it seems OK for a few files I've tried. It'll need someone who knows the PKCS format
(or maybe just DER encoding?) to be sure though. Ideally, we should distinguish between signed,
encrypted and signed+encrypted, but I'm not sure how we do that...
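
One possible way to tell the variants apart, sketched here with only the JDK: a binary PKCS7
file starts with a SEQUENCE whose first element is an object identifier under
1.2.840.113549.1.7, and the last byte of that identifier names the content type (2 for
signed-data, 3 for enveloped-data, and so on). The class name Pkcs7Sniffer and the returned
labels are made up for illustration, and this is not a description of what r1331634 does. It
also assumes raw DER/BER input, so PEM (base64) files would not match.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Tika: guesses the PKCS#7 content type from leading bytes.
final class Pkcs7Sniffer {
    // OBJECT IDENTIFIER tag (0x06), length 9, then the first 8 bytes of 1.2.840.113549.1.7.x.
    private static final byte[] OID_PREFIX = {
        0x06, 0x09, 0x2A, (byte) 0x86, 0x48, (byte) 0x86, (byte) 0xF7, 0x0D, 0x01, 0x07
    };

    /** Returns a label for the content type, or null if the bytes do not look like PKCS#7. */
    static String sniff(byte[] data) {
        if (data == null || data.length < 2 || (data[0] & 0xFF) != 0x30) {
            return null; // does not start with a SEQUENCE
        }
        int lengthByte = data[1] & 0xFF;
        // Long-form lengths (0x81..0xFF) are followed by extra length bytes;
        // short form and indefinite form (0x80) are not.
        int offset = 2 + (lengthByte > 0x80 ? (lengthByte & 0x7F) : 0);
        int end = offset + OID_PREFIX.length;
        if (data.length <= end) {
            return null; // too short to hold the full object identifier
        }
        if (!Arrays.equals(Arrays.copyOfRange(data, offset, end), OID_PREFIX)) {
            return null;
        }
        switch (data[end]) {
            case 1: return "data";
            case 2: return "signed-data";
            case 3: return "enveloped-data";
            case 4: return "signed-and-enveloped-data";
            case 5: return "digested-data";
            case 6: return "encrypted-data";
            default: return null;
        }
    }
}
```

Note that signed+encrypted content is usually a signed-data structure nested inside an
enveloped-data one, so from the outside it would look the same as plain encrypted content
until it has been decrypted.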

> Signed pdf parsing
> ------------------
>                 Key: TIKA-876
>                 URL: https://issues.apache.org/jira/browse/TIKA-876
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.0
>         Environment: Java 6.0, Ubuntu
>            Reporter: Fausto Cruzeiro de Moraes
>              Labels: features
>             Fix For: 1.0
>   Original Estimate: 168h
>  Remaining Estimate: 168h
> Is there an estimated date for implementing default parsing for signed documents, like
> signed pdf files (pk7s format), for example?
