tika-dev mailing list archives

From "Nick Burch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-876) Signed pdf parsing
Date Fri, 27 Apr 2012 23:34:50 GMT

    [ https://issues.apache.org/jira/browse/TIKA-876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264112#comment-13264112 ]

Nick Burch commented on TIKA-876:
---------------------------------

We still can't help you very much without a (small) sample file. Is there any chance you
could upload one?

If your PDFs really are wrapped in PKCS7, then we'll need something that unpacks the PKCS7
wrapper and, for signed files, triggers the recursing parser on the contents. Encrypted files
will have to wait, as there's no way to supply the private key yet. I think BouncyCastle might
help with this, so it's worth a look to start with.
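
To make the unpacking step concrete, here is a minimal sketch that walks a DER-encoded
PKCS7 SignedData structure by hand and returns the embedded content bytes. It is not what
Tika or BouncyCastle does; the class and method names (Pkcs7Unwrapper, signedContent) are
hypothetical, and it assumes definite-length DER, a single primitive OCTET STRING for the
content, and a non-detached signature. A real CMS library would cope with the BER variants
this sketch rejects.

```java
import java.util.Arrays;

/** Hypothetical helper: extracts the wrapped content from a DER-encoded PKCS7 SignedData. */
public class Pkcs7Unwrapper {
    private final byte[] buf;
    private int pos = 0;

    private Pkcs7Unwrapper(byte[] buf) {
        this.buf = buf;
    }

    /** Checks the tag at the current position, reads its length, and stops at the value. */
    private int enter(int expectedTag) {
        if (pos + 2 > buf.length) {
            throw new IllegalArgumentException("Truncated input");
        }
        int tag = buf[pos++] & 0xFF;
        if (tag != expectedTag) {
            throw new IllegalArgumentException("Unexpected tag 0x" + Integer.toHexString(tag));
        }
        int len = buf[pos++] & 0xFF;
        if (len == 0x80) {
            throw new IllegalArgumentException("Indefinite-length BER is not handled here");
        }
        if (len > 0x80) { // long form: the low 7 bits give the number of length bytes
            int count = len & 0x7F;
            if (count > 4 || pos + count > buf.length) {
                throw new IllegalArgumentException("Unsupported length encoding");
            }
            len = 0;
            for (int i = 0; i < count; i++) {
                len = (len << 8) | (buf[pos++] & 0xFF);
            }
        }
        if (len < 0 || pos + len > buf.length) {
            throw new IllegalArgumentException("Length runs past end of input");
        }
        return len;
    }

    /** Checks the tag and skips over the whole element. */
    private void skip(int expectedTag) {
        pos += enter(expectedTag);
    }

    public static byte[] signedContent(byte[] der) {
        Pkcs7Unwrapper r = new Pkcs7Unwrapper(der);
        r.enter(0x30); // ContentInfo SEQUENCE
        r.skip(0x06);  // contentType OID (assumed to be signedData, not checked here)
        r.enter(0xA0); // [0] EXPLICIT wrapper
        r.enter(0x30); // SignedData SEQUENCE
        r.skip(0x02);  // version INTEGER
        r.skip(0x31);  // digestAlgorithms SET
        r.enter(0x30); // encapsulated ContentInfo SEQUENCE
        r.skip(0x06);  // inner contentType OID
        r.enter(0xA0); // [0] EXPLICIT wrapper; absent for detached signatures
        int len = r.enter(0x04); // OCTET STRING holding the original document
        return Arrays.copyOfRange(r.buf, r.pos, r.pos + len);
    }
}
```

The returned bytes would then be handed to whatever parser handles the inner document.
Detached signatures and streamed (indefinite-length) files make this sketch throw
IllegalArgumentException.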

In r1331634 I've added some mime magic for PKCS7 files. I'm not sure if it's quite right,
but it seems OK for the few files I've tried. It'll need someone who knows the PKCS format
(or maybe just DER encoding?) to be sure though. Ideally, we should distinguish between signed,
encrypted and signed+encrypted, but I'm not sure how we'd do that...
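
One possible way to tell the variants apart, sketched here without reference to what
r1331634 actually does: a DER-encoded PKCS7 file starts with a SEQUENCE whose first element
is a content-type OID under 1.2.840.113549.1.7, and the last arc of that OID is 2 for
signedData, 3 for envelopedData (encrypted) and 4 for signedAndEnvelopedData. The class and
method names below (Pkcs7Sniffer, classify) are hypothetical, and the sketch assumes binary
DER/BER input rather than PEM or S/MIME text.

```java
/** Hypothetical helper: classifies a PKCS7 blob by its outer content-type OID. */
public class Pkcs7Sniffer {
    // DER bytes of the OID prefix 1.2.840.113549.1.7; the following byte is the final arc.
    private static final int[] PKCS7_OID_PREFIX = {
        0x2A, 0x86, 0x48, 0x86, 0xF7, 0x0D, 0x01, 0x07
    };

    public static String classify(byte[] data) {
        // The outer ContentInfo must be a SEQUENCE (tag 0x30).
        if (data.length < 2 || (data[0] & 0xFF) != 0x30) {
            return "unknown";
        }
        int pos = 1;
        int first = data[pos++] & 0xFF;
        if (first > 0x80) {
            pos += first & 0x7F; // long-form length: skip the extra length bytes
        }
        // Short-form and indefinite (0x80) lengths need no extra skipping.

        // Expect an OID (tag 0x06) of length 9: 8 prefix bytes plus the final arc.
        if (pos + 11 > data.length) {
            return "unknown";
        }
        if ((data[pos] & 0xFF) != 0x06 || (data[pos + 1] & 0xFF) != 0x09) {
            return "unknown";
        }
        pos += 2;
        for (int i = 0; i < PKCS7_OID_PREFIX.length; i++) {
            if ((data[pos + i] & 0xFF) != PKCS7_OID_PREFIX[i]) {
                return "unknown";
            }
        }
        switch (data[pos + 8] & 0xFF) {
            case 1:  return "data";
            case 2:  return "signed";
            case 3:  return "encrypted";
            case 4:  return "signed+encrypted";
            default: return "unknown";
        }
    }
}
```

A caveat: this only inspects the outermost layer. A file that was signed first and then
encrypted as a separate step would report "encrypted", since the signed layer is hidden
inside the encrypted content.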
                
> Signed pdf parsing
> ------------------
>
>                 Key: TIKA-876
>                 URL: https://issues.apache.org/jira/browse/TIKA-876
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.0
>         Environment: Java 6.0, Ubuntu
>            Reporter: Fausto Cruzeiro de Moraes
>              Labels: features
>             Fix For: 1.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Is there an estimated date for implementing default parsing for signed documents, like
> signed pdf files (pk7s format), for example?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
