tika-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1663) Add a DigestingParser to add MD5/SHA-X hashes as fields in Metadata
Date Wed, 01 Jul 2015 11:43:04 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609976#comment-14609976 ]

Tim Allison commented on TIKA-1663:
-----------------------------------

For those curious, I found no speed hit from adding MD5 hashing to a batch run against the ~1 million
documents in govdocs1.  Admittedly, I didn't do thorough benchmarking: I did one digesting run with
trunk and one non-digesting run, and the digesting run was a little bit faster, where "little
bit faster" = "difference was small enough to be in the noise."
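
For readers who want to see the general idea, here is a minimal sketch of digesting a stream in the same pass that reads it, which is one plausible reason the extra cost is hard to measure. This is not the code in digesting_parser_v1.patch: it uses only the JDK's MessageDigest and DigestInputStream rather than commons' DigestUtils, a plain Map stands in for Tika's Metadata object, and the class name, method name, and "X-Digest:" key prefix are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class DigestSketch {

    // Hypothetical key prefix for illustration; not taken from the patch.
    static final String KEY_PREFIX = "X-Digest:";

    /**
     * Reads the stream to the end, updating the digest as bytes pass through,
     * then stores the lowercase hex digest in the map (a stand-in for Metadata).
     * Note: the stream is closed when this method returns.
     */
    static String digestInto(InputStream in, String algorithm, Map<String, String> metadata)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        try (DigestInputStream dis = new DigestInputStream(in, md)) {
            byte[] buf = new byte[8192];
            while (dis.read(buf) != -1) {
                // A real parser would consume the bytes here; the digest
                // is updated as a side effect of each read.
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        metadata.put(KEY_PREFIX + algorithm, hex.toString());
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> metadata = new HashMap<>();
        byte[] doc = "hello".getBytes(StandardCharsets.UTF_8);
        digestInto(new ByteArrayInputStream(doc), "MD5", metadata);
        System.out.println(metadata);
    }
}
```

Passing "SHA-256" instead of "MD5" works the same way, since both are standard MessageDigest algorithm names.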

> Add a DigestingParser to add MD5/SHA-X hashes as fields in Metadata
> -------------------------------------------------------------------
>
>                 Key: TIKA-1663
>                 URL: https://issues.apache.org/jira/browse/TIKA-1663
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>         Attachments: digesting_parser_v1.patch
>
>
> It might be useful to integrate commons' DigestUtils and allow users to easily add the
> MD5 or other supported hashes to the Metadata object.
> Anyone else find this of use?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
