tika-dev mailing list archives

From "Andrew Jackson (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-847) Add regular expression support to the MagicDetector
Date Tue, 07 Feb 2012 22:34:59 GMT

    [ https://issues.apache.org/jira/browse/TIKA-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202883#comment-13202883 ]

Andrew Jackson commented on TIKA-847:

As far as I can tell, Lucene's FSM engine is so new that it is only available in the Lucene 4
snapshot builds. I would therefore prefer that we stick to the built-in Java regex functionality
(java.util.regex) for now, at least until Lucene 4 is released, and revisit the FSM optimisation
under a separate ticket.
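For readers wondering what "built-in RegEx functionality" would mean for magic matching, here is a minimal sketch that uses only java.util.regex. It is an illustration under stated assumptions, not the MagicDetector API and not the code in regex_support.patch: the class RegexMagicSketch, its matches method, the offset-range parameters and the PDF example signature are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of regex-based magic matching using java.util.regex.
 * This is NOT Tika's MagicDetector API; all names are illustrative only.
 */
public class RegexMagicSketch {

    /**
     * Returns true if the pattern matches starting at some offset in the
     * inclusive range [offsetBegin, offsetEnd] of the header bytes.
     */
    public static boolean matches(byte[] header, Pattern pattern,
                                  int offsetBegin, int offsetEnd) {
        // ISO-8859-1 maps every byte 0x00-0xFF to exactly one char, so
        // character offsets in the decoded string equal byte offsets.
        String text = new String(header, StandardCharsets.ISO_8859_1);
        Matcher matcher = pattern.matcher(text);
        int last = Math.min(offsetEnd, text.length());
        for (int start = Math.max(0, offsetBegin); start <= last; start++) {
            // Restrict matching to text[start..] and require the match to
            // begin exactly at 'start'.
            matcher.region(start, text.length());
            if (matcher.lookingAt()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical signature: a "%PDF-1.x" header within the first 8 bytes.
        Pattern pdf = Pattern.compile("%PDF-1\\.[0-9]");
        byte[] header = "\n\n%PDF-1.4\n%...".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(matches(header, pdf, 0, 8)); // true: match begins at offset 2
        System.out.println(matches(header, pdf, 3, 8)); // false: offset 2 is outside the range
    }
}
```

Decoding as ISO-8859-1 is the design choice that lets a byte-oriented detector reuse a character-oriented regex engine, since non-ASCII bytes such as 0xFF become single characters that escapes like `\xFF` can match. A real detector would also need to cap how many bytes it reads before matching.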

Is there anything [~pete.s.may] or I can do to help this issue along?
> Add regular expression support to the MagicDetector
> ---------------------------------------------------
>                 Key: TIKA-847
>                 URL: https://issues.apache.org/jira/browse/TIKA-847
>             Project: Tika
>          Issue Type: New Feature
>          Components: mime
>    Affects Versions: 1.0
>            Reporter: Andrew Jackson
>              Labels: detection, format
>         Attachments: regex_support.patch
> Following on from TIKA-86, we would like to add support for regular expressions to the
> MagicDetector. This would allow more signatures to be re-used from more sources (e.g. the
> file(1) command). As part of the SCAPE Project, we have added this functionality to our own
> Tika branch (e.g. https://github.com/openplanets/tika/commit/b8de9e77c8b432788e3f73a4dbccca8ea044b503)
> and are working to tidy this up to make a clean patch we can submit here.
>
> BTW, are there any patch submission guidelines or coding standards we should check our
> work against first?

This message is automatically generated by JIRA.

