tika-dev mailing list archives

From "Andrew Jackson (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-847) Add regular expression support to the MagicDetector
Date Tue, 17 Jan 2012 11:10:39 GMT

    [ https://issues.apache.org/jira/browse/TIKA-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187587#comment-13187587 ]

Andrew Jackson commented on TIKA-847:

Under TIKA-86, @kkrugler suggested we use the Lucene FSM RegEx engine (see https://issues.apache.org/jira/browse/LUCENE-1606
and http://search-lucene.com/jd/lucene/org/apache/lucene/search/RegexpQuery.html). However,
I believe this code is not yet in a stable Lucene version (4.0-SNAPSHOT only), so I would
rather we defer that optimisation to a later date (under a separate ticket).

> Add regular expression support to the MagicDetector
> ---------------------------------------------------
>                 Key: TIKA-847
>                 URL: https://issues.apache.org/jira/browse/TIKA-847
>             Project: Tika
>          Issue Type: New Feature
>          Components: mime
>    Affects Versions: 1.0
>            Reporter: Andrew Jackson
>              Labels: detection, format
>
> Following on from TIKA-86, we would like to add support for regular expressions to the
> MagicDetector. This would allow more signatures to be re-used from more sources (e.g. the
> file(1) command). As part of the SCAPE Project, we have added this functionality to our own
> Tika branch (e.g. https://github.com/openplanets/tika/commit/b8de9e77c8b432788e3f73a4dbccca8ea044b503)
> and are working to tidy this up to make a clean patch we can submit here.
>
> BTW, are there any patch submission guidelines or coding standards we should check our
> work against first?
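
As a rough illustration of what regular-expression magic matching could look like, here is a minimal sketch using only java.util.regex. It is not the code from the SCAPE branch and does not use Tika's MagicDetector API: the class name RegexMagicSketch, its methods, and the PDF version pattern are all hypothetical, chosen only to show the idea of matching a regex against the leading bytes of a stream.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of Tika: maps regex signatures to MIME types.
final class RegexMagicSketch {
    // Signatures are tried in the order they were registered.
    private final List<Map.Entry<Pattern, String>> signatures = new ArrayList<>();

    // Register a regex that must match at the start of the data.
    void addSignature(String regex, String mimeType) {
        // DOTALL so that '.' also matches line-terminator bytes.
        Pattern pattern = Pattern.compile(regex, Pattern.DOTALL);
        signatures.add(new AbstractMap.SimpleImmutableEntry<>(pattern, mimeType));
    }

    // Return the MIME type of the first matching signature,
    // or application/octet-stream if none matches.
    String detect(byte[] prefix) {
        // ISO-8859-1 maps every byte to exactly one char, so byte offsets
        // and char offsets coincide and no byte value is lost.
        String text = new String(prefix, StandardCharsets.ISO_8859_1);
        for (Map.Entry<Pattern, String> signature : signatures) {
            // lookingAt() anchors the match at offset 0 without requiring
            // the whole prefix to match.
            if (signature.getKey().matcher(text).lookingAt()) {
                return signature.getValue();
            }
        }
        return "application/octet-stream";
    }
}
```

With a signature registered as addSignature("%PDF-\\d\\.\\d", "application/pdf"), a buffer beginning with the bytes of "%PDF-1.4" would be reported as application/pdf. A real detector would presumably also need offset ranges and a bound on how many bytes are read, which this sketch leaves out.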

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira