tika-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (TIKA-1441) ExternalParsers should allow dynamic keys to be specified for Regexs
Date Sat, 11 Oct 2014 15:22:33 GMT

     [ https://issues.apache.org/jira/browse/TIKA-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann resolved TIKA-1441.
-------------------------------------
    Resolution: Fixed

- fixed in r1631060.

> ExternalParsers should allow dynamic keys to be specified for Regexs
> --------------------------------------------------------------------
>
>                 Key: TIKA-1441
>                 URL: https://issues.apache.org/jira/browse/TIKA-1441
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>         Environment: while working on TIKA-605 and memex
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.7
>
>         Attachments: TIKA-1441.Mattman.100914.patch.txt
>
>
> While working on TIKA-605, I was trying to use ExternalParsers and came across an interesting
use case. What if there are so many metadata keys that specifying each of them by hand as an
individual regex would be repetitive and tedious? What if the key itself could also be specified
by a regex, e.g., the first group is taken as the key and the next group as the actual
value? I ran into this while parsing GDAL output. A very simple improvement
to the ExternalParsers' Map<Pattern, String> map would be to allow null
or "" Strings as values, meaning that the Pattern specifies *both* the key name *and*
the key value.
> I have a patch that I'll upload. All tests pass, and I need this to get TIKA-605 in and done.
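
For illustration only, the proposal above might be sketched roughly as follows. This is not the code from the attached patch or from r1631060. The class name DynamicKeySketch, the method extract, and the sample patterns are hypothetical, and the sketch assumes every capture group in a pattern is mandatory.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the idea in this issue; not Tika's actual ExternalParser code.
class DynamicKeySketch {

    // Applies each pattern to each output line.
    // A non-empty map value is a fixed key, and group 1 of the pattern is the value.
    // A null or "" map value means the pattern supplies both: group 1 is the key, group 2 the value.
    static Map<String, String> extract(Iterable<String> lines, Map<Pattern, String> patterns) {
        Map<String, String> metadata = new LinkedHashMap<>();
        for (String line : lines) {
            for (Map.Entry<Pattern, String> entry : patterns.entrySet()) {
                Matcher m = entry.getKey().matcher(line);
                if (!m.find()) {
                    continue;
                }
                String fixedKey = entry.getValue();
                if (fixedKey == null || fixedKey.isEmpty()) {
                    // Dynamic key: requires at least two capture groups.
                    if (m.groupCount() >= 2) {
                        metadata.put(m.group(1).trim(), m.group(2).trim());
                    }
                } else if (m.groupCount() >= 1) {
                    // Fixed key: the single captured group is the value.
                    metadata.put(fixedKey, m.group(1).trim());
                }
            }
        }
        return metadata;
    }
}
```

With a map such as {"^Driver: (.*)$" -> "driver", "^\s*([A-Z_]+)=(.*)$" -> ""}, a single dynamic-key pattern would cover every KEY=VALUE line in the tool output, so the keys would not have to be enumerated one by one. The sample line formats here are assumed for the example and are not taken from real GDAL output.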



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)