tika-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1508) Add uniformity to parser parameter configuration
Date Mon, 15 Jun 2015 16:09:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586245#comment-14586245 ]

Tim Allison commented on TIKA-1508:

From [~gagravarr]:
> My personal view is that properties/configuration which apply to all documents of a type
> should be set at Parser creation time, either from a Tika Config object or someone in code
> doing "Parser p = new FooParser(); p.setblah();". Properties/config which vary from document
> to document should be set on the ParseContext.
> Not sure if we had consensus on that as a policy though?
+1 (from me)
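
To make the proposed split concrete, here is a minimal, self-contained sketch. It deliberately does not use Tika's classes: `Context`, `FooOptions`, and `FooParser` (including its `setSortByPosition` setter) are hypothetical stand-ins, with `Context` only loosely modeled on a type-keyed ParseContext. The point is the scoping, not the API: settings shared by every document of a type live on the parser instance, and settings that change per document travel with the call.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigScopeSketch {

    /** Hypothetical stand-in for a ParseContext: a type-keyed bag of per-call objects. */
    static class Context {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        <T> T get(Class<T> key) {
            // Class.cast(null) returns null, so a missing entry yields null.
            return key.cast(entries.get(key));
        }
    }

    /** Hypothetical per-document settings, e.g. a password for one specific file. */
    static class FooOptions {
        final String password;

        FooOptions(String password) {
            this.password = password;
        }
    }

    /** Hypothetical parser; not a real Tika class. */
    static class FooParser {
        // Applies to every document this parser instance handles.
        private boolean sortByPosition = false;

        void setSortByPosition(boolean sortByPosition) {
            this.sortByPosition = sortByPosition;
        }

        String parse(String document, Context context) {
            // Per-document settings are looked up on the context for each call.
            FooOptions perDoc = context.get(FooOptions.class);
            String password = (perDoc == null) ? "<none>" : perDoc.password;
            return document + " sort=" + sortByPosition + " password=" + password;
        }
    }

    public static void main(String[] args) {
        // Set once, at parser creation time.
        FooParser p = new FooParser();
        p.setSortByPosition(true);

        // Set per document, on the context passed to that parse call.
        Context first = new Context();
        first.set(FooOptions.class, new FooOptions("secret"));

        System.out.println(p.parse("a.foo", first));
        System.out.println(p.parse("b.foo", new Context()));
    }
}
```

Under this arrangement the second call still sees the parser-level setting but none of the first document's options, which is the behavior the policy is meant to guarantee.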

> Add uniformity to parser parameter configuration
> ------------------------------------------------
>                 Key: TIKA-1508
>                 URL: https://issues.apache.org/jira/browse/TIKA-1508
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>             Fix For: 1.10
> We can currently configure parsers by the following means:
> 1) programmatically by direct calls to the parsers or their config objects
> 2) sending in a config object through the ParseContext
> 3) modifying .properties files for specific parsers (e.g. PDFParser)
> Rather than scattering the landscape with .properties files for each parser, it would
> be great if we could specify parser parameters in the main config file, something along the
> lines of this:
> {noformat}
>     <parser class="org.apache.tika.parser.audio.AudioParser">
>       <params>
>         <int name="someparam1">2</int>
>         <str name="someOtherParam2">something or other</str>
>       </params>
>       <mime>audio/basic</mime>
>       <mime>audio/x-aiff</mime>
>       <mime>audio/x-wav</mime>
>     </parser>
> {noformat}
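
As a rough illustration of how a `<params>` block like the one proposed above could be turned into typed values, the sketch below reads it with the JDK's built-in DOM parser. This is not Tika's config loader: the class `ParamsSketch`, the method `readParams`, and the decision to support only the `int` and `str` element names from the example are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ParamsSketch {

    /** Reads the first <params> element and returns name -> typed value, in document order. */
    static Map<String, Object> readParams(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        Map<String, Object> params = new LinkedHashMap<>();
        NodeList paramsNodes = doc.getElementsByTagName("params");
        if (paramsNodes.getLength() == 0) {
            return params; // a parser entry without <params> has no parameters
        }

        NodeList children = paramsNodes.item(0).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes between elements
            }
            Element e = (Element) child;
            String name = e.getAttribute("name");
            String text = e.getTextContent().trim();
            // The element name carries the type, as in the proposed snippet.
            switch (e.getTagName()) {
                case "int":
                    params.put(name, Integer.valueOf(text));
                    break;
                case "str":
                    params.put(name, text);
                    break;
                default:
                    throw new IllegalArgumentException(
                            "Unsupported param type: " + e.getTagName());
            }
        }
        return params;
    }

    public static void main(String[] args) throws Exception {
        String xml =
                "<parser class=\"org.apache.tika.parser.audio.AudioParser\">"
              + "  <params>"
              + "    <int name=\"someparam1\">2</int>"
              + "    <str name=\"someOtherParam2\">something or other</str>"
              + "  </params>"
              + "  <mime>audio/basic</mime>"
              + "</parser>";
        System.out.println(readParams(xml));
    }
}
```

A real implementation would still need to decide how these values reach the parser (setters, a constructor, or a config object), which is the open question in this issue.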
