tika-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1508) Add uniformity to parser parameter configuration
Date Wed, 09 Mar 2016 03:40:40 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186434#comment-15186434 ]

ASF GitHub Bot commented on TIKA-1508:
--------------------------------------

GitHub user thammegowda opened a pull request:

    https://github.com/apache/tika/pull/91

    TIKA-1508 : Add uniformity to parser parameter configuration - contributed by Thamme Gowda

    1. Added a `Configurable` interface.
       It can be used by any service, such as `Parser` or `Detector`, that
       takes configurable parameters.

    2. Added a `ConfigurableParser` interface, which extends `Parser`.
       I didn't add a new method to the existing `Parser` interface because
       that would break compatibility.

    3. `AbstractParser` now implements `ConfigurableParser` and provides a
       default implementation of the configure() contract.
       I think this is safe and doesn't break anything.
       In addition, all parsers that extend `AbstractParser` can easily
       access their config from `TikaConfig` if they want to.

    4. Added a TODO to `TikaConfig`:
       once this is in place, it should allow multiple instances of the same
       parser with different runtime configurations.

    5. `TikaConfig` now detects whether an instance can be configured;
       if so, it checks whether params are available in the XML file, parses
       them, and invokes the configure(ctx) method with them.

    6. Added `DummyConfigurableParser`, which simply copies parameters to
       metadata for the sake of testing.

    7. Added a sample XML config file for testing.
       Added `ConfigurableParserTest`, which performs an end-to-end test of
       all of the above.
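
The relationship between the types listed above can be sketched roughly as follows. This is a self-contained illustration, not the code from the pull request: the `Parser` interface here is a simplified stand-in for Tika's, the parameter type passed to `configure()` is assumed to be a plain `Map<String, String>`, and `configureIfPossible` is a hypothetical helper standing in for the detection logic described for `TikaConfig`.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class ConfigurableSketch {

    /** Assumed shape of the Configurable contract; the real signature may differ. */
    interface Configurable {
        void configure(Map<String, String> params);
    }

    /** Simplified stand-in for Tika's Parser interface (not the real signature). */
    interface Parser {
        void parse(String input, Map<String, String> metadata);
    }

    /** A new interface is added so the existing Parser interface stays untouched. */
    interface ConfigurableParser extends Parser, Configurable { }

    /** Default configure() so existing subclasses need no changes. */
    abstract static class AbstractParser implements ConfigurableParser {
        protected Map<String, String> params = new HashMap<>();

        @Override
        public void configure(Map<String, String> params) {
            this.params = new HashMap<>(params);
        }
    }

    /** Copies its parameters into the metadata, as the test parser is described to do. */
    static class DummyConfigurableParser extends AbstractParser {
        @Override
        public void parse(String input, Map<String, String> metadata) {
            metadata.putAll(params);
        }
    }

    /** Hypothetical helper: configure the instance only if it supports configuration. */
    static <T> T configureIfPossible(T instance, Map<String, String> params) {
        if (instance instanceof Configurable && params != null && !params.isEmpty()) {
            ((Configurable) instance).configure(params);
        }
        return instance;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("someparam1", "2");

        Parser parser = configureIfPossible(new DummyConfigurableParser(), params);

        Map<String, String> metadata = new LinkedHashMap<>();
        parser.parse("ignored", metadata);
        System.out.println(metadata); // prints {someparam1=2}
    }
}
```

Putting the default `configure()` in the abstract base class is what lets existing parsers pick up the new contract without any source changes, while implementations of the plain `Parser` interface are simply skipped by the `instanceof` check.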


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/thammegowda/tika TIKA-1508

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tika/pull/91.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #91
    
----
commit b2cf23178ede925b0ef23f88ebf1aff95c8c157c
Author: Thamme Gowda <tgowdan@gmail.com>
Date:   2016-03-09T02:23:19Z

    Add uniformity to parser parameter configuration.
    
    1. Added a Configurable interface.
       It can be used by any service, such as Parser or Detector, that
       takes configurable parameters.

    2. Added a ConfigurableParser interface, which extends Parser.
       I didn't add a new method to the existing Parser interface because
       that would break compatibility.

    3. AbstractParser now implements ConfigurableParser and provides a
       default implementation of the configure() contract.
       I think this is safe and doesn't break anything.
       In addition, all parsers that extend AbstractParser can easily
       access their config from TikaConfig if they want to.

    4. Added a TODO to TikaConfig:
       once this is in place, it should allow multiple instances of the same
       parser with different runtime configurations.

    5. TikaConfig now detects whether an instance can be configured;
       if so, it checks whether params are available in the XML file, parses
       them, and invokes the configure(ctx) method with them.

    6. Added DummyConfigurableParser, which simply copies parameters to
       metadata for the sake of testing.

    7. Added a sample XML config file for testing.
       Added ConfigurableParserTest, which performs an end-to-end test of
       all of the above.

commit ae51417d8881dd90b921f02c2677a7d5bfd69a30
Author: Thamme Gowda <tgowdan@gmail.com>
Date:   2016-03-09T03:23:47Z

    remove unwanted TODO:

----


> Add uniformity to parser parameter configuration
> ------------------------------------------------
>
>                 Key: TIKA-1508
>                 URL: https://issues.apache.org/jira/browse/TIKA-1508
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>             Fix For: 1.13
>
>
> We can currently configure parsers by the following means:
> 1) programmatically by direct calls to the parsers or their config objects
> 2) sending in a config object through the ParseContext
> 3) modifying .properties files for specific parsers (e.g. PDFParser)
> Rather than scattering the landscape with .properties files for each parser, it would
> be great if we could specify parser parameters in the main config file, something along the
> lines of this:
> {noformat}
>     <parser class="org.apache.tika.parser.audio.AudioParser">
>       <params>
>         <int name="someparam1">2</int>
>         <str name="someOtherParam2">something or other</str>
>       </params>
>       <mime>audio/basic</mime>
>       <mime>audio/x-aiff</mime>
>       <mime>audio/x-wav</mime>
>     </parser>
> {noformat}
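
One way the proposed `<params>` block could be read into typed values is sketched below, using only the JDK's DOM parser. This is an illustration under assumptions rather than Tika's actual loading code: the class name `ParamsSketch` and the method `readParams` are hypothetical, and only the two element types shown in the proposal (`int` and `str`) are handled.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class ParamsSketch {

    /** Hypothetical helper: reads the <params> children of a single <parser> element. */
    static Map<String, Object> readParams(String parserXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(parserXml.getBytes(StandardCharsets.UTF_8)));

        Map<String, Object> params = new LinkedHashMap<>();
        NodeList paramsNodes = doc.getDocumentElement().getElementsByTagName("params");
        if (paramsNodes.getLength() == 0) {
            return params; // no <params> block: nothing to configure
        }

        NodeList children = paramsNodes.item(0).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes and comments
            }
            Element element = (Element) node;
            String name = element.getAttribute("name");
            String text = element.getTextContent().trim();
            // The element name carries the type, as in the proposal.
            switch (element.getTagName()) {
                case "int":
                    params.put(name, Integer.valueOf(text));
                    break;
                case "str":
                    params.put(name, text);
                    break;
                default:
                    throw new IllegalArgumentException(
                            "Unsupported param type: " + element.getTagName());
            }
        }
        return params;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<parser class=\"org.apache.tika.parser.audio.AudioParser\">"
                + "<params>"
                + "<int name=\"someparam1\">2</int>"
                + "<str name=\"someOtherParam2\">something or other</str>"
                + "</params>"
                + "<mime>audio/basic</mime>"
                + "</parser>";
        System.out.println(readParams(xml));
        // prints {someparam1=2, someOtherParam2=something or other}
    }
}
```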



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
