jackrabbit-oak-issues mailing list archives

From "Chetan Mehrotra (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (OAK-2463) Provide support for providing custom Tika config
Date Mon, 02 Feb 2015 09:38:34 GMT

    [ https://issues.apache.org/jira/browse/OAK-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298593#comment-14298593 ]

Chetan Mehrotra edited comment on OAK-2463 at 2/2/15 9:37 AM:
--------------------------------------------------------------

Applied the patch to trunk for now in http://svn.apache.org/r1655996. Once the review is done,
I will merge it to the branch.

A custom Tika config XML can now be provided as part of the index definition node by creating
an {{nt:file}} node named {{config.xml}} under a {{tika}} node in the index definition:

{noformat}
/oak:index/assetType
  - jcr:primaryType = "oak:QueryIndexDefinition"
  - compatVersion = 2
  - type = "lucene"
  - async = "async"
  + tika
     + config.xml  (nt:file)
        + jcr:content
           - jcr:data = //config xml binary content
  + indexRules
{noformat}
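
For illustration, here is a rough sketch of what the binary content stored in {{jcr:data}} might look like and how it could be sanity-checked before upload. The config contents are an assumption, not taken from the patch: it keeps Tika's {{DefaultParser}} and routes {{application/pdf}} to {{EmptyParser}}, so no text would be extracted from PDFs. The class and method names ({{TikaConfigSketch}}, {{configBytes}}, {{rootElementName}}) are hypothetical, and writing the bytes into the repository through the JCR API is not shown.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper: builds an example Tika config and checks that it is well-formed XML.
class TikaConfigSketch {

    // Assumed example config (not from the patch): PDFs go to EmptyParser,
    // everything else is handled by DefaultParser.
    static final String CONFIG_XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<properties>\n"
        + "  <parsers>\n"
        + "    <parser class=\"org.apache.tika.parser.DefaultParser\"/>\n"
        + "    <parser class=\"org.apache.tika.parser.EmptyParser\">\n"
        + "      <mime>application/pdf</mime>\n"
        + "    </parser>\n"
        + "  </parsers>\n"
        + "</properties>\n";

    // Bytes that would be stored as jcr:data of tika/config.xml/jcr:content.
    static byte[] configBytes() {
        return CONFIG_XML.getBytes(StandardCharsets.UTF_8);
    }

    // Parses the bytes as XML and returns the root element name;
    // throws IllegalStateException if the content is not well-formed.
    static String rootElementName(byte[] xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            return doc.getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException("Tika config is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        byte[] data = configBytes();
        System.out.println("root element: " + rootElementName(data));
        System.out.println("bytes for jcr:data: " + data.length);
    }
}
```

Checking well-formedness up front is only a convenience; whether a given config is accepted depends on the Tika version bundled with Oak.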


was (Author: chetanm):
Applied the patch in trunk for now with http://svn.apache.org/r1655996. Once review is done
would merge it to branch

Custom Tika config XML can now be provided as part of the index definition node by creating
an {{nt:file}} node with name {{tikaConfig}} under the index definition

{noformat}
/oak:index/assetType
  - jcr:primaryType = "oak:QueryIndexDefinition"
  - compatVersion = 2
  - type = "lucene"
  - async = "async"
  + tika
     + config  (nt:file)
        + jcr:content
           - jcr:data = //config xml binary content
  + indexRules
{noformat}

> Provide support for providing custom Tika config
> ------------------------------------------------
>
>                 Key: OAK-2463
>                 URL: https://issues.apache.org/jira/browse/OAK-2463
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: oak-lucene
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.1.6, 1.0.12
>
>         Attachments: OAK-2463.patch
>
>
> Currently Oak Lucene uses the default Tika config when extracting text content from binary properties. To provide better control, the Tika config should be made configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
