tika-dev mailing list archives

From "Tyler Palsulich (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (TIKA-1558) Create a Parser Blacklist
Date Sun, 22 Feb 2015 00:54:11 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14331975#comment-14331975 ]

Tyler Palsulich edited comment on TIKA-1558 at 2/22/15 12:53 AM:
-----------------------------------------------------------------

This has the added benefit of working for any Tika service -- Translator, Parser, etc. And,
regardless of how/where the services are loaded, the blacklist is applied. In order for the
blacklist to apply in all cases for TIKA-1509, the ServiceLoader would need to be passed
the TikaConfig with the Parser (or whatever service?) strategy. Unless I'm missing something
(very well could be!).

I thought of TIKA-1509 as configuration when multiple Parsers are available. But, it could
definitely apply as a blacklist feature. I'm happy to iterate. :)

Edit: I should note that if a user blacklists a Parser, we shouldn't even check which types
it supports (since, for example, the OCR (problem child) Parser forks a process when "deciding"
if it supports any types). That's why I opted to remove the blacklisted Parser as soon as
we know we should.


was (Author: tpalsulich):
This has the added benefit of working for any Tika service -- Translator, Parser, etc. And,
regardless of how/where the services are loaded, the blacklist is applied. In order for the
blacklist to apply in all cases for TIKA-1509, the ServiceLoader would need to be passed
the TikaConfig with the Parser (or whatever service?) strategy. Unless I'm missing something
(very well could be!).

I thought of TIKA-1509 as configuration when multiple Parsers are available. But, it could
definitely apply as a blacklist feature. I'm happy to iterate. :)

> Create a Parser Blacklist
> -------------------------
>
>                 Key: TIKA-1558
>                 URL: https://issues.apache.org/jira/browse/TIKA-1558
>             Project: Tika
>          Issue Type: New Feature
>            Reporter: Tyler Palsulich
>            Assignee: Tyler Palsulich
>             Fix For: 1.8
>
>
> As talked about in TIKA-1555 and TIKA-1557, it would be nice to be able to disable Parsers
> without pulling their dependencies out. In some cases (e.g. disable all ExternalParsers),
> there may not be an easy way to exclude the dependencies via Maven.
> So, an initial design would be to include another file like
> {{META-INF/services/org.apache.tika.parser.Parser.blacklist}}. We create a new method
> {{ServiceLoader#loadServiceProviderBlacklist}}. Then, in {{ServiceLoader#loadServiceProviders}},
> we remove all elements of the list that are assignable to an element in
> {{ServiceLoader#loadServiceProviderBlacklist}}.
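
The "remove all elements that are assignable to a blacklisted element" step in the description above could look roughly like the sketch below. It is only an illustration under stated assumptions: `BlacklistSketch`, `removeBlacklisted`, and the nested `Parser`, `ExternalParser`, `OcrParser`, and `TextParser` types are hypothetical stand-ins, not Tika's real classes, and the blacklist is passed in as a list of classes rather than read from the proposed `.blacklist` file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BlacklistSketch {

    // Hypothetical stand-ins for Tika services; these are NOT the real Tika types.
    interface Parser {}
    static class ExternalParser implements Parser {}
    static class OcrParser extends ExternalParser {}
    static class TextParser implements Parser {}

    /**
     * Returns the providers whose class is neither a blacklisted class
     * nor a subclass/implementation of one. Only the provider's class is
     * inspected; no methods are called on the provider itself.
     */
    static <T> List<T> removeBlacklisted(List<T> providers, List<Class<?>> blacklist) {
        List<T> kept = new ArrayList<>();
        for (T provider : providers) {
            boolean blocked = false;
            for (Class<?> banned : blacklist) {
                // True when provider's class is `banned` or a subtype of it.
                if (banned.isAssignableFrom(provider.getClass())) {
                    blocked = true;
                    break;
                }
            }
            if (!blocked) {
                kept.add(provider);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Parser> loaded =
                Arrays.asList(new TextParser(), new ExternalParser(), new OcrParser());
        List<Class<?>> blacklist = Arrays.asList(ExternalParser.class);

        // Blacklisting ExternalParser also drops its subclass OcrParser.
        for (Parser p : removeBlacklisted(loaded, blacklist)) {
            System.out.println(p.getClass().getSimpleName());
        }
    }
}
```

In this sketch, blacklisting the hypothetical `ExternalParser` also removes `OcrParser` because the check is on assignability rather than exact class equality, which is what would make a "disable all ExternalParsers" entry possible. Because the filter looks only at each provider's class, it never asks a blacklisted provider which types it supports.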



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
