tika-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2360) Handle SentimentParser resource failure more robustly
Date Mon, 15 May 2017 14:49:04 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16010631#comment-16010631 ]

Tim Allison commented on TIKA-2360:

My preference would be not to include the SentimentParser by default:

1) network calls that are not currently robustly handled

2) the .sent glob in mime detection, which could cause problems for users who happen to have files
with that suffix. And yes, I can't imagine users have a bunch of Apple II files kicking around,
but this is a mildly worrisome method of triggering the SentimentParser

3) while very cool, it is a fundamentally different thing than a parser.  It enriches already
extracted UTF-8 text, kind of like the phone number handler, etc.  I realize NER does exactly
the same thing...I know...
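
To make point 1 concrete, here is a minimal sketch of one way a network-backed resource could be loaded so that a failure degrades gracefully instead of hanging or throwing. The class ResourceFetch and the method tryFetch are hypothetical names used only for illustration. They are not part of Tika, and this is not how the SentimentParser currently loads its model.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLConnection;
import java.util.Optional;

// Hypothetical helper for illustration only; not part of Tika.
final class ResourceFetch {

    private ResourceFetch() {
    }

    /**
     * Tries to read the resource at the given location, bounding both the
     * connect and the read phase with a timeout. Returns Optional.empty()
     * on any failure so the caller can skip the feature instead of failing.
     */
    static Optional<byte[]> tryFetch(String location, int timeoutMillis) {
        try {
            URLConnection conn = new URI(location).toURL().openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            try (InputStream in = conn.getInputStream()) {
                return Optional.of(in.readAllBytes());
            }
        } catch (IOException | URISyntaxException | IllegalArgumentException e) {
            // Unreachable host, proxy failure, timeout, missing file, or bad location.
            return Optional.empty();
        }
    }
}
```

A caller could check the returned Optional and, when it is empty, log a warning and skip the enrichment step. The same check could guard a test so that it is skipped rather than failed when the network is unavailable.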

> Handle SentimentParser resource failure more robustly
> -----------------------------------------------------
>                 Key: TIKA-2360
>                 URL: https://issues.apache.org/jira/browse/TIKA-2360
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Blocker
> The SentimentParser tests currently require a network call to GitHub.  For those working
behind a proxy, or who would prefer that Tika not make unexpected network calls, can we please turn
this off by default?

This message was sent by Atlassian JIRA
