tika-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (TIKA-2360) Handle SentimentParser resource failure more robustly
Date Mon, 15 May 2017 15:22:04 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16010631#comment-16010631 ]

Tim Allison edited comment on TIKA-2360 at 5/15/17 3:21 PM:
------------------------------------------------------------

My preference would be not to include the SentimentParser by default:

1) network calls that are not currently robustly handled

2) .sent glob in mime detection, which could cause problems for users who happen to have files
with that suffix. Yes, I can't imagine users have a bunch of Apple II files kicking around,
but this is a mildly worrisome method of triggering the SentimentParser

3) while very cool, it is a fundamentally different thing than a parser.  It enriches already
extracted UTF-8 text, kind of like the phone number handler, etc.  I realize NER does exactly
the same thing...I know...

My proposal is that we treat the SentimentParser the same way we do NER: remove it from SPI,
remove glob detection, and swallow but log exceptions on initialization.
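
For illustration only, "swallow but log exceptions on initialization" could look roughly like
the sketch below. GuardedResource and its loader argument are hypothetical names, not existing
Tika classes, and the sketch uses java.util.logging only to stay self-contained. The idea is
that a failed load (for example, a blocked network call) is logged once and leaves the feature
disabled rather than breaking the caller.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper (not part of Tika): loads a resource once and never throws. */
final class GuardedResource<T> {
    private static final Logger LOG = Logger.getLogger(GuardedResource.class.getName());

    // Stays null when initialization failed.
    private final T resource;

    GuardedResource(Callable<T> loader) {
        T loaded = null;
        try {
            // The loader may touch the network, e.g. to fetch a model file.
            loaded = loader.call();
        } catch (Exception e) {
            // Swallow but log: callers keep working, just without this resource.
            LOG.log(Level.WARNING, "Resource initialization failed; feature disabled", e);
        }
        this.resource = loaded;
    }

    /** True only if the loader returned a non-null resource. */
    boolean isAvailable() {
        return resource != null;
    }

    /** Empty when initialization failed, so callers can skip the enrichment step. */
    Optional<T> get() {
        return Optional.ofNullable(resource);
    }
}
```

A parser built this way would check isAvailable() before doing any work and simply pass the
content through untouched when the resource is missing.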

[~chrismattmann] and others, any objections? 


was (Author: tallison@mitre.org):
My preference would be not to include the SentimentParser by default:

1) network calls that are not currently robustly handled

2) .sent glob in mime detection, which could cause problems for users who happen to have files
with that suffix. Yes, I can't imagine users have a bunch of Apple II files kicking around,
but this is a mildly worrisome method of triggering the SentimentParser

3) while very cool, it is a fundamentally different thing than a parser.  It enriches already
extracted UTF-8 text, kind of like the phone number handler, etc.  I realize NER does exactly
the same thing...I know...



> Handle SentimentParser resource failure more robustly
> -----------------------------------------------------
>
>                 Key: TIKA-2360
>                 URL: https://issues.apache.org/jira/browse/TIKA-2360
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Blocker
>
> The SentimentParser tests currently require a network call to GitHub.  For those working
behind a proxy, or who would prefer that Tika not make unexpected network calls, can we please turn
this off by default?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
