tika-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2360) Handle SentimentParser resource failure more robustly
Date Tue, 16 May 2017 20:49:04 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16013087#comment-16013087 ]

Tim Allison commented on TIKA-2360:

> Thanks Tim, appreciate it.
Of course!  I'm sorry for moving out on this without giving enough time for feedback!

To my mind, 1. would be great.

For 2., I'm happy to leave the SentimentParser as a parser for Tika 1.x as long as users are
required to turn it on.  Or, is this a sticking point for you?

For Tika 2.0 we should come up with an interface/common way of handling post-processing after
"text" has been extracted.  We currently have the NER parser and the Sentiment parser that
require text, but we've also put this post-processing functionality into handlers for other
things -- the old Language id handler and the phone # extractor.

As for the ObjectRecogniser, I think we might want to consider turning that into a Parser
(at some point) because it handles raw bytes, just like OCR or the JPEG parser.  The output
could populate Metadata instead of returning a list of recognized objects...however, I realize,
here, we get back into the challenge of arbitrary metadata (TIKA-1607)...because we do want
to group the object bits together for each object.  In Tika 2.x, this would allow users to
configure a composite image parser composed of three parsers: metadata extraction, OCR and
image recognition, and yes, it might take 2 minutes per image, but the capability would be there...

> Handle SentimentParser resource failure more robustly
> -----------------------------------------------------
>                 Key: TIKA-2360
>                 URL: https://issues.apache.org/jira/browse/TIKA-2360
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Blocker
>             Fix For: 1.15
> The SentimentParser tests currently require a network call to GitHub.  For those working
behind a proxy or who would prefer that Tika not make unexpected network calls, can we please turn
this off by default?

This message was sent by Atlassian JIRA
