tika-dev mailing list archives

From "Giuseppe Totaro (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1739) cTAKESParser doesn't work in 1.11
Date Tue, 22 Sep 2015 18:08:04 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903123#comment-14903123 ]

Giuseppe Totaro commented on TIKA-1739:

Hi [~chrismattmann], Hi [~gagravarr],
I looked at the latest code of {{CTAKESParser.java}} and ran some experiments on my laptop.
Basically, the problem is caused by the default constructor of {{CTAKESParser.java}}:
/**
 * Wraps the default Parser
 */
public CTAKESParser() {
    ...
}

To use CTAKESParser, we need to create a specific configuration for it (unless we intend
to use the parser programmatically), as described in [ctakesparser-utils|https://github.com/chrismattmann/ctakesparser-utils].
During parsing, Tika invokes the default constructor of CTAKESParser, which overrides the
given configuration at runtime. As a result, Tika only "visits" CTAKESParser and falls back
to the EmptyParser instead.

For instance, if we restore the previous default constructor (which does not override the
given configuration), cTAKES works properly and returns the right metadata:
public CTAKESParser() {
    super(new AutoDetectParser());
}
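
To make the effect of the fallback easier to picture, here is a toy sketch in plain Java. It is not Tika's code and it does not reproduce how Tika resolves its configuration: every name in it (ToyParser, ToyEmptyParser, ToyTextParser, ToyAnnotatingParser, FallbackSketch) is hypothetical, and it assumes that a decorating parser can only annotate the text handed back by the parser it wraps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parser: returns the text it extracts from the input.
interface ToyParser {
    String extract(String input);
}

// Plays the role of the fallback named in the report: it extracts nothing.
class ToyEmptyParser implements ToyParser {
    public String extract(String input) {
        return "";
    }
}

// Plays the role of a detecting parser that does extract the text.
class ToyTextParser implements ToyParser {
    public String extract(String input) {
        return input.trim();
    }
}

// Plays the role of a decorating parser (assumption: it can only
// annotate whatever text the wrapped parser hands back).
class ToyAnnotatingParser {
    private final ToyParser wrapped;

    ToyAnnotatingParser(ToyParser wrapped) {
        this.wrapped = wrapped;
    }

    // Returns one "annotation" per whitespace-separated token of the extracted text.
    List<String> annotate(String input) {
        List<String> annotations = new ArrayList<>();
        String text = wrapped.extract(input);
        if (text.isEmpty()) {
            return annotations; // nothing extracted, so nothing to annotate
        }
        for (String token : text.split("\\s+")) {
            annotations.add(token);
        }
        return annotations;
    }
}

class FallbackSketch {
    public static void main(String[] args) {
        String input = "patient reports mild fever";
        // Decorator wrapping the empty fallback: it is called, but has nothing to work on.
        System.out.println(new ToyAnnotatingParser(new ToyEmptyParser()).annotate(input));
        // Decorator wrapping a parser that extracts text: annotations come back.
        System.out.println(new ToyAnnotatingParser(new ToyTextParser()).annotate(input));
    }
}
```

In this sketch the decorator is invoked in both cases, but only the second call has any text to annotate, which mirrors the symptom in the report: the parser shows up as called while the metadata stays blank.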

[~chrismattmann] and [~gagravarr], I would be really glad to hear your feedback.
Thanks a lot,

> cTAKESParser doesn't work in 1.11
> ---------------------------------
>                 Key: TIKA-1739
>                 URL: https://issues.apache.org/jira/browse/TIKA-1739
>             Project: Tika
>          Issue Type: Bug
>          Components: parser, server
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.11
> Tika cTAKESParser integration doesn't work in 1.11. The parser is called, but blank metadata comes back:
> {noformat}
> curl -T test.txt -H "Content-Type: text/plain" http://localhost:9999/rmeta/text
> [{"Content-Type":"text/plain","X-Parsed-By":["org.apache.tika.parser.CompositeParser","org.apache.tika.parser.ctakes.CTAKESParser","org.apache.tika.parser.EmptyParser"],"X-TIKA:parse_time_millis":"20371","ctakes:schema":"coveredText:start:end:ontologyConceptArr"}
> {noformat}
> [~gagravarr] I wonder if something that happened in TIKA-1653 broke it?
> http://svn.apache.org/viewvc?view=revision&revision=1684199
> [~gostep] can you help me look here?
> I'm working on https://github.com/chrismattmann/shangridocs/tree/convert-wicket which is where I first saw this.

This message was sent by Atlassian JIRA