tika-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1739) cTAKESParser doesn't work in 1.11
Date Tue, 22 Sep 2015 23:45:04 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903670#comment-14903670 ]

Chris A. Mattmann commented on TIKA-1739:
-----------------------------------------

So, I'm going to take this to the list, but here is the use case:

cTAKESParser should be a parser decorator that decorates the AutoDetectParser. IOW, it lets
the AutoDetectParser do its job first; then, once the wrapped parse (via decorator.parse())
has finished, cTAKESParser does its thing by adding the biomedical metadata to the result.
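
A minimal Java sketch of the decorator idea, assuming hypothetical simplified types (SimpleParser, PlainTextParser, AnnotatingDecorator) rather than Tika's real Parser, ParserDecorator or Metadata API. The decorator lets the wrapped parser run first and only then adds its own metadata; a toy keyword check stands in for cTAKES.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the decorator pattern. These types are hypothetical stand-ins,
// NOT Tika's real Parser, ParserDecorator or Metadata classes.
class DecoratorSketch {

    // Simplified parser contract: extract text and record metadata.
    interface SimpleParser {
        String parse(String input, Map<String, String> metadata);
    }

    // Stands in for the wrapped AutoDetectParser: it does the normal parse.
    static class PlainTextParser implements SimpleParser {
        public String parse(String input, Map<String, String> metadata) {
            metadata.put("Content-Type", "text/plain");
            return input;
        }
    }

    // The decorator: delegate first, then add extra metadata on top.
    static class AnnotatingDecorator implements SimpleParser {
        private final SimpleParser delegate;

        AnnotatingDecorator(SimpleParser delegate) {
            this.delegate = delegate;
        }

        public String parse(String input, Map<String, String> metadata) {
            // 1. Let the wrapped parser do its job.
            String text = delegate.parse(input, metadata);
            // 2. Enrich the result; a toy keyword check stands in for cTAKES.
            if (text.toLowerCase().contains("aspirin")) {
                metadata.put("annotation:drug", "aspirin");
            }
            return text;
        }
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        SimpleParser parser = new AnnotatingDecorator(new PlainTextParser());
        parser.parse("Patient was given aspirin.", metadata);
        System.out.println(metadata);
    }
}
```

Running main prints the wrapped parser's Content-Type entry followed by the decorator's annotation:drug entry, which is the "let it do its job, then add to it" ordering described above.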

So this is why we originally set it up as a decorator. I haven't looked at the code to figure
out why I now need to put DefaultParser as a sub-parser of cTAKESParser in the config; that is
a change in behavior from the way we implemented it before. Anyway, thanks to you, Nick, and
your suggested update, it's working now, so I am going to close this one off. I also updated
the docs here:

https://wiki.apache.org/tika/cTAKESParser

And also updated:

https://github.com/chrismattmann/ctakesparser-utils/
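
The symptom in the quoted report below (X-Parsed-By ending in EmptyParser, with only ctakes:schema coming back) is consistent with the decorator wrapping a parser that produced no text; that reading is an interpretation, not something this thread confirms. The hypothetical Java sketch below models only that effect: the same decorator run around a no-op sub-parser versus a real text parser. All class names and metadata keys are illustrative stand-ins, not Tika's classes or the actual CTAKESParser code; in Tika itself the sub-parser nesting is declared in the config, as described above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of why the wrapped (sub-)parser matters. Simplified
// stand-ins, NOT Tika's real classes or the actual CTAKESParser code.
class WrappedParserSketch {

    interface SimpleParser {
        String parse(String input, Map<String, String> metadata, List<String> parsedBy);
    }

    // Like an empty parser: accepts the document but extracts no text.
    static class NoOpParser implements SimpleParser {
        public String parse(String input, Map<String, String> metadata, List<String> parsedBy) {
            parsedBy.add("NoOpParser");
            return "";
        }
    }

    // Like a real text parser: returns the document text.
    static class TextParser implements SimpleParser {
        public String parse(String input, Map<String, String> metadata, List<String> parsedBy) {
            parsedBy.add("TextParser");
            return input;
        }
    }

    // Decorator that always writes its schema key, then annotates whatever
    // text the wrapped parser produced (possibly none).
    static class AnnotatingDecorator implements SimpleParser {
        private final SimpleParser delegate;

        AnnotatingDecorator(SimpleParser delegate) {
            this.delegate = delegate;
        }

        public String parse(String input, Map<String, String> metadata, List<String> parsedBy) {
            parsedBy.add("AnnotatingDecorator");
            String text = delegate.parse(input, metadata, parsedBy);
            metadata.put("annotation:schema", "coveredText:start:end");
            // Toy stand-in for cTAKES: record the offsets of one keyword.
            int start = text.indexOf("aspirin");
            if (start >= 0) {
                metadata.put("annotation:drug",
                        "aspirin:" + start + ":" + (start + "aspirin".length()));
            }
            return text;
        }
    }

    // Runs the decorator around the given sub-parser and returns the metadata,
    // including the chain of parsers that touched the document.
    static Map<String, String> run(SimpleParser subParser, String input) {
        Map<String, String> metadata = new LinkedHashMap<>();
        List<String> parsedBy = new ArrayList<>();
        new AnnotatingDecorator(subParser).parse(input, metadata, parsedBy);
        metadata.put("parsed-by", String.join(",", parsedBy));
        return metadata;
    }

    public static void main(String[] args) {
        String doc = "Patient was given aspirin.";
        // Wrapping a no-op parser: only the schema key comes back.
        System.out.println(run(new NoOpParser(), doc));
        // Wrapping a real parser: the annotation is added as well.
        System.out.println(run(new TextParser(), doc));
    }
}
```

The point of the design is that the decorator itself never changes; only the sub-parser it wraps decides whether there is any text to annotate.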

> cTAKESParser doesn't work in 1.11
> ---------------------------------
>
>                 Key: TIKA-1739
>                 URL: https://issues.apache.org/jira/browse/TIKA-1739
>             Project: Tika
>          Issue Type: Bug
>          Components: parser, server
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.11
>
>         Attachments: TIKA-1739.patch
>
>
> Tika cTAKESParser integration doesn't work in 1.11. The parser is called, but blank metadata comes back:
> {noformat}
> curl -T test.txt -H "Content-Type: text/plain" http://localhost:9999/rmeta/text
> [{"Content-Type":"text/plain","X-Parsed-By":["org.apache.tika.parser.CompositeParser","org.apache.tika.parser.ctakes.CTAKESParser","org.apache.tika.parser.EmptyParser"],"X-TIKA:parse_time_millis":"20371","ctakes:schema":"coveredText:start:end:ontologyConceptArr"}
> {noformat}
> [~gagravarr] I wonder if something that happened in TIKA-1653 broke it?
> http://svn.apache.org/viewvc?view=revision&revision=1684199
> [~gostep] can you help me look here?
> I'm working on https://github.com/chrismattmann/shangridocs/tree/convert-wicket which is where I first saw this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
