tika-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1739) cTAKESParser doesn't work in 1.11
Date Tue, 22 Sep 2015 19:14:04 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903252#comment-14903252 ]

Chris A. Mattmann commented on TIKA-1739:
-----------------------------------------

OK [~totaro], I implemented your solution (see attached patch), but I am still getting the same results:

# Server Side
{noformat}
[chipotle:~/src/tika-server] mattmann% sh start-ctakes-tika.sh
log4j: reset attribute= "false".
log4j: Threshold ="null".
log4j: Retreiving an instance of org.apache.log4j.Logger.
log4j: Setting [ProgressAppender] additivity to [false].
log4j: Level value for ProgressAppender is  [INFO].
log4j: ProgressAppender level set to INFO
log4j: Class name: [org.apache.log4j.ConsoleAppender]
log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
log4j: Setting property [conversionPattern] to [%m].
log4j: Adding appender named [noEolAppender] to category [ProgressAppender].
log4j: Retreiving an instance of org.apache.log4j.Logger.
log4j: Setting [ProgressDone] additivity to [false].
log4j: Level value for ProgressDone is  [INFO].
log4j: ProgressDone level set to INFO
log4j: Class name: [org.apache.log4j.ConsoleAppender]
log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
log4j: Setting property [conversionPattern] to [%m%n].
log4j: Adding appender named [eolAppender] to category [ProgressDone].
log4j: Level value for root is  [INFO].
log4j: root level set to INFO
log4j: Class name: [org.apache.log4j.ConsoleAppender]
log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
log4j: Setting property [conversionPattern] to [%d{dd MMM yyyy HH:mm:ss} %5p %c{1} - %m%n].
log4j: Adding appender named [consoleAppender] to category [root].
22 Sep 2015 12:09:29  INFO TikaServerCli - Starting Apache Tika 1.11-SNAPSHOT server
22 Sep 2015 12:09:29  INFO TikaServerCli - Using custom config: /Users/mattmann/git/ctakesparser-utils/config/tika-config.xml
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/mattmann/src/tika-server/target/tika-server-1.11-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/apache-ctakes-3.2.2/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.JCLLoggerFactory]
Sep 22, 2015 12:09:30 PM org.apache.cxf.endpoint.ServerImpl initDestination
INFO: Setting the server's publish address to be http://localhost:9999/
22 Sep 2015 12:09:30  INFO Server - jetty-8.y.z-SNAPSHOT
22 Sep 2015 12:09:30  INFO AbstractConnector - Started SelectChannelConnector@localhost:9999
22 Sep 2015 12:09:30  INFO TikaServerCli - Started
22 Sep 2015 12:09:35  INFO RecursiveMetadataResource - rmeta/text (application/pdf)
22 Sep 2015 12:09:36  INFO ClearNLPDependencyParserAE - using Morphy analysis? true
Loading configuration.
Loading feature templates.
Loading lexica.
Loading model:
........................................................................................
22 Sep 2015 12:09:50  INFO TokenizerAnnotatorPTB - Initializing org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
22 Sep 2015 12:09:50  INFO ContextDependentTokenizerAnnotator - Finite state machines loaded.
22 Sep 2015 12:09:50  INFO ConstituencyParser - Initializing parser...
22 Sep 2015 12:09:53  INFO ContextAnnotator - SCOPE ORDER: [1, 3]
22 Sep 2015 12:09:53  INFO StatusContextAnalyzer - initBoundaryData() called for ContextInitializer
22 Sep 2015 12:09:53  INFO ContextAnnotator - SCOPE ORDER: [1, 3]
22 Sep 2015 12:09:53  INFO NegationContextAnalyzer - initBoundaryData() called for ContextInitializer
22 Sep 2015 12:09:54  INFO SentenceDetector - Sentence detector model file: org/apache/ctakes/core/sentdetect/sd-med-model.zip
22 Sep 2015 12:09:56  INFO POSTagger - POS tagger model file: org/apache/ctakes/postagger/models/mayo-pos.zip
Loading configuration.
Loading feature templates.
Loading model:
.
Loading configuration.
Loading feature templates.
Loading lexica.
Loading model:
...
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.......
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
..........
Loading model:
.
Loading model:
...
Loading model:
...
Loading model:
.
Loading model:
.
Loading model:
....
Loading model:
.
Loading model:
.
Loading model:
...
Loading model:
.
Loading model:
.....
Loading model:
.
Loading model:
.
Loading model:
...
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
...
Loading model:
........
Loading model:
...
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
....
Loading model:
.
Loading model:
........
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
....
Loading model:
...
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.....
Loading model:
...
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
..............
Loading model:
.
Loading model:
...
Loading model:
.
Loading model:
.
Loading model:
....
Loading model:
....
Loading model:
.
Loading model:
....
Loading model:
.
Loading model:
...
Loading model:
.
Loading model:
....
Loading model:
.......
Loading model:
.
Loading model:
...
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
....
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
....
Loading model:
.
Loading model:
...
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
...
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
...
Loading model:
.
Loading model:
...
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
...
Loading model:
.
Loading model:
.....
Loading model:
......
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
....
Loading model:
.
Loading model:
.
Loading model:
....
Loading model:
.
Loading model:
.
Loading model:
...
Loading model:
.
Loading model:
.
Loading model:
...
Loading model:
.
Loading model:
....
Loading model:
...
Loading model:
.
Loading model:
....
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
...
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
...
Loading model:
.
Loading model:
.......
Loading model:
.
Loading model:
.
Loading model:
....
Loading model:
.
Loading model:
.
Loading model:
.......
Loading model:
.
Loading model:
.
Loading model:
.
Loading model:
...
Loading model:
.
Loading configuration.
Loading feature templates.
Loading lexica.
Loading model:
................................
Loading model:
.............................
22 Sep 2015 12:10:00  INFO Chunker - Chunker model file: org/apache/ctakes/chunker/models/chunker-model.zip
22 Sep 2015 12:10:03  INFO JdbcConnectionResourceImpl - Connection established to: jdbc:hsqldb:res:org/apache/ctakes/dictionary/lookup/umls2011ab/umls
22 Sep 2015 12:10:03  INFO JdbcConnectionResourceImpl - Connection established to: jdbc:hsqldb:res:org/apache/ctakes/dictionary/lookup/rxnorm-hsqldb/umls
22 Sep 2015 12:10:03  INFO JdbcConnectionResourceImpl - Connection established to: jdbc:hsqldb:res:org/apache/ctakes/dictionary/lookup/orange_book_hsqldb/umls
22 Sep 2015 12:10:03  INFO UmlsDictionaryLookupAnnotator - Parsing descriptor: /usr/local/apache-ctakes-3.2.2/resources/org/apache/ctakes/dictionary/lookup/LookupDesc_Db.xml
22 Sep 2015 12:10:03  INFO FirstTokenPermLookupInitializerImpl - Exclusion tagset loaded: [dt, to, rp, ls, pos, md, vbd, vbg, vb, ex, vbp, vbn, pdt, vbz, wp, wrb, in, wps, pp$, prp$, wdt, prp, pp, cc, cd]
22 Sep 2015 12:10:03  INFO FirstTokenPermLookupInitializerImpl - Exclusion tagset loaded: [to, dt, rp, ex, vbp, ls, vbn, pdt, wp, vbz, wrb, in, pos, wps, md, wdt, pp$, vbd, vb, vbg, pp, cc, cd]
22 Sep 2015 12:10:03  INFO UmlsDictionaryLookupAnnotator - Using ctakes.umlsaddr: https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser: chrismattmann
22 Sep 2015 12:10:05  INFO LvgCmdApiResourceImpl - Loading NLM Norm and Lvg with config file = /usr/local/apache-ctakes-3.2.2/resources/org/apache/ctakes/lvg/data/config/lvg.properties
22 Sep 2015 12:10:05  INFO LvgCmdApiResourceImpl -   config file absolute path = /usr/local/apache-ctakes-3.2.2/resources/org/apache/ctakes/lvg/data/config/lvg.properties
22 Sep 2015 12:10:05  INFO LvgCmdApiResourceImpl - cwd = /Users/mattmann/src/tika-server
22 Sep 2015 12:10:05  INFO LvgCmdApiResourceImpl - cd /usr/local/apache-ctakes-3.2.2/resources/org/apache/ctakes/lvg/
22 Sep 2015 12:10:06  INFO LvgCmdApiResourceImpl - cd /Users/mattmann/src/tika-server
22 Sep 2015 12:10:06  INFO SentenceDetector - Starting processing.
22 Sep 2015 12:10:06  INFO TokenizerAnnotatorPTB - process(JCas) in org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
22 Sep 2015 12:10:06  INFO LvgAnnotator - process(JCas)
22 Sep 2015 12:10:06  INFO ContextDependentTokenizerAnnotator - process(JCas)
22 Sep 2015 12:10:06  INFO POSTagger - process(JCas)
22 Sep 2015 12:10:06  INFO Chunker -  process(JCas)
22 Sep 2015 12:10:06  INFO ChunkAdjuster -  process(JCas)
22 Sep 2015 12:10:06  INFO ChunkAdjuster -  process(JCas)
22 Sep 2015 12:10:06  INFO CopyAnnotator - process(JCas)
22 Sep 2015 12:10:06  INFO OverlapAnnotator - process(JCas)
22 Sep 2015 12:10:06  INFO UmlsDictionaryLookupAnnotator - process(JCas)
22 Sep 2015 12:10:06  INFO MaxentParserWrapper - Started processing: null
22 Sep 2015 12:10:06  INFO MaxentParserWrapper - Done parsing: null
{noformat}

# Client Side
{noformat}
% curl -T $HOME/Desktop/BigDataRTD/Celgene/Vose-2013-American_Journal_of_Hematology.pdf \
    -H "Content-Disposition: attachment; filename=Vose-2013-American_Journal_of_Hematology.pdf" \
    http://localhost:9999/rmeta/text \
    -H "Content-Type: application/pdf"
[{"Content-Type":"application/pdf","X-Parsed-By":["org.apache.tika.parser.CompositeParser","org.apache.tika.parser.ctakes.CTAKESParser","org.apache.tika.parser.EmptyParser"],"X-TIKA:parse_time_millis":"31852","ctakes:schema":"coveredText:start:end:ontologyConceptArr","resourceName":"Vose-2013-American_Journal_of_Hematology.pdf"}]
{noformat}
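
The response above carries `ctakes:schema` but no extracted annotations. A minimal sketch for checking this programmatically, assuming that cTAKES annotations would show up as additional `ctakes:`-prefixed metadata keys (an assumption inferred from the `ctakes:schema` key in the output; the function name `ctakes_annotation_keys` is hypothetical and not part of Tika):

```python
import json

CTAKES_PREFIX = "ctakes:"     # prefix seen on the ctakes:schema key in the response
SCHEMA_KEY = "ctakes:schema"  # present even when no annotations come back


def ctakes_annotation_keys(rmeta_json):
    """For each document in an rmeta response, list ctakes:* keys other than the schema.

    Assumption: annotations, when present, are stored under ctakes:-prefixed keys.
    """
    documents = json.loads(rmeta_json)
    return [
        sorted(k for k in doc if k.startswith(CTAKES_PREFIX) and k != SCHEMA_KEY)
        for doc in documents
    ]


if __name__ == "__main__":
    # Trimmed copy of the client-side response shown above.
    response = (
        '[{"Content-Type":"application/pdf",'
        '"X-TIKA:parse_time_millis":"31852",'
        '"ctakes:schema":"coveredText:start:end:ontologyConceptArr",'
        '"resourceName":"Vose-2013-American_Journal_of_Hematology.pdf"}]'
    )
    print(ctakes_annotation_keys(response))  # prints [[]]
```

An empty inner list for a document means only the schema key came back, which matches the symptom reported in this issue.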


> cTAKESParser doesn't work in 1.11
> ---------------------------------
>
>                 Key: TIKA-1739
>                 URL: https://issues.apache.org/jira/browse/TIKA-1739
>             Project: Tika
>          Issue Type: Bug
>          Components: parser, server
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.11
>
>         Attachments: TIKA-1739.patch
>
>
> Tika cTAKESParser integration doesn't work in 1.11. The parser is called, but blank metadata comes back:
> {noformat}
> curl -T test.txt -H "Content-Type: text/plain" http://localhost:9999/rmeta/text
> [{"Content-Type":"text/plain","X-Parsed-By":["org.apache.tika.parser.CompositeParser","org.apache.tika.parser.ctakes.CTAKESParser","org.apache.tika.parser.EmptyParser"],"X-TIKA:parse_time_millis":"20371","ctakes:schema":"coveredText:start:end:ontologyConceptArr"}]
> {noformat}
> [~gagravarr] I wonder if something that happened in TIKA-1653 broke it?
> http://svn.apache.org/viewvc?view=revision&revision=1684199
> [~gostep] can you help me look here?
> I'm working on https://github.com/chrismattmann/shangridocs/tree/convert-wicket which is where I first saw this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
