tika-dev mailing list archives

From Chris Mattmann <mattm...@apache.org>
Subject Re: Can't get Tensorflow REST recognizer to work
Date Sun, 14 Aug 2016 18:03:35 GMT
More info: here’s what -J produces:

LMC-053601:tika1.14 mattmann$ java -cp tika-app/target/tika-app-1.14-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow-rest.xml -J tika-parsers/src/test/resources/test-documents/testJPEG.jpg
INFO  Available = true, API Status = HTTP/1.0 200 OK
INFO  minConfidence = 0.015, topN=7
INFO  Recogniser = org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
INFO  Recogniser Available = true
Exception in thread "main" org.xml.sax.SAXException: Namespace http://www.w3.org/1999/xhtml not declared
at org.apache.tika.sax.ToXMLContentHandler$ElementInfo.getPrefix(ToXMLContentHandler.java:62)
at org.apache.tika.sax.ToXMLContentHandler$ElementInfo.getQName(ToXMLContentHandler.java:68)
at org.apache.tika.sax.ToXMLContentHandler.startElement(ToXMLContentHandler.java:148)
at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
at org.apache.tika.sax.SecureContentHandler.startElement(SecureContentHandler.java:250)
at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
at org.apache.tika.sax.ContentHandlerDecorator.startElement(ContentHandlerDecorator.java:126)
at org.apache.tika.sax.SafeContentHandler.startElement(SafeContentHandler.java:264)
at org.apache.tika.sax.XHTMLContentHandler.lazyStartHead(XHTMLContentHandler.java:140)
at org.apache.tika.sax.XHTMLContentHandler.lazyEndHead(XHTMLContentHandler.java:158)
at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:247)
at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:291)
at org.apache.tika.parser.recognition.ObjectRecognitionParser.parse(ObjectRecognitionParser.java:125)
at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:188)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:158)
at org.apache.tika.cli.TikaCLI.handleRecursiveJson(TikaCLI.java:500)
at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:475)
at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:145)
LMC-053601:tika1.14 mattmann$
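
One possible reading of the trace above (an assumption, not something confirmed in this thread): the serializer is asked to start an element in the XHTML namespace before any prefix mapping for that namespace has been declared, which would happen if the parser emits elements without first starting the document on its XHTML handler. The sketch below uses only the JDK's SAX classes. StrictHandler, NamespaceCheckSketch and emitHtml are hypothetical names made up for illustration; they are a simplified stand-in for a strict XML serializer, not Tika code.

```java
import java.util.HashMap;
import java.util.Map;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class NamespaceCheckSketch {

    static final String XHTML = "http://www.w3.org/1999/xhtml";

    /** Hypothetical stand-in for a strict XML serializer; NOT Tika code. */
    static class StrictHandler extends DefaultHandler {
        // Maps namespace URI -> prefix, filled by startPrefixMapping().
        private final Map<String, String> prefixes = new HashMap<>();
        String lastQName;

        @Override
        public void startPrefixMapping(String prefix, String uri) {
            prefixes.put(uri, prefix);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            String prefix = prefixes.get(uri);
            if (prefix == null) {
                if (uri == null || uri.isEmpty()) {
                    prefix = ""; // no namespace: nothing to declare
                } else {
                    // Same message shape as in the trace above.
                    throw new SAXException("Namespace " + uri + " not declared");
                }
            }
            lastQName = prefix.isEmpty() ? localName : prefix + ":" + localName;
        }
    }

    /** Emits an html element; returns its qualified name or the error message. */
    static String emitHtml(boolean declareFirst) {
        StrictHandler handler = new StrictHandler();
        try {
            handler.startDocument();
            if (declareFirst) {
                handler.startPrefixMapping("", XHTML);
            }
            handler.startElement(XHTML, "html", "html", new AttributesImpl());
            return handler.lastQName;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(emitHtml(true));  // prints "html"
        System.out.println(emitHtml(false)); // prints "Namespace http://www.w3.org/1999/xhtml not declared"
    }
}
```

If that reading is right, it may be worth checking whether ObjectRecognitionParser.parse starts the document on its XHTML handler before the first startElement call.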



On 8/14/16, 10:15 AM, "Chris Mattmann" <mattmann@apache.org> wrote:

    Hi Devs,
    
    Here’s what I’m seeing in TIKA-1993 and TIKA-1508, which I would love to finish today.
    
    1. TensorFlow Python script works great.
    2. TensorFlow REST service – Docker container works (had to upgrade Docker to the latest version).
    3. TensorFlow REST service – Tika parser metadata works great.
    4. TensorFlow REST service – Tika XHTML output won’t print or work.
    
    I can’t get the XHTML to print with the tika app -x flag (though -m produces the following):
    
    LMC-053601:tika1.14 mattmann$ java -cp tika-app/target/tika-app-1.14-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow-rest.xml -m tika-parsers/src/test/resources/test-documents/testJPEG.jpg
    INFO  Available = true, API Status = HTTP/1.0 200 OK
    INFO  minConfidence = 0.015, topN=7
    INFO  Recogniser = org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
    INFO  Recogniser Available = true
    Content-Length: 7686
    Content-Type: image/jpeg
    OBJECT: Egyptian cat (0.09168)
    OBJECT: Border collie (0.07553)
    OBJECT: bluetick (0.06043)
    OBJECT: collie (0.02982)
    OBJECT: English foxhound (0.02759)
    OBJECT: Siamese cat, Siamese (0.02053)
    OBJECT: tabby, tabby cat (0.01826)
    X-Parsed-By: org.apache.tika.parser.CompositeParser
    X-Parsed-By: org.apache.tika.parser.recognition.ObjectRecognitionParser
    org.apache.tika.parser.recognition.object.rec.impl: org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
    resourceName: testJPEG.jpg
    LMC-053601:tika1.14 mattmann$ 
    
    Thoughts? @Thamme?
    
    Cheers,
    Chris
    
    
    
    


