tika-dev mailing list archives

From Thamme Gowda <tgow...@gmail.com>
Subject Re: Can't get Tensorflow REST recognizer to work
Date Sun, 14 Aug 2016 18:50:37 GMT
Got it, Prof.

Lessons learned for the next parsers.

Thanks,

~
Thamme
--
*Thamme Gowda*
Grad Student at USC <http://usc.edu>
@thammegowda <https://twitter.com/thammegowda> | 213-536-3552
http://scf.usc.edu/~tnarayan/

2016-08-14 11:29 GMT-07:00 Mattmann, Chris A (3980) <chris.a.mattmann@jpl.nasa.gov>:

> Fixed!
>
> Finally fixed it! There were 2 issues:
>
> 1. The handler needed startDocument and endDocument calls; that fixed the
> JSON output and, in turn, the REST and script-based Tensorflow calls.
> 2. The often-encountered (but still undocumented; we need to fix that!)
> problem that you can't modify the metadata object while doing the
> ContentHandler work. You have to have an ImmutableMetadata object by the
> time you do the ContentHandler work.
> I'm going to do a few more tests and then get this committed! Great work,
> @thammegowda. Overall this is an amazing contribution; it will be awesome
> for Tika users!
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect, Instrument Software and Science Data Systems Section (398)
> Manager, Open Source Projects Formulation and Development Office (8212)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattmann@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Director, Information Retrieval and Data Science Group (IRDS)
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> WWW: http://irds.usc.edu/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
> On 8/14/16, 10:15 AM, "Chris Mattmann" <mattmann@apache.org> wrote:
>
>     Hi Devs,
>
>     Here’s what I’m seeing in TIKA-1993 and 1508, which I would love to finish today.
>
>     1. Tensorflow Python script works great.
>     2. Tensorflow REST service – Docker container works (had to upgrade Docker to the latest version).
>     3. Tensorflow REST service – Tika parser metadata works great.
>     4. Tensorflow REST service – Tika XHTML won’t print or work.
>
>     I can’t get the XHTML to print with the tika-app -x flag (though -m
>     produces the following):
>
>     LMC-053601:tika1.14 mattmann$ java -cp tika-app/target/tika-app-1.14-SNAPSHOT.jar \
>         org.apache.tika.cli.TikaCLI \
>         --config=tika-parsers/src/test/resources/org/apache/tika/parser/recognition/tika-config-tflow-rest.xml \
>         -m tika-parsers/src/test/resources/test-documents/testJPEG.jpg
>     INFO  Available = true, API Status = HTTP/1.0 200 OK
>     INFO  minConfidence = 0.015, topN=7
>     INFO  Recogniser = org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
>     INFO  Recogniser Available = true
>     Content-Length: 7686
>     Content-Type: image/jpeg
>     OBJECT: Egyptian cat (0.09168)
>     OBJECT: Border collie (0.07553)
>     OBJECT: bluetick (0.06043)
>     OBJECT: collie (0.02982)
>     OBJECT: English foxhound (0.02759)
>     OBJECT: Siamese cat, Siamese (0.02053)
>     OBJECT: tabby, tabby cat (0.01826)
>     X-Parsed-By: org.apache.tika.parser.CompositeParser
>     X-Parsed-By: org.apache.tika.parser.recognition.ObjectRecognitionParser
>     org.apache.tika.parser.recognition.object.rec.impl: org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser
>     resourceName: testJPEG.jpg
>     LMC-053601:tika1.14 mattmann$
>
>     Thoughts? @Thamme?
>
>     Cheers,
>     Chris
>

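A minimal sketch of the two fixes Chris describes above, written against plain JDK SAX classes rather than Tika itself. The class RecognitionOutputSketch, its render and freeze helpers, and the XHTML layout are hypothetical and are not Tika's API. It assumes, as with Tika's XHTMLContentHandler, that metadata is written out near the start of the document. So it first freezes the metadata, using a read-only Collections.unmodifiableMap copy as a stand-in for the "ImmutableMetadata" idea. Only after that does it emit SAX events, and it brackets every element event between startDocument() and endDocument() so the serializer sees one complete document.

```java
import java.io.StringWriter;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical illustration only, not Tika's API: a JDK identity
// TransformerHandler stands in for Tika's XHTML/JSON output handlers.
public class RecognitionOutputSketch {

    // Copy the collected metadata into a read-only map; any later put()
    // throws UnsupportedOperationException.
    static Map<String, String> freeze(Map<String, String> collected) {
        return Collections.unmodifiableMap(new LinkedHashMap<>(collected));
    }

    static String render(Map<String, String> collected, List<String> labels) throws Exception {
        // Fix 2: all metadata is settled before the first SAX event.
        Map<String, String> metadata = freeze(collected);

        SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        // Configure the serializer before attaching the result.
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        // Fix 1: every element event sits between startDocument() and endDocument().
        handler.startDocument();
        start(handler, "html", new AttributesImpl());
        start(handler, "head", new AttributesImpl());
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            AttributesImpl atts = new AttributesImpl();
            atts.addAttribute("", "name", "name", "CDATA", entry.getKey());
            atts.addAttribute("", "content", "content", "CDATA", entry.getValue());
            start(handler, "meta", atts);
            end(handler, "meta");
        }
        end(handler, "head");
        start(handler, "body", new AttributesImpl());
        start(handler, "ol", new AttributesImpl());
        for (String label : labels) {
            // One list item per recognised object label.
            start(handler, "li", new AttributesImpl());
            handler.characters(label.toCharArray(), 0, label.length());
            end(handler, "li");
        }
        end(handler, "ol");
        end(handler, "body");
        end(handler, "html");
        handler.endDocument();
        return out.toString();
    }

    private static void start(TransformerHandler h, String name, AttributesImpl atts) throws SAXException {
        h.startElement("", name, name, atts);
    }

    private static void end(TransformerHandler h, String name) throws SAXException {
        h.endElement("", name, name);
    }
}
```

For example, render(Map.of("resourceName", "testJPEG.jpg"), List.of("Egyptian cat (0.09168)")) yields an html document whose head carries a resourceName meta element and whose body lists the label as an li item.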