tika-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2262) Supporting Image-to-Text (Image Captioning) in Tika for Image MIME Types
Date Thu, 22 Jun 2017 00:26:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058512#comment-16058512 ]

ASF GitHub Bot commented on TIKA-2262:
--------------------------------------

thammegowda commented on a change in pull request #180: Fix for TIKA-2262: Supporting Image-to-Text (Image Captioning) in Tika
URL: https://github.com/apache/tika/pull/180#discussion_r123396694
 
 

 ##########
 File path: tika-parsers/src/test/java/org/apache/tika/parser/recognition/ObjectRecognitionParserTest.java
 ##########
 @@ -68,22 +78,49 @@ public void jpegTesorflowTest() throws IOException, TikaException, SAXException
 
     @Ignore("Configure Rest API service")
     @Test
-    public void testREST() throws Exception {
-        try (InputStream stream = loader.getResourceAsStream(CONFIG_REST_FILE)){
+    public void jpegRESTObjRecTest() throws Exception {
+        String[] expectedObjects = {"Egyptian cat", "tabby, tabby cat"};
+        doRecognize(CONFIG_REST_FILE_OBJ_REC, CAT_IMAGE_JPEG,
+                ObjectRecognitionParser.MD_KEY_OBJ_REC, expectedObjects);
+
+
+    }
+
+    @Ignore("Configure Rest API service")
+    @Test
+    public void jpegRESTim2txtTest() throws Exception {
+        String[] expectedObjects = {"baseball", "bat", "field"};
+        doRecognize(CONFIG_REST_FILE_IM2TXT, BASEBALL_IMAGE_JPEG,
+                ObjectRecognitionParser.MD_KEY_IMG_CAP, expectedObjects);
+
+    }
+
+    @Ignore("Configure Rest API service")
+    @Test
+    public void pngRESTim2txtTest() throws Exception {
+        String[] expectedObjects = {"Egyptian cat", "tabby, tabby cat"};
+        doRecognize(CONFIG_REST_FILE_IM2TXT, BASEBALL_IMAGE_PNG,
 
 Review comment:
   Looks like this test is a copy-paste of a previous test.
   
   This one needs more care to get it working!
   1. `expectedObjects` is the wrong assertion: `BASEBALL_IMAGE_PNG` doesn't have a cat!
   2. The server crashes when I pass a PNG (traceback below; a workaround sketch follows it):
   ```python
   [2017-06-22 00:17:44,007] ERROR in app: Exception on /inception/v3/captions [POST]
   Traceback (most recent call last):
     File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1982, in wsgi_app
       response = self.full_dispatch_request()
     File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1614, in full_dispatch_request
       rv = self.handle_user_exception(e)
     File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1517, in handle_user_exception
       reraise(exc_type, exc_value, tb)
     File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1612, in full_dispatch_request
       rv = self.dispatch_request()
     File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1598, in dispatch_request
       return self.view_functions[rule.endpoint](**req.view_args)
     File "/usr/bin/im2txtapi", line 217, in gen_captions
       captions = generator.beam_search(app.sess, image_data)
     File "/usr/share/apache-tika/src/dl/image/caption/caption_generator.py", line 126, in
beam_search
       initial_state = self.model.feed_image(sess, encoded_image)
     File "/usr/share/apache-tika/src/dl/image/caption/model_wrapper.py", line 65, in feed_image
       feed_dict={"image_feed:0": encoded_image})
     File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line
767, in run
       run_metadata_ptr)
     File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line
965, in _run
       feed_dict_string, options, run_metadata)
     File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line
1015, in _do_run
       target_list, options, run_metadata)
     File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line
1035, in _do_call
       raise type(e)(node_def, op, message)
   InvalidArgumentError: Invalid JPEG data, size 263301
   	 [[Node: decode/DecodeJpeg = DecodeJpeg[acceptable_fraction=1, channels=3, dct_method="",
fancy_upscaling=true, ratio=1, try_recover_truncated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_image_feed_0)]]
   
   Caused by op u'decode/DecodeJpeg', defined at:
     File "/usr/bin/im2txtapi", line 131, in <module>
       app = Initializer(__name__)
     File "/usr/bin/im2txtapi", line 108, in __init__
       restore_fn = model.build_graph(FLAGS.checkpoint_path)
     File "/usr/share/apache-tika/src/dl/image/caption/model_wrapper.py", line 42, in build_graph
       ShowAndTellModel().build()
     File "/usr/share/apache-tika/src/dl/image/caption/model_wrapper.py", line 343, in build
       self.build_inputs()
     File "/usr/share/apache-tika/src/dl/image/caption/model_wrapper.py", line 192, in build_inputs
       images = tf.expand_dims(self.process_image(image_feed), 0)
     File "/usr/share/apache-tika/src/dl/image/caption/model_wrapper.py", line 156, in process_image
       image = tf.image.decode_jpeg(encoded_image, channels=3)
     File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_image_ops.py",
line 345, in decode_jpeg
       dct_method=dct_method, name=name)
     File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py",
line 763, in apply_op
       op_def=op_def)
     File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line
2327, in create_op
       original_op=self._default_original_op, op_def=op_def)
     File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line
1226, in __init__
       self._traceback = _extract_stack()
   
   InvalidArgumentError (see above for traceback): Invalid JPEG data, size 263301
   	 [[Node: decode/DecodeJpeg = DecodeJpeg[acceptable_fraction=1, channels=3, dct_method="",
fancy_upscaling=true, ratio=1, try_recover_truncated=false, _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_image_feed_0)]]
   
   ```
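   
   One way to avoid the crash would be to re-encode non-JPEG uploads before they reach the graph, since the frozen graph decodes its input with `decode_jpeg` (which, as the traceback shows, rejects PNG bytes). A minimal sketch, assuming Pillow is available on the service side; `to_jpeg_bytes` is a hypothetical helper, not part of `im2txtapi`:
   ```python
   # Sketch only: normalize any Pillow-readable image (PNG, etc.) to JPEG bytes
   # before feeding the graph. Assumes Pillow is installed; to_jpeg_bytes is a
   # hypothetical helper, not part of the existing im2txtapi code.
   import io
   from PIL import Image
   
   def to_jpeg_bytes(image_data):
       # Re-encode the input as RGB JPEG; JPEG input simply passes through re-encoded.
       img = Image.open(io.BytesIO(image_data)).convert("RGB")
       buf = io.BytesIO()
       img.save(buf, format="JPEG")
       return buf.getvalue()
   ```
   Something like `image_data = to_jpeg_bytes(image_data)` could then run in `gen_captions` before the `generator.beam_search(app.sess, image_data)` call.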
   
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Supporting Image-to-Text (Image Captioning) in Tika for Image MIME Types
> ------------------------------------------------------------------------
>
>                 Key: TIKA-2262
>                 URL: https://issues.apache.org/jira/browse/TIKA-2262
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Thamme Gowda
>            Assignee: Thamme Gowda
>              Labels: deeplearning, gsoc2017, machine_learning
>
> h2. Background:
> Image captions are small pieces of text, usually one line, added to an image's metadata to provide a brief summary of the scene in the image.
> It is a challenging and interesting problem in the domain of computer vision. Tika already has support for image recognition via the [Object Recognition Parser, TIKA-1993|https://issues.apache.org/jira/browse/TIKA-1993], which uses an InceptionV3 model pre-trained on the ImageNet dataset using TensorFlow.
> Captioning an image is a very useful feature since it helps text-based Information Retrieval (IR) systems "understand" the scene in images.
> h2. Technical details and references:
> * Google open sourced its 'Show and Tell' neural network and model for auto-generating captions some time ago. [Source Code|https://github.com/tensorflow/models/tree/master/im2txt], [Research blog|https://research.googleblog.com/2016/09/show-and-tell-image-captioning-open.html]
> * Integrate it the same way as the ObjectRecognitionParser:
> ** Create a RESTful API service [similar to this|https://wiki.apache.org/tika/TikaAndVision#A2._Tensorflow_Using_REST_Server] (see the sketch after this issue description)
> ** Extend or enhance ObjectRecognitionParser or one of its implementations
> h2. {skills, learning, homework} for GSoC students
> * Knowledge of languages: Java AND Python, and the Maven build system
> * RESTful APIs
> * TensorFlow/Keras
> * Deep learning
> ----
> Alternatively, a somewhat harder path for experienced contributors:
> [Import the Keras/TensorFlow model into deeplearning4j|https://deeplearning4j.org/model-import-keras] and run it natively inside the JVM.
> h4. Benefits
> * No RESTful integration required, thus no external dependencies
> * Easy to distribute on Hadoop/Spark clusters
> h4. Hurdles:
> * This is a work-in-progress feature in deeplearning4j, so expect plenty of trouble along the way!
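
A minimal sketch of the kind of RESTful caption service the issue suggests, using Flask (the framework visible in the traceback above). The `caption_image()` helper, the response shape, and the port are illustrative assumptions, not the actual im2txtapi implementation; only the route path is taken from the traceback:

```python
# Sketch only: a bare-bones Flask caption service in the spirit of the RESTful API
# the issue proposes. caption_image(), the response shape, and the port are
# assumptions for illustration; they do not describe the real im2txtapi service.
from flask import Flask, jsonify, request

app = Flask(__name__)

def caption_image(image_bytes):
    # Placeholder for a real model call (e.g. im2txt beam search over the image bytes).
    return [{"sentence": "a placeholder caption", "confidence": 0.0}]

@app.route("/inception/v3/captions", methods=["POST"])
def captions():
    # Expect raw image bytes in the POST body, as a Tika-side client could send them.
    return jsonify({"captions": caption_image(request.get_data())})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)  # port is arbitrary in this sketch
```

A client could then POST an image, e.g. `curl -s -X POST --data-binary @cat.jpg http://localhost:5000/inception/v3/captions`, and a parser on the Tika side could map the returned captions into metadata the way ObjectRecognitionParser does for recognized objects.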



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
