tika-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2262) Supporting Image-to-Text (Image Captioning) in Tika for Image MIME Types
Date Thu, 15 Jun 2017 16:20:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050707#comment-16050707 ]

ASF GitHub Bot commented on TIKA-2262:
--------------------------------------

ThejanW commented on a change in pull request #180: Fix for TIKA-2262: Supporting Image-to-Text
(Image Captioning) in Tika
URL: https://github.com/apache/tika/pull/180#discussion_r122246914
 
 

 ##########
 File path: tika-parsers/src/main/java/org/apache/tika/parser/captioning/tf/TensorflowRESTCaptioner.java
 ##########
 @@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.parser.captioning.tf;
+
+import org.apache.http.HttpResponse;
+import org.apache.http.client.methods.HttpGet;
+import org.apache.http.client.methods.HttpPost;
+import org.apache.http.entity.ByteArrayEntity;
+import org.apache.http.impl.client.DefaultHttpClient;
+import org.apache.tika.config.Field;
+import org.apache.tika.config.Param;
+import org.apache.tika.exception.TikaConfigException;
+import org.apache.tika.exception.TikaException;
+import org.apache.tika.io.IOUtils;
+import org.apache.tika.metadata.Metadata;
+import org.apache.tika.mime.MediaType;
+import org.apache.tika.parser.ParseContext;
+import org.apache.tika.parser.recognition.ObjectRecogniser;
+import org.apache.tika.parser.recognition.RecognisedObject;
+import org.apache.tika.parser.captioning.CaptionObject;
+import org.apache.uima.tools.util.gui.Caption;
+import org.json.JSONArray;
+import org.json.JSONObject;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.xml.sax.ContentHandler;
+import org.xml.sax.SAXException;
+
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.net.URI;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.Arrays;
+
+/**
+ * Tensorflow image captioner.
+ * This implementation uses Tensorflow via REST API.
+ * <p>
+ * NOTE : //TODO: link to wiki page here
+ *
+ * @since Apache Tika 1.16
+ */
+public class TensorflowRESTCaptioner implements ObjectRecogniser {
+    private static final Logger LOG = LoggerFactory.getLogger(TensorflowRESTCaptioner.class);
+
+    private static final Set<MediaType> SUPPORTED_MIMES = Collections.unmodifiableSet(
+            new HashSet<>(Arrays.asList(new MediaType[]{
+                    MediaType.image("png"), MediaType.image("jpeg")
+            })));
+
+    private static final String LABEL_LANG = "en";
+
+    @Field
+    private int captions;
+
+    @Field
+    private int maxCaptionLength;
+
+    @Field
+    private URI apiUri;
+
+    @Field
+    private URI healthUri = URI.create("http://localhost:8764/inception/v3/ping");
+
+    private boolean available;
+
+    protected URI getApiUri(Metadata metadata) {
+        return apiUri;
+    }
+
+    @Override
+    public Set<MediaType> getSupportedMimes() {
+        return SUPPORTED_MIMES;
+    }
+
+    @Override
+    public boolean isAvailable() {
+        return available;
+    }
+
+    @Override
+    public void initialize(Map<String, Param> params) throws TikaConfigException {
+        try {
+            DefaultHttpClient client = new DefaultHttpClient();
+            HttpResponse response = client.execute(new HttpGet(healthUri));
+            available = response.getStatusLine().getStatusCode() == 200;
+            LOG.info("Available = {}, API Status = {}", available, response.getStatusLine());
+            apiUri = URI.create(String.format(
+                    "http://localhost:8764/inception/v3/captions?beam_size=%1$d&max_caption_length=%2$d",
 
 Review comment:
   Yes. That sounds good. How about moving the apiBaseUri into config.xml?
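
   A minimal sketch of what this could look like, assuming the base URI arrives
   as a configurable value (for example through a @Field like the ones already
   in this class) and the captions path and query string are appended in code.
   The class name CaptionApiUriSketch, the method buildCaptionsUri, and the
   mapping of captions to beam_size are assumptions made for illustration; the
   format arguments are cut off in the diff above, so this is not the code from
   the pull request.

```java
import java.net.URI;
import java.util.Locale;

public class CaptionApiUriSketch {

    /**
     * Hypothetical helper: appends the captions endpoint and its query
     * parameters to a base URI that would come from configuration.
     * Assumes 'captions' maps to beam_size and 'maxCaptionLength' maps to
     * max_caption_length.
     */
    public static URI buildCaptionsUri(URI apiBaseUri, int captions, int maxCaptionLength) {
        String base = apiBaseUri.toString();
        // Tolerate a trailing slash in the configured value.
        if (base.endsWith("/")) {
            base = base.substring(0, base.length() - 1);
        }
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return URI.create(String.format(Locale.ROOT,
                "%s/captions?beam_size=%d&max_caption_length=%d",
                base, captions, maxCaptionLength));
    }

    public static void main(String[] args) {
        // Base URI taken from the default host and path used in the diff.
        URI base = URI.create("http://localhost:8764/inception/v3");
        System.out.println(buildCaptionsUri(base, 3, 15));
        // prints http://localhost:8764/inception/v3/captions?beam_size=3&max_caption_length=15
    }
}
```

   With this shape, only the base URI would need to change when the REST
   service runs on another host or port.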
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Supporting Image-to-Text (Image Captioning) in Tika for Image MIME Types
> ------------------------------------------------------------------------
>
>                 Key: TIKA-2262
>                 URL: https://issues.apache.org/jira/browse/TIKA-2262
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Thamme Gowda
>            Assignee: Thamme Gowda
>              Labels: deeplearning, gsoc2017, machine_learning
>
> h2. Background:
> An image caption is a short piece of text, usually one line, added to the metadata of an image to provide a brief summary of the scene it shows.
> Generating captions automatically is a challenging and interesting problem in computer vision. Tika already supports image recognition via [Object Recognition Parser, TIKA-1993| https://issues.apache.org/jira/browse/TIKA-1993], which uses an InceptionV3 model pre-trained on the ImageNet dataset using TensorFlow.
> Captioning an image is a very useful feature because it helps text-based Information Retrieval (IR) systems to "understand" the scene in images.
> h2. Technical details and references:
> * Google open-sourced its 'Show and Tell' neural network and its model for automatically generating captions some time ago. [Source Code| https://github.com/tensorflow/models/tree/master/im2txt], [Research blog| https://research.googleblog.com/2016/09/show-and-tell-image-captioning-open.html]
> * Integrate it the same way as the ObjectRecognitionParser
> ** Create a RESTful API service [similar to this| https://wiki.apache.org/tika/TikaAndVision#A2._Tensorflow_Using_REST_Server]
> ** Extend or enhance ObjectRecognitionParser or one of its implementations
> h2. {skills, learning, homework} for GSoC students
> * Knowledge of languages: Java AND Python, and the Maven build system
> * RESTful APIs
> * TensorFlow/Keras
> * Deep learning
> ----
> Alternatively, a somewhat harder path for the more experienced:
> [Import a Keras/TensorFlow model into deeplearning4j|https://deeplearning4j.org/model-import-keras] and run it natively inside the JVM.
> h4. Benefits
> * No RESTful integration required, thus no external dependencies
> * Easy to distribute on Hadoop/Spark clusters
> h4. Hurdles:
> * Model import is a work-in-progress feature in deeplearning4j, so expect plenty of trouble along the way!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
