tika-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser
Date Fri, 14 Apr 2017 04:52:41 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968599#comment-15968599 ]

ASF GitHub Bot commented on TIKA-2306:
--------------------------------------

thammegowda commented on a change in pull request #163: TIKA-2306: Update Inception v3 to
Inception v4 in Object recognition parser 
URL: https://github.com/apache/tika/pull/163#discussion_r111529125
 
 

 ##########
 File path: tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/classify_image.py
 ##########
 @@ -44,169 +49,205 @@
 from six.moves import urllib
 import tensorflow as tf
 
+from datasets import imagenet, dataset_utils
+from nets import inception
+from preprocessing import inception_preprocessing
+
+slim = tf.contrib.slim
+
 FLAGS = tf.app.flags.FLAGS
 
-# classify_image_graph_def.pb:
-#   Binary representation of the GraphDef protocol buffer.
-# imagenet_synset_to_human_label_map.txt:
+# inception_v4.ckpt
+#   Inception V4 checkpoint file.
+# imagenet_metadata.txt
 #   Map from synset ID to a human readable string.
-# imagenet_2012_challenge_label_map_proto.pbtxt:
+# imagenet_lsvrc_2015_synsets.txt
 #   Text representation of a protocol buffer mapping a label to synset ID.
 tf.app.flags.DEFINE_string(
     'model_dir', '/tmp/imagenet',
-    """Path to classify_image_graph_def.pb, """
-    """imagenet_synset_to_human_label_map.txt, and """
-    """imagenet_2012_challenge_label_map_proto.pbtxt.""")
+    """Path to inception_v4.ckpt, """
+    """imagenet_lsvrc_2015_synsets.txt, and """
+    """imagenet_metadata.txt.""")
 tf.app.flags.DEFINE_string('image_file', '',
                            """Absolute path to image file.""")
 tf.app.flags.DEFINE_integer('num_top_predictions', 5,
                             """Display this many predictions.""")
 
 # pylint: disable=line-too-long
-DATA_URL = 'http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz'
+DATA_URL = 'http://download.tensorflow.org/models/inception_v4_2016_09_09.tar.gz'
 # pylint: enable=line-too-long
 
 
-class NodeLookup(object):
-  """Converts integer node ID's to human readable labels."""
+def create_readable_names_for_imagenet_labels():
+    """Create a dict mapping label id to human readable string.
+
+    Returns:
 +        labels_to_names: dictionary where keys are integers from 0 to 1000
 +        and values are human-readable names.
+
+    We retrieve a synset file, which contains a list of valid synset labels used
 +    by the ILSVRC competition. There is one synset per line, e.g.
+            #   n01440764
+            #   n01443537
+    We also retrieve a synset_to_human_file, which contains a mapping from synsets
+    to human-readable names for every synset in Imagenet. These are stored in a
+    tsv format, as follows:
+            #   n02119247    black fox
+            #   n02119359    silver fox
+    We assign each synset (in alphabetical order) an integer, starting from 1
+    (since 0 is reserved for the background class).
+
+    Code is based on
+    https://github.com/tensorflow/models/blob/master/inception/inception/data/build_imagenet_data.py#L463
+    """
+
+    # pylint: disable=line-too-long
+
+    dest_directory = FLAGS.model_dir
+
+    synset_list = [s.strip() for s in open(os.path.join(
+        dest_directory, 'imagenet_lsvrc_2015_synsets.txt')).readlines()]
+    num_synsets_in_ilsvrc = len(synset_list)
+    assert num_synsets_in_ilsvrc == 1000
 
-  def __init__(self,
-               label_lookup_path=None,
-               uid_lookup_path=None):
-    if not label_lookup_path:
-      label_lookup_path = os.path.join(
-          FLAGS.model_dir, 'imagenet_2012_challenge_label_map_proto.pbtxt')
-    if not uid_lookup_path:
-      uid_lookup_path = os.path.join(
-          FLAGS.model_dir, 'imagenet_synset_to_human_label_map.txt')
-    self.node_lookup = self.load(label_lookup_path, uid_lookup_path)
+    synset_to_human_list = open(os.path.join(
+        dest_directory, 'imagenet_metadata.txt')).readlines()
+    num_synsets_in_all_imagenet = len(synset_to_human_list)
+    assert num_synsets_in_all_imagenet == 21842
 
-  def load(self, label_lookup_path, uid_lookup_path):
-    """Loads a human readable English name for each softmax node.
+    synset_to_human = {}
+    for s in synset_to_human_list:
+        parts = s.strip().split('\t')
+        assert len(parts) == 2
+        synset = parts[0]
+        human = parts[1]
+        synset_to_human[synset] = human
+
+    label_index = 1
+    labels_to_names = {0: 'background'}
+    for synset in synset_list:
+        name = synset_to_human[synset]
+        labels_to_names[label_index] = name
+        label_index += 1
+
+    return labels_to_names
+
+
+def run_inference_on_image(image):
+    """Runs inference on an image.
 
     Args:
-      label_lookup_path: string UID to integer node ID.
-      uid_lookup_path: string UID to human-readable string.
+      image: Image file name.
 
     Returns:
-      dict from integer node ID to human-readable string.
+      Nothing
     """
-    if not tf.gfile.Exists(uid_lookup_path):
-      tf.logging.fatal('File does not exist %s', uid_lookup_path)
-    if not tf.gfile.Exists(label_lookup_path):
-      tf.logging.fatal('File does not exist %s', label_lookup_path)
-
-    # Loads mapping from string UID to human-readable string
-    proto_as_ascii_lines = tf.gfile.GFile(uid_lookup_path).readlines()
-    uid_to_human = {}
-    p = re.compile(r'[n\d]*[ \S,]*')
-    for line in proto_as_ascii_lines:
-      parsed_items = p.findall(line)
-      uid = parsed_items[0]
-      human_string = parsed_items[2]
-      uid_to_human[uid] = human_string
-
-    # Loads mapping from string UID to integer node ID.
-    node_id_to_uid = {}
-    proto_as_ascii = tf.gfile.GFile(label_lookup_path).readlines()
-    for line in proto_as_ascii:
-      if line.startswith('  target_class:'):
-        target_class = int(line.split(': ')[1])
-      if line.startswith('  target_class_string:'):
-        target_class_string = line.split(': ')[1]
-        node_id_to_uid[target_class] = target_class_string[1:-2]
-
-    # Loads the final mapping of integer node ID to human-readable string
-    node_id_to_name = {}
-    for key, val in node_id_to_uid.items():
-      if val not in uid_to_human:
-        tf.logging.fatal('Failed to locate: %s', val)
-      name = uid_to_human[val]
-      node_id_to_name[key] = name
-
-    return node_id_to_name
-
-  def id_to_string(self, node_id):
-    if node_id not in self.node_lookup:
-      return ''
-    return self.node_lookup[node_id]
-
-
-def create_graph():
-  """Creates a graph from saved GraphDef file and returns a saver."""
-  # Creates graph from saved graph_def.pb.
-  with tf.gfile.FastGFile(os.path.join(
-      FLAGS.model_dir, 'classify_image_graph_def.pb'), 'rb') as f:
-    graph_def = tf.GraphDef()
-    graph_def.ParseFromString(f.read())
-    _ = tf.import_graph_def(graph_def, name='')
+    dest_directory = FLAGS.model_dir
 
+    image_size = inception.inception_v4.default_image_size
 
-def run_inference_on_image(image):
-  """Runs inference on an image.
-
-  Args:
-    image: Image file name.
-
-  Returns:
-    Nothing
-  """
-  if not tf.gfile.Exists(image):
-    tf.logging.fatal('File does not exist %s', image)
-  image_data = tf.gfile.FastGFile(image, 'rb').read()
-
-  # Creates graph from saved GraphDef.
-  create_graph()
-
-  with tf.Session() as sess:
-    # Some useful tensors:
-    # 'softmax:0': A tensor containing the normalized prediction across
-    #   1000 labels.
-    # 'pool_3:0': A tensor containing the next-to-last layer containing 2048
-    #   float description of the image.
-    # 'DecodeJpeg/contents:0': A tensor containing a string providing JPEG
-    #   encoding of the image.
-    # Runs the softmax tensor by feeding the image_data as input to the graph.
-    softmax_tensor = sess.graph.get_tensor_by_name('softmax:0')
-    predictions = sess.run(softmax_tensor,
-                           {'DecodeJpeg/contents:0': image_data})
-    predictions = np.squeeze(predictions)
-
-    # Creates node ID --> English string lookup.
-    node_lookup = NodeLookup()
-
-    top_k = predictions.argsort()[-FLAGS.num_top_predictions:][::-1]
-    for node_id in top_k:
-      human_string = node_lookup.id_to_string(node_id)
-      score = predictions[node_id]
-      print('%s (score = %.5f)' % (human_string, score))
+    if not tf.gfile.Exists(image):
+        tf.logging.fatal('File does not exist %s', image)
+    image_string = tf.gfile.FastGFile(image, 'rb').read()
 
+    with tf.Graph().as_default():
+        image = tf.image.decode_jpeg(image_string, channels=3)
+        processed_image = inception_preprocessing.preprocess_image(
+            image, image_size, image_size, is_training=False)
+        processed_images = tf.expand_dims(processed_image, 0)
+
+        # Create the model, use the default arg scope to configure the batch
+        # norm parameters.
+        with slim.arg_scope(inception.inception_v4_arg_scope()):
+            logits, _ = inception.inception_v4(
+                processed_images, num_classes=1001, is_training=False)
+        probabilities = tf.nn.softmax(logits)
+
+        init_fn = slim.assign_from_checkpoint_fn(
+            os.path.join(dest_directory, 'inception_v4.ckpt'),
+            slim.get_model_variables('InceptionV4'))
+
+        with tf.Session() as sess:
+            init_fn(sess)
+            _, probabilities = sess.run([image, probabilities])
 
 Review comment:
   This code is quite inefficient. It takes about 15 seconds even on my powerful machine.
   
   **Problem here**
   Sessions can be handled in a better way to reduce disk IO (loading the model is disk IO).
   Initializing the model and restoring the checkpoint is a costly operation compared to the
classification itself.
   
   
   **How to fix:**
   Please reimagine the classifier service as an object in an object-oriented design.
   Initialize the TensorFlow session in the constructor.
   Build the model and restore the checkpoint in the constructor as well.
   It is perfectly okay not to close the session here, since we are not modifying the model.
   
   Then, in the classify action, simply use the pre-initialized session and model to classify
the image.
   Note that the classify action/method may be called millions of times.
   
   You may refer to the old code with these points in mind; you will see the pattern. However, if you
face difficulty, ping me and I will take a closer look at how to do this in the newer version of
TF.
   
   @KranthiGV 
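
  The pattern the review asks for can be sketched roughly as below. This is a minimal illustration only: `ObjectRecogniser`, `_load_model`, and the returned labels are hypothetical names, and the stub stands in for the expensive real work (building the Inception v4 graph, opening a `tf.Session`, and restoring `inception_v4.ckpt`).

  ```python
  class ObjectRecogniser(object):
      """Pay the expensive model-loading cost once, in the constructor,
      then reuse the same session for every classify() call."""

      def __init__(self, model_dir):
          self.model_dir = model_dir
          self.load_count = 0          # only for demonstration
          # In the real parser this step would build the Inception v4
          # graph, open a tf.Session, and restore the checkpoint; that
          # is the disk-IO-heavy part the review wants done exactly once.
          self.session = self._load_model()

      def _load_model(self):
          self.load_count += 1
          return object()              # placeholder for a live tf.Session

      def classify(self, image_path):
          # Reuses self.session: no graph construction or checkpoint
          # restore happens here, so this can be called millions of
          # times cheaply. A real implementation would run the softmax
          # tensor on the preprocessed image and return the top-k labels.
          return [('hypothetical_label', 0.9)]


  recogniser = ObjectRecogniser('/tmp/imagenet')
  results = [recogniser.classify('image_%d.jpg' % i) for i in range(1000)]
  print(recogniser.load_count)  # the model was loaded exactly once
  ```

  With this shape, a thousand `classify()` calls share one session and one checkpoint restore, instead of paying the ~15-second startup cost per image.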
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Update Inception v3 to Inception v4 in Object recognition parser 
> -----------------------------------------------------------------
>
>                 Key: TIKA-2306
>                 URL: https://issues.apache.org/jira/browse/TIKA-2306
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.14
>            Reporter: Kranthi Kiran GV
>            Priority: Minor
>              Labels: inception, object_recognition
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> The Object Recognition Parser currently uses the Inception V3 model for the object classification
task. Google released a newer Inception V4 model [1][2].
> It has an improved Top-1 accuracy of 80.2% and Top-5 accuracy of 95.2% [3].
> I believe the Tika community would benefit from it. I will be working on this issue
in the next few days.
> [1] https://research.googleblog.com/2016/08/improving-inception-and-image.html
> [2] https://arxiv.org/abs/1602.07261
> [3] https://github.com/tensorflow/models/tree/master/slim



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
