tika-dev mailing list archives

From Kranthi Kiran G V <kkran...@student.nitw.ac.in>
Subject Re: Regarding Image Captioning in Tika for Image MIME Types
Date Wed, 19 Apr 2017 20:43:05 GMT
Hello mentors,

I have released a trained model of the neural image captioning system,
im2txt.
It can be found here:
https://github.com/KranthiGV/Pretrained-Show-and-Tell-model

I am hopeful it will benefit both the research community and the Apache
Tika community for image captioning.
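
If you want to try it out, below is a rough sketch of loading the checkpoint
for inference, adapted from im2txt's run_inference.py. The file names are
placeholders, and the exact module paths depend on how the tensorflow/models
repository is set up on your machine:

import math
import tensorflow as tf

from im2txt import configuration
from im2txt import inference_wrapper
from im2txt.inference_utils import caption_generator
from im2txt.inference_utils import vocabulary

# Placeholders: point these at the released checkpoint, its vocabulary file,
# and any test image you want to caption.
CHECKPOINT_PATH = "model.ckpt"
VOCAB_FILE = "word_counts.txt"
IMAGE_FILE = "test.jpg"

# Build the inference graph and get a function that restores the checkpoint.
g = tf.Graph()
with g.as_default():
    model = inference_wrapper.InferenceWrapper()
    restore_fn = model.build_graph_from_config(configuration.ModelConfig(),
                                               CHECKPOINT_PATH)
g.finalize()

vocab = vocabulary.Vocabulary(VOCAB_FILE)

with tf.Session(graph=g) as sess:
    restore_fn(sess)
    generator = caption_generator.CaptionGenerator(model, vocab)
    with tf.gfile.GFile(IMAGE_FILE, "rb") as f:
        image = f.read()
    # Beam search returns candidate captions with log probabilities.
    for i, caption in enumerate(generator.beam_search(sess, image)):
        words = [vocab.id_to_word(w) for w in caption.sentence[1:-1]]
        print("  %d) %s (p=%f)" % (i, " ".join(words), math.exp(caption.logprob)))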

Have a look at it!

Thank you,
Kranthi Kiran GV,
CS 3/4 Undergrad,
NIT Warangal

On Wed, Mar 29, 2017 at 6:50 PM, Mattmann, Chris A (3010) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> Sounds great, and understood. Please prepare your proposal and share with
> Thamme and me for feedback as your (potential) mentors.
>
>
>
> Thanks much.
>
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> Chris Mattmann, Ph.D.
>
> Principal Data Scientist, Engineering Administrative Office (3010)
>
> Manager, NSF & Open Source Projects Formulation and Development Offices
> (8212)
>
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>
> Office: 180-503E, Mailstop: 180-503
>
> Email: chris.a.mattmann@nasa.gov
>
> WWW:  http://sunset.usc.edu/~mattmann/
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> Director, Information Retrieval and Data Science Group (IRDS)
>
> Adjunct Associate Professor, Computer Science Department
>
> University of Southern California, Los Angeles, CA 90089 USA
>
> WWW: http://irds.usc.edu/
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>
>
>
> *From: *Kranthi Kiran G V <kkranthi@student.nitw.ac.in>
> *Date: *Wednesday, March 29, 2017 at 9:17 AM
> *To: *Thamme Gowda <thammegowda@apache.org>
> *Cc: *Chris Mattmann <mattmann@apache.org>, "dev@tika.apache.org" <
> dev@tika.apache.org>
> *Subject: *Re: Regarding Image Captioning in Tika for Image MIME Types
>
>
>
> Hello,
>
> 1) I have submitted a PR which can be found here
> <https://github.com/apache/tika/pull/163>.
>
> 2) After working on the Show and Tell model for a week, I realized that
> the computational resources I have are enough to take up the challenge.
>
> Here is a sample caption I generated after a few days of training.
>
> INFO:tensorflow:Loading model from checkpoint: /media/timberners/magicae/models/im2txt/im2txt/model/train/model.ckpt-174685
> INFO:tensorflow:Successfully loaded checkpoint: model.ckpt-174685
> Captions for image COCO_val2014_000000224477.jpg:
>   0) a man riding a wave on top of a surfboard . (p=0.016002)
>   1) a man riding a surfboard on a wave in the ocean . (p=0.007747)
>   2) a man riding a wave on a surfboard in the ocean . (p=0.007673)
>
> The captions above are for the example image on im2txt's page
> <https://github.com/tensorflow/models/tree/master/im2txt#generating-captions>.
>
>
> I'm excited to release the pre-trained model (if I'm allowed to) to the
> public during my GSoC journey, so that everyone can use it even if they do
> not have the resources to train it themselves. I think it would be a great
> contribution to both Apache Tika and the computer vision community as a whole.
>
> 3) I am working on the schedule. I will be submitting a draft on the GSoC
> page. Should I send it here, too?
>
> Regarding my other commitments, I will be working with Amazon India
> Development Centre from May 10th to July 10th. They offer flexible
> working hours.
>
> I would be able to dedicate 40-45 hours per week. My ability to balance
> both is demonstrated by the work I am currently doing with the Deep Learning
> Research Group at NITW alongside my coursework.
>
> What do you think?
>
>
>
> On Mon, Mar 27, 2017 at 11:00 PM, Thamme Gowda <thammegowda@apache.org>
> wrote:
>
> Hi Kranthi Kiran,
>
>
>
> 1. Thanks for the update. I look forward to your PR.
>
>
>
> 2. I don't have complete details about compute resources from GSoC. I
> think Google offers free credits (approx. $300) when students sign up for
> Google Compute Engine. I am not worried about it at this time; we can sort
> it out later.
>
>
>
> 3. Great to know!
>
>
>
> Best,
>
> TG
>
>
> *--*
>
> *Thamme Gowda*
>
> TG | @thammegowda <https://twitter.com/thammegowda>
>
> ~Sent via somebody's Webmail server!
>
>
>
> On Fri, Mar 24, 2017 at 10:42 PM, Kranthi Kiran G V <
> kkranthi@student.nitw.ac.in> wrote:
>
> Apologies if I was ambiguous.
>
>
>
> 1) I have already started working on the improvement. The general method
> is working. I'll send a merge request once I have ported the REST method as
> well.
>
>
> 2) I was referring to the computational resources needed to train the final
> layer of im2txt to output the captions. Google hasn't released a
> pre-trained model.
>
>
>
> 3) I will update the developer community with a tentative GSoC schedule
> by tonight. It would be great if the community could give me suggestions.
>
>
>
> On Mar 25, 2017 12:06 AM, "Thamme Gowda" <thammegowda@apache.org> wrote:
>
> Hi Kranthi Kiran,
>
>
>
> Please find my replies below:
>
>
>
> Let me know if you have more questions.
>
>
>
> Thanks,
>
> TG
>
> *--*
>
> *Thamme Gowda*
>
> TG | @thammegowda <https://twitter.com/thammegowda>
>
> ~Sent via somebody's Webmail server!
>
>
>
> On Tue, Mar 21, 2017 at 12:21 PM, Kranthi Kiran G V <
> kkranthi@student.nitw.ac.in> wrote:
>
> Hello Thamme Gowda,
>
> Thank you for letting me know about the developer mailing list. I have
> created an issue [1] and will be working on it.
>
> The change is not straightforward, since the Inception V3 pre-trained model
> is distributed as a graph while the Inception V4 pre-trained model is
> packaged in the form of a checkpoint (ckpt) [2].
>
>
>
> Okay, I see Inception-V3 has a graph, V4 has a checkpoint.
>
> I assume there should be a way to restore a model from a checkpoint? Please
> refer to
> https://www.tensorflow.org/programmers_guide/variables#checkpoint_files
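>
> A rough sketch of that direction (the slim "nets" package from
> tensorflow/models/slim and the output node name below are assumptions, not
> verified against the released checkpoint): restore the V4 variables from the
> checkpoint and freeze them into a single graph file, similar to the graph
> form the current Inception V3 setup uses.
>
> import tensorflow as tf
> from nets import inception  # assumption: slim nets package on PYTHONPATH
>
> slim = tf.contrib.slim
> CHECKPOINT = "inception_v4.ckpt"                 # placeholder checkpoint path
> OUTPUT_NODE = "InceptionV4/Logits/Predictions"   # assumed softmax node name
>
> with tf.Graph().as_default() as graph:
>     images = tf.placeholder(tf.float32, [None, 299, 299, 3], name="input")
>     with slim.arg_scope(inception.inception_v4_arg_scope()):
>         inception.inception_v4(images, num_classes=1001, is_training=False)
>     saver = tf.train.Saver()
>     with tf.Session() as sess:
>         # Restore the variables stored in the checkpoint.
>         saver.restore(sess, CHECKPOINT)
>         # Fold variables into constants so the model becomes one GraphDef.
>         frozen = tf.graph_util.convert_variables_to_constants(
>             sess, graph.as_graph_def(), [OUTPUT_NODE])
>         tf.train.write_graph(frozen, ".", "inception_v4_frozen.pb",
>                              as_text=False)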
>
>
>
>
>
> What do you think of using Keras to implement the Inception V4 model? It
> would make the job of scaling it on CPU clusters easier if we can use
> deeplearning4j's model import.
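>
> If we go that route, here is a minimal sketch of the Keras-side export that
> deeplearning4j's Keras model import can consume. InceptionV3 is used only as
> a stand-in, since Keras does not ship an Inception V4 application; a V4
> definition would have to come from elsewhere.
>
> # Hedged sketch: export a Keras model as JSON architecture + HDF5 weights,
> # the pair that deeplearning4j's KerasModelImport can load on the JVM side.
> from keras.applications.inception_v3 import InceptionV3  # stand-in for V4
>
> model = InceptionV3(weights="imagenet")
>
> with open("inception.json", "w") as f:
>     f.write(model.to_json())                  # network architecture
> model.save_weights("inception_weights.h5")    # trained weights
>
> # On the Java side, deeplearning4j can then load the pair with something like
> # KerasModelImport.importKerasModelAndWeights("inception.json",
> #                                             "inception_weights.h5")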
>
>
>
> Should I proceed in that direction?
>
>
>
> Regarding GSoC, what kind of computational resources are we given access to?
> We would have to train the Show and Tell network, which takes a lot of
> computational resources.
>
>
>
> If GPUs are not available, we would have to use a CPU cluster, so the code
> would have to be rewritten (from the Google implementation of Inception V4).
>
>
>
>
> Training InceptionV4 from scratch requires too much effort, time, and
> resources. We are not aiming for that, at least not as part of Tika and
> GSoC. The suggestion I mentioned earlier was to upgrade from the InceptionV3
> model to the Inception V4 pretrained model/checkpoint, since that will be
> more beneficial to the Tika user community :-)
>
>
>
>
>
>
>
> [1] https://issues.apache.org/jira/browse/TIKA-2306
>
> [2] https://github.com/tensorflow/models/tree/master/slim#pre-trained-models
>
>
>
>
>
>
> On Mon, Mar 20, 2017 at 3:17 AM, Thamme Gowda <thammegowda@apache.org>
> wrote:
>
> Hi Kranthi Kiran,
>
>
>
> Welcome to the Tika community. We are glad you are interested in working on
> the issue.
>
> Please remember to CC the dev@tika mailing list for future discussions
> related to Tika.
>
>
>
>  *Should the model be trainable by the user?*
>
> The basic minimum requirement is to provide a pre-trained model and make
> the parser work out of the box without training (expect no GPUs; expect a
> JVM and nothing else).
>
> Of course, the parser configuration should have options to change the
> models by changing the path.
>
>
>
> As part of this GSoC project, integration alone isn't enough work. If you go
> through the links provided on the Jira page, you will notice that there are
> models for image recognition but no ready-made models for captioning. We
> will have to train the im2txt network on the dataset and make it
> available. Thus we will have to open-source the training utilities,
> documentation, and any supplementary tools we build along the way. We will
> have to document all of this in the Tika wiki for advanced users!
>
>
>
> This is a GSoC issue and thus we expect to work on it during the summer.
>
>
>
> For now, if you want a small task to familiarise yourself with Tika, I
> have a suggestion:
>
> Currently, Tika uses the InceptionV3 model from Google for image recognition.
>
> The InceptionV4 model was released recently and has proved to be more
> accurate than V3.
>
>
>
> How about upgrading Tika to use the newer Inception model?
>
>
>
> Let me know if you have more questions.
>
>
>
> Cheers,
>
> TG
>
>
> *--*
>
> *Thamme Gowda*
>
> TG | @thammegowda <https://twitter.com/thammegowda>
>
> ~Sent via somebody's Webmail server!
>
>
>
> On Sun, Mar 19, 2017 at 11:56 AM, Kranthi Kiran G V <
> kkranthi@student.nitw.ac.in> wrote:
>
> Hello,
> I'm Kranthi, a 3rd-year computer science undergrad at NIT Warangal and a
> member of the Deep Learning research group at our college. I'm interested in
> taking up the issue. I believe it would be a great contribution to the Apache
> Tika community.
>
> This is what I have done so far:
>
> 1) Built Tika from source using Maven and explored it.
> 2) Tried the object recognition module from the command line. (I should
> probably start using the Docker version to speed up my progress.)
>
> I am yet to import a Keras model into DL4J. I have some doubts regarding the
> requirements since I'm new to this community. *Should the model be
> trainable by the user?* This is important because the Inception V3 model
> without re-training has performed poorly for me (I'm currently training it
> with a small number of steps due to the limited computational resources I
> have -- a GTX 1070).
>
> TODO (Before submitting the proposal):
>
> 1) Create a test REST API for Tika
>
> 2) Import a few models into DL4J.
>
> 3) Train im2txt on my computer.
>
> Thank you,
>
> Kranthi Kiran
>
>
>
>
>
>
>
>
>
>
>
