tika-dev mailing list archives

From thammegowda <...@git.apache.org>
Subject [GitHub] tika pull request #165: [TIKA-DL] Image recognition powered by deeplearning4...
Date Mon, 03 Apr 2017 03:28:32 GMT
GitHub user thammegowda opened a pull request:

    https://github.com/apache/tika/pull/165

    [TIKA-DL] Image recognition powered by deeplearning4j and InceptionV3

    ## Summary
    
    + added `tika-dl` module, which depends on the `deeplearning4j` library. This module produces an add-on containing all the DL4J dependencies and their native dependencies, which users may optionally add to the classpath to make use of it
      + By default, the build system includes native libs for all major platforms (such as Linux, Windows, OSX, Android/ARM)
      + Unnecessary native libs can be excluded by setting the target platform with `-Djavacpp.platform=<target>` during the build
      + Permissible target values = {`android-arm`, `linux-x86_64`, `macosx-x86_64`, `windows-x86_64`, etc.}
    + added `DL4JInceptionV3Net.java`, which provides image recognition using InceptionV3.
      + Similar to the VGG-16 model in #159. However, the VGG-16 model is huge (over 500MB to download) and requires plenty of RAM (~3GB) to run. The beauty of the Inception-V3 model is that it is just 90MB to download and requires ~400MB to run
      + No setup required. This implementation is configured to download the model when it runs the first time. It downloads from our [USCDataScience repo](https://github.com/USCDataScience/dl4j-kerasimport-examples/tree/master/dl4j-import-example/data)
      + It is flexible. It offers plenty of settings that can be changed. Look for the `@Field` annotation in the code
    + added a Test case to test the above implementation
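
    The download-on-first-run behaviour described above can be sketched roughly as follows. This is a minimal illustration only, not the code from this PR: the class `ModelCache` and the method `fetchOnce` are hypothetical names, and the demo uses a local temp file in place of the real model URL so that it runs offline.

    ```java
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URI;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;

    /** Hypothetical helper (not part of this PR): fetch a file once, then reuse it. */
    class ModelCache {

        /** Returns cacheFile, downloading it from source only if it is not already present. */
        static Path fetchOnce(URI source, Path cacheFile) throws IOException {
            if (Files.isRegularFile(cacheFile)) {
                return cacheFile; // later runs: reuse the cached copy, no network access
            }
            Path parent = cacheFile.toAbsolutePath().getParent();
            Files.createDirectories(parent);
            // Download to a temp file first so an interrupted transfer never
            // leaves a truncated model at the final location.
            Path tmp = Files.createTempFile(parent, "model", ".part");
            try (InputStream in = source.toURL().openStream()) {
                Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
                Files.move(tmp, cacheFile, StandardCopyOption.REPLACE_EXISTING);
            } finally {
                Files.deleteIfExists(tmp); // no-op when the move succeeded
            }
            return cacheFile;
        }

        public static void main(String[] args) throws IOException {
            // Offline demo: a local file stands in for the remote model URL.
            Path remote = Files.createTempFile("fake-model", ".bin");
            Files.write(remote, new byte[] {1, 2, 3});
            Path cache = Files.createTempDirectory("cache").resolve("model.bin");

            fetchOnce(remote.toUri(), cache); // first call copies the bytes
            Files.delete(remote);             // the source is gone now...
            fetchOnce(remote.toUri(), cache); // ...but the second call still succeeds
            System.out.println(Files.size(cache)); // prints 3
        }
    }
    ```

    Writing to a temporary file and moving it into place is a design choice of this sketch: a failed download cannot be mistaken for a complete model on the next run.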
    
    
    ## How to Test
    1. Build the code: `mvn package` or `mvn package -DskipTests` or `mvn package -DskipTests -Djavacpp.platform=<target>`
    2. Run:
    ```bash
    java -Xmx400m -cp ./tika-dl/target/tika-dl-1.15-SNAPSHOT-jar-with-dependencies.jar:tika-app/target/tika-app-1.15-SNAPSHOT.jar \
     org.apache.tika.cli.TikaCLI --config=tika-dl/src/test/resources/org/apache/tika/dl/imagerec/dl4j-inception3-config.xml dog.jpg
    ```
    
    ## Note:
    Tested on the `macosx-x86_64` platform; we still have to test on `linux-x86_64` and `windows-x86_64` before it gets merged.
    Feedback and critiques are welcome.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/thammegowda/tika tika-dl

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tika/pull/165.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #165
    
----
commit 1472a4e275ed276b69f11dea6d663bc2136566d0
Author: Thamme Gowda <thammegowda@apache.org>
Date:   2017-04-02T20:52:04Z

    [TIKA-DL] Added tika-dl module to the build system

commit ce28a6f545780144736c0d8c84995d218ab6ffbb
Author: Thamme Gowda <thammegowda@apache.org>
Date:   2017-04-02T21:06:50Z

    Fix scheme value for file URIs

commit 3cbf36800b01e5255a4bc1b87d737896a82d3c0f
Author: Thamme Gowda <thammegowda@apache.org>
Date:   2017-04-02T21:25:59Z

    [TIKA-DL] build jar with dependencies by default

commit d1c951396bd5a6a849273f590743931cd89d493e
Author: Thamme Gowda <thammegowda@apache.org>
Date:   2017-04-03T02:42:56Z

    [TIKA-DL] add license headers

commit 81b3f32103a497eaa99511af09eb253275c67cd9
Author: Thamme Gowda <thammegowda@apache.org>
Date:   2017-04-03T02:51:07Z

    Fix typos and unnecessary spaces

commit 5834afeff5de4d1076180de3ddece8e7b807b7f3
Author: Thamme Gowda <thammegowda@apache.org>
Date:   2017-04-03T03:03:11Z

    Fix XML format

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
