tika-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1787) Include Stanford Name Entity Recognition in Tika
Date Thu, 05 Nov 2015 23:23:27 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992696#comment-14992696 ]

ASF GitHub Bot commented on TIKA-1787:

GitHub user TaichiHo opened a pull request:


    fix for TIKA-1787 contributed by Yueheng He

    Builds successfully with Java 1.8.0_65.
    To see the effect, create a text file containing the following:
    Good afternoon Rajat Raina, how are you today? Hi, I am Tom Brady. I go to school at Stanford University, which is located in California.
    Save it as test.ner and run it through the Tika CLI with -m, which prints only the metadata:
    java -classpath tika-app/target/tika-app-1.12-SNAPSHOT.jar org.apache.tika.cli.TikaCLI -m test.ner
    The output should look like this:
    Content-Length: 137
    Content-Type: application/stanford-ner
    LOCATION: [California]
    ORGANIZATION: [Stanford University]
    PERSON: [Rajat Raina, Tom Brady]
    X-Parsed-By: org.apache.tika.parser.DefaultParser
    X-Parsed-By: org.apache.tika.parser.stanfordNer.StanfordNerParser
    resourceName: test.ner
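The listing above mixes two multi-value styles: X-Parsed-By is printed once per value, while each entity class sits on one line with a bracketed list. The sketch below reproduces that layout from a plain map, assuming (this is not confirmed by the PR) that the parser stores each class's entities as a single string value produced by a Java List's toString(), and that the CLI sorts metadata names in natural String order and prints one `name: value` line per value. The class name MetadataDump is hypothetical; it does not call the Tika API and is not Tika's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MetadataDump {
    // Sorts metadata names with Arrays.sort (natural String order, so
    // upper-case names sort before "resourceName") and emits one
    // "name: value" line per value, so a multi-valued key repeats.
    static List<String> render(Map<String, List<String>> metadata) {
        String[] names = metadata.keySet().toArray(new String[0]);
        Arrays.sort(names);
        List<String> lines = new ArrayList<>();
        for (String name : names) {
            for (String value : metadata.get(name)) {
                lines.add(name + ": " + value);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Insertion order is deliberately scrambled; render() sorts the names.
        Map<String, List<String>> md = new LinkedHashMap<>();
        md.put("resourceName", Arrays.asList("test.ner"));
        md.put("Content-Length", Arrays.asList("137"));
        md.put("Content-Type", Arrays.asList("application/stanford-ner"));
        // Two separate values, so X-Parsed-By is printed twice.
        md.put("X-Parsed-By", Arrays.asList(
                "org.apache.tika.parser.DefaultParser",
                "org.apache.tika.parser.stanfordNer.StanfordNerParser"));
        // Assumption: each entity class holds ONE value, a list's toString().
        md.put("PERSON", Arrays.asList(Arrays.asList("Rajat Raina", "Tom Brady").toString()));
        md.put("ORGANIZATION", Arrays.asList(Arrays.asList("Stanford University").toString()));
        md.put("LOCATION", Arrays.asList(Arrays.asList("California").toString()));
        for (String line : render(md)) {
            System.out.println(line);
        }
    }
}
```

Running main prints the same eight lines, in the same order, as the sample output above; the 137 is simply the character count of the sample sentence without a trailing newline.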

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/TaichiHo/tika TIKA-1787

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #62

commit b94331ece262bb8d8408dda7b22b6dc0bb69557e
Author: Taichi <heyuehengtaichi@gmail.com>
Date:   2015-11-05T22:47:22Z

    fix for TIKA-1787 contributed by Yueheng He


> Include Stanford Name Entity Recognition in Tika
> ------------------------------------------------
>                 Key: TIKA-1787
>                 URL: https://issues.apache.org/jira/browse/TIKA-1787
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime, parser
>    Affects Versions: 1.12
>         Environment: Java 1.8, Mac OSX 10.11
>            Reporter: Yueheng He
>              Labels: features, newbie, test
>             Fix For: 1.12
>   Original Estimate: 168h
>  Remaining Estimate: 168h
> Using Stanford Named Entity Recognition (NER), Tika will be able to extract named entities such as PERSON, ORGANIZATION, and LOCATION from the given text. The extracted named entities will be added to the metadata.
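The description says extracted entities end up in the metadata grouped by class. A minimal sketch of that grouping step follows, assuming the classifier has already labelled every token; the labels here are hypothetical, hand-written `token/LABEL` pairs, with O marking tokens outside any entity. It joins runs of adjacent tokens that share a label and collects the results per class. The class and method names are illustrative only: the sketch neither loads a Stanford model nor reproduces the PR's StanfordNerParser.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EntityGrouping {
    // Label for tokens that belong to no entity.
    static final String OTHER = "O";

    /**
     * Groups a whitespace-separated "token/LABEL" sequence into entities.
     * Adjacent tokens with the same non-O label are joined into one entity;
     * entities are collected per label in order of first appearance.
     */
    static Map<String, List<String>> group(String tagged) {
        Map<String, List<String>> entities = new LinkedHashMap<>();
        StringBuilder current = new StringBuilder();
        String currentLabel = OTHER;
        // A trailing O sentinel guarantees the last run is flushed.
        String[] parts = (tagged + " ./" + OTHER).trim().split("\\s+");
        for (String part : parts) {
            int slash = part.lastIndexOf('/');
            if (slash < 0) {
                throw new IllegalArgumentException("missing label: " + part);
            }
            String token = part.substring(0, slash);
            String label = part.substring(slash + 1);
            if (!label.equals(currentLabel)) {
                // Label changed: store the finished run if it was an entity.
                if (!currentLabel.equals(OTHER)) {
                    entities.computeIfAbsent(currentLabel, k -> new ArrayList<>())
                            .add(current.toString());
                }
                current.setLength(0);
                currentLabel = label;
            }
            if (!label.equals(OTHER)) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(token);
            }
        }
        return entities;
    }

    public static void main(String[] args) {
        // Hypothetical hand-made labels for the test.ner sentence.
        String tagged = "Good/O afternoon/O Rajat/PERSON Raina/PERSON ,/O how/O are/O you/O today/O ?/O "
                + "Hi/O ,/O I/O am/O Tom/PERSON Brady/PERSON ./O I/O go/O to/O school/O at/O "
                + "Stanford/ORGANIZATION University/ORGANIZATION ,/O which/O is/O located/O in/O "
                + "California/LOCATION ./O";
        for (Map.Entry<String, List<String>> e : group(tagged).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Running main prints PERSON: [Rajat Raina, Tom Brady], ORGANIZATION: [Stanford University] and LOCATION: [California], the same groupings as the PR's sample output. One consequence of joining same-label runs: two different names written back to back with no O token between them would merge into a single entity; a BIO-style tag scheme, which marks where each entity begins, avoids that.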

This message was sent by Atlassian JIRA
