tika-dev mailing list archives

From Tyler Palsulich <tpalsul...@gmail.com>
Subject Re: Starting Advice
Date Thu, 07 Aug 2014 17:24:40 GMT
Hi Roger,

Thanks for your interest in Tika! In a nutshell, Tika is a content
extraction tool. You can extract metadata and text, identify spoken
languages, and translate text. Translation uses internet APIs for now;
we're working on machine translation. We're in the process of releasing
version 1.6.
Tika In Action is a book written by Chris Mattmann, the lead and co-creator
of Tika. You can find more info at [0].

You can use Tika in several ways:

*1. tika-app jar*. Try downloading a release on tika.apache.org and running
`java -jar tika-app.jar [some file]`.
*2. GUI*. Try running `java -jar tika-app.jar --gui`. A graphical interface
will pop up. Then, try dragging a file into the window.
*3. Tika server*. Run `java -jar tika-app.jar --server`. Then, try one of
the commands from [1] (e.g. `curl -X PUT -d @example.csv
http://localhost:9998/meta --header "Content-Type: text/csv"`).
*4. Java API*. Check out an example of using Parser.parse() at [2].
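
If you'd rather call the server from Java than from curl, here is a rough
sketch of the same PUT request using only the standard library. It assumes a
Tika server is listening on localhost:9998 and that example.csv is in the
current directory, as in the curl example above. The class and method names
(TikaMetaClient, putForMeta) are made up for illustration and are not part of
Tika.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class TikaMetaClient {

    /**
     * Sends the given bytes with an HTTP PUT and returns the response body
     * decoded as UTF-8 text. Hypothetical helper, not a Tika API.
     */
    public static String putForMeta(String endpoint, byte[] body, String contentType)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true); // we are going to write a request body
        conn.setRequestProperty("Content-Type", contentType);

        // Upload the file contents as the request body.
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }

        // Read the whole response into memory; fine for small metadata replies.
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed endpoint and file name, taken from the curl example.
        byte[] csv = Files.readAllBytes(Paths.get("example.csv"));
        System.out.println(putForMeta("http://localhost:9998/meta", csv, "text/csv"));
    }
}
```

To try it, start the server first, then compile with `javac TikaMetaClient.java`
and run `java TikaMetaClient`. It should print whatever the /meta endpoint
returns for your file.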

Hope that helps!

Tyler

[0] - http://www.manning.com/mattmann/
[1] - http://wiki.apache.org/tika/TikaJAXRS
[2] - https://github.com/tpalsulich/TikaExamples


On Wed, Aug 6, 2014 at 11:04 PM, Alex Ott <alexott@gmail.com> wrote:

> I think "Tika in Action" is still current...
>
>
> On Wed, Aug 6, 2014 at 11:03 PM, Roger Carter <rogercarter09@gmail.com>
> wrote:
>
> > Hi Everyone,
> >
> > I'm new to the Apache scene; I have experience with Matlab and minimal
> > experience with Python. This seems like a powerful tool and I'd like to
> > learn more. If anyone is willing to provide recommendations for resources
> > or detail their experiences in learning Tika, I would be most grateful.
> >
> > Thanks,
> > Roger
> >
>
>
>
> --
> With best wishes,                    Alex Ott
> http://alexott.net/
> Twitter: alexott_en (English), alexott (Russian)
> Skype: alex.ott
>
