tika-dev mailing list archives

From "Mattmann, Chris A (388J)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: Getting started
Date Tue, 13 Jul 2010 14:01:35 GMT
Thanks Nick, and thanks Arturo for the offer to write a small guide to getting started with
parsing. It might be good to create a JIRA issue for this. Arturo, can you head over to JIRA
and create an issue to contribute a "get Tika parsing up and running in 5 minutes" quick start
guide? Then, you could write the guide in APT format (see here [1] for an example and [2]
for more detailed information), add your new guide file to your local SVN checkout, create
a patch and then attach it to your new issue. I'd be happy to get it into the documentation



[1] http://svn.apache.org/repos/asf/tika/trunk/src/site/apt/formats.apt
[2] http://maven.apache.org/doxia/references/apt-format.html

On 7/13/10 3:54 AM, "Arturo Beltran" <arturo.beltran@uji.es> wrote:

That was my "big" problem all this time, I almost went crazy. Now it
works perfectly, thank you very much for your help.

It might be interesting to write a small manual: "How to create a new
Tika Parser for Dummies". Simply including the three steps that I have
finally figured out (new Parser, tika-mimetypes.xml, list the new parser).
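
As a rough illustration of the second step, an entry in tika-mimetypes.xml
might look like the fragment below. The media type "application/x-myformat"
and the "*.myf" glob are hypothetical placeholders, not values from this
thread; a real entry would use the type and file extension of the format
being parsed.

```xml
<!-- Hypothetical entry: type name and glob pattern are placeholders -->
<mime-type type="application/x-myformat">
  <glob pattern="*.myf"/>
</mime-type>
```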

Greetings, and thanks Nick, it has been a great help.

On 13/07/2010 12:37, Nick Burch wrote:
> On Tue, 13 Jul 2010, Arturo Beltran wrote:
>> I'm calling my parser using the Tika-app included, so I think I'm
>> using AutoDetectParser.
> You have to explicitly tell the AutoDetectParser to try your parser,
> in addition to the mime type definition
> List your new parser in:
> tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser
> and I think it should then be picked up
> Nick
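
The META-INF/services file Nick mentions follows the standard Java
service-provider layout: the file is named after the interface, and each line
names one implementing class. The sketch below shows that lookup using only
the JDK, assuming Tika's discovery behaves like java.util.ServiceLoader.
DemoParser and MyCustomParser are hypothetical stand-ins and are not Tika
classes; the provider file is written to a temporary directory only so the
example runs on its own.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class ServiceFileDemo {

    /** Hypothetical stand-in for the parser interface (not a Tika type). */
    public interface DemoParser {
        String name();
    }

    /** Hypothetical custom parser; providers need a public no-arg constructor. */
    public static class MyCustomParser implements DemoParser {
        public MyCustomParser() {}

        @Override
        public String name() {
            return "my-custom-parser";
        }
    }

    /** Writes a provider file, then asks ServiceLoader which parsers it lists. */
    public static List<String> discover() {
        try {
            // Build <tmp>/META-INF/services/<interface binary name>
            Path root = Files.createTempDirectory("services-demo");
            Path services = Files.createDirectories(
                    root.resolve("META-INF").resolve("services"));

            // One line per implementation: the fully qualified class name
            Files.write(services.resolve(DemoParser.class.getName()),
                    (MyCustomParser.class.getName() + "\n")
                            .getBytes(StandardCharsets.UTF_8));

            List<String> names = new ArrayList<>();
            // The temp directory goes on the class path of a child loader;
            // the classes themselves are resolved through the parent loader.
            try (URLClassLoader loader = new URLClassLoader(
                    new URL[] { root.toUri().toURL() },
                    ServiceFileDemo.class.getClassLoader())) {
                for (DemoParser parser : ServiceLoader.load(DemoParser.class, loader)) {
                    names.add(parser.name());
                }
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(discover());
    }
}
```

A class that is not listed in the provider file is never instantiated by this
lookup, which matches the symptom described in this thread: the parser and
the mime type existed, but the parser was not picked up until it was listed.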

Arturo Beltran Fonollosa
Institute of New Imaging Technologies (INIT): http://www.init.uji.es
Geographic Information research group: http://www.geoinfo.uji.es
Universitat Jaume I, Avda. de Vicente Sos Baynat s/n
E-12071, Castellón, Spain
mailto: arturo.beltran@uji.es

Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
