tika-dev mailing list archives

From Ken Krugler <kkrugler_li...@transpac.com>
Subject Re: Getting started
Date Thu, 17 Jun 2010 16:25:46 GMT
Hi Arturo,

> Some of you already know that I'm working on a new parser
> (https://issues.apache.org/jira/browse/TIKA-443). After a whole day
> trying to set up a workspace for Eclipse, I implemented the typical
> "hello world" class as a Tika Parser. My problem now is how to
> configure Tika to call my new parser when a file with a specific
> extension (e.g. *.shp) is found. I read something about a
> configuration file (tika-config.xml) but I couldn't find it in the
> source code.

You first need to modify
tika-core/src/main/resources/tika-mimetypes.xml.

E.g. something like this was done for mailbox files.

   <mime-type type="application/mbox">
     <sub-class-of type="text/plain"/>
     <glob pattern="*.mbox"/>
   </mime-type>

That maps the suffix to the mime-type.
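
To make the suffix-to-type step concrete, here is a rough sketch in plain Java of what such a glob lookup amounts to. It is not Tika's implementation: the GlobLookup class, its method names, the octet-stream fallback, and the application/x-shapefile type used for *.shp are all made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only, not Tika's code: maps "*.suffix" glob patterns to mime-type names.
class GlobLookup {
    // Key is the suffix including the dot (".mbox"), value is the mime-type name.
    private final Map<String, String> suffixToType = new LinkedHashMap<>();

    // Register a pattern of the form "*.suffix" for a mime-type.
    void addGlob(String pattern, String mimeType) {
        if (!pattern.startsWith("*.")) {
            throw new IllegalArgumentException("this sketch only handles *.suffix patterns");
        }
        suffixToType.put(pattern.substring(1), mimeType);
    }

    // Return the mime-type whose suffix matches the file name, or a generic fallback.
    String typeFor(String fileName) {
        for (Map.Entry<String, String> entry : suffixToType.entrySet()) {
            if (fileName.endsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        GlobLookup lookup = new GlobLookup();
        lookup.addGlob("*.mbox", "application/mbox");
        lookup.addGlob("*.shp", "application/x-shapefile"); // hypothetical type name
        System.out.println(lookup.typeFor("inbox.mbox"));
        System.out.println(lookup.typeFor("roads.shp"));
    }
}
```

In the real setup you only add the <mime-type> entry to the XML file; the lookup itself is done for you.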

Then define a static SUPPORTED_TYPES field in your parser class that
lists the mime-types it supports.

E.g. for MboxParser:

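public class MboxParser implements Parser {

     // The mime-types this parser handles; here only application/mbox.
     private static final Set<MediaType> SUPPORTED_TYPES =
         Collections.singleton(MediaType.application("mbox"));

     // ... rest of the parser omitted ...
}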
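
If it helps to see why the field matters, below is a minimal sketch of how a set of supported types can be used to pick a parser for a detected mime-type. This is plain Java and only an assumption about the general idea, not Tika's actual dispatch code: SketchParser, MboxSketchParser, and ParserRegistry are hypothetical names, and mime-types are plain strings here rather than MediaType objects.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Illustration only: a stand-in for a parser interface, not Tika's Parser.
interface SketchParser {
    Set<String> supportedTypes();
    String name();
}

// Hypothetical parser that declares a single supported mime-type.
class MboxSketchParser implements SketchParser {
    private static final Set<String> SUPPORTED_TYPES =
        Collections.singleton("application/mbox");

    public Set<String> supportedTypes() { return SUPPORTED_TYPES; }
    public String name() { return "mbox"; }
}

// Hypothetical registry: indexes each parser under every type it declares.
class ParserRegistry {
    private final Map<String, SketchParser> byType = new HashMap<>();

    void register(SketchParser parser) {
        for (String type : parser.supportedTypes()) {
            byType.put(type, parser);
        }
    }

    // Returns the parser registered for the mime-type, or null if there is none.
    SketchParser parserFor(String mimeType) {
        return byType.get(mimeType);
    }

    public static void main(String[] args) {
        ParserRegistry registry = new ParserRegistry();
        registry.register(new MboxSketchParser());
        System.out.println(registry.parserFor("application/mbox").name());
    }
}
```

So for your *.shp parser, the type you put in SUPPORTED_TYPES has to be the same one you declared in the mime-types file.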


-- Ken

--------------------------------------------
<http://ken-blog.krugler.org>
+1 530-265-2225






--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g




