tika-dev mailing list archives

From Arturo Beltran <arturo.belt...@uji.es>
Subject Re: Getting started
Date Mon, 21 Jun 2010 10:34:28 GMT
Hi Ken,

First of all, thanks for your quick response.
This is exactly what I'm doing, but although Tika recognizes the new
MIME type, my new parser is not called.

I added to tika-mimetypes.xml:

<mime-type type="application/shp">
<!--sub-class-of type="application/octet-stream"/-->
<glob pattern="*.shp"/>
</mime-type>

I created a new class GeoParser:

public class GeoParser implements Parser {

     private static final Set<MediaType> SUPPORTED_TYPES =
             Collections.singleton(MediaType.application("shp"));
     public static final String SHP_MIME_TYPE = "application/shp";

     public Set<MediaType> getSupportedTypes(ParseContext context) {
         return SUPPORTED_TYPES;
     }

     public void parse(
             InputStream stream, ContentHandler handler,
             Metadata metadata, ParseContext context)
             throws IOException, SAXException, TikaException {

         metadata.set(Metadata.CONTENT_TYPE, SHP_MIME_TYPE);
         metadata.set("Hello", "World");

         System.out.println("HELLO WORLD");
         System.err.println("ERR Hello world");

         XHTMLContentHandler xhtml =
                 new XHTMLContentHandler(handler, metadata);
         xhtml.startDocument();
         xhtml.endDocument();
     }
  ...
}

And this is the result:

Content-Length:  755072
Content-Type:  application/shp
resourceName:  comarques250.shp

I don't know what exactly is failing, but I can't make it work.
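
For readers following the thread, the two steps discussed here (file suffix to MIME type, then MIME type to parser) can be pictured as two independent lookups. The sketch below is a toy model in plain Java, not Tika code: the class DispatchSketch and all of its methods are hypothetical names made up for illustration. It only shows that output like the above, where the type is detected but nothing from the parser appears, would be consistent with the first lookup succeeding while the second finds no registered parser. Whether that is the actual cause here is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Toy model of the two lookups discussed in this thread.
// NOT Tika code: every name below is hypothetical.
class DispatchSketch {

    // Step 1: file-name suffix -> MIME type (the role played by <glob> entries).
    private final Map<String, String> typeBySuffix = new LinkedHashMap<>();

    // Step 2: MIME type -> parser name (the role played by a parser's supported types).
    private final Map<String, String> parserByType = new LinkedHashMap<>();

    void addGlob(String suffix, String mimeType) {
        typeBySuffix.put(suffix, mimeType);
    }

    void registerParser(String mimeType, String parserName) {
        parserByType.put(mimeType, parserName);
    }

    // Returns the type of the first matching suffix, or a generic fallback type.
    String detect(String fileName) {
        for (Map.Entry<String, String> e : typeBySuffix.entrySet()) {
            if (fileName.endsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return "application/octet-stream";
    }

    // Detection can succeed even when no parser is registered for the result.
    Optional<String> parserFor(String fileName) {
        return Optional.ofNullable(parserByType.get(detect(fileName)));
    }

    public static void main(String[] args) {
        DispatchSketch d = new DispatchSketch();
        d.addGlob(".shp", "application/shp");

        // The type is detected, but no parser is known for it yet.
        System.out.println(d.detect("comarques250.shp"));
        System.out.println(d.parserFor("comarques250.shp").isPresent());

        // Only after registration does the second lookup find the parser.
        d.registerParser("application/shp", "GeoParser");
        System.out.println(d.parserFor("comarques250.shp").get());
    }
}
```

In this model the two maps are filled separately, so adding the glob entry alone never makes a parser reachable.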

Greetings and thanks in advance for your help.
      Arturo


El 17/06/2010 18:25, Ken Krugler escribió:
> Hi Arturo,
>
>> Some of you already know that I'm working on a new parser 
>> (https://issues.apache.org/jira/browse/TIKA-443). After spending all day
>> trying to set up a workspace for Eclipse, I implemented the typical
>> "hello world" class, in the Tika Parser version. My problem now is
>> how to configure Tika to call my new parser when a file with a
>> specific extension (e.g. *.shp) is found. I read something about a
>> configuration file (tika-config.xml) but I couldn't find it in the 
>> source code.
>
> You first need to modify tika-core/src/main/resources/tika-mimetypes.xml.
>
> E.g. something like this was done for mailbox files.
>
> <mime-type type="application/mbox">
> <sub-class-of type="text/plain"/>
> <glob pattern="*.mbox"/>
> </mime-type>
>
> That maps the suffix to the mime-type.
>
> Then you define the SUPPORTED_TYPES static class field in your parser 
> class that defines what mime-types it supports.
>
> E.g. for MboxParser:
>
> public class MboxParser implements Parser {
>
>     private static final Set<MediaType> SUPPORTED_TYPES =
>         Collections.singleton(MediaType.application("mbox"));
>
>
> -- Ken
>
> --------------------------------------------
> <http://ken-blog.krugler.org>
> +1 530-265-2225
> --------------------------------------------
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> e l a s t i c   w e b   m i n i n g


-- 
Arturo Beltran Fonollosa
Institute of New Imaging Technologies (INIT): http://www.init.uji.es
Geographic Information research group: http://www.geoinfo.uji.es
Universitat Jaume I, Avda. de Vicente Sos Baynat s/n
E-12071, Castellón, Spain
mailto: arturo.beltran@uji.es

