tika-dev mailing list archives

From Ken Krugler <kkrugler_li...@transpac.com>
Subject Re: Getting started
Date Mon, 21 Jun 2010 17:04:56 GMT
Are you sure your new parser is on the classpath?

E.g. put a breakpoint on getSupportedTypes() and make sure that's getting
called - if not, then the parser isn't being "found" by Tika.
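
A quick way to rule out the classpath question before reaching for the
debugger is to ask the JVM whether it can load the class at all. The
following is only a minimal sketch using plain Java reflection, not
anything Tika-specific; the class name ClasspathCheck and the package
"org.example" are hypothetical, so substitute the real fully qualified
name of the parser.

```java
public class ClasspathCheck {

    // Returns true if the named class can be located by this class's loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "org.example.GeoParser" is an assumed name, used for illustration only.
        String name = args.length > 0 ? args[0] : "org.example.GeoParser";
        System.out.println(name + " on classpath: " + isOnClasspath(name));
    }
}
```

If this reports false, the parser is simply not visible to the JVM. If it
reports true, the class is loadable but may still not be picked up by
Tika, so the breakpoint on getSupportedTypes() remains the real test.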

-- Ken

On Jun 21, 2010, at 3:34am, Arturo Beltran wrote:

> Hi Ken,
>
> First of all, thanks for your quick response.
> That's exactly what I'm doing, but although Tika recognizes the
> new MIME type, my new parser is not called.
>
> I added to tika-mimetypes.xml:
>
> <mime-type type="application/shp">
> <!--sub-class-of type="application/octet-stream"/-->
> <glob pattern="*.shp"/>
> </mime-type>
>
> I created a new class GeoParser:
>
> public class GeoParser implements Parser {
>
>    private static final Set<MediaType> SUPPORTED_TYPES =
>            Collections.singleton(MediaType.application("shp"));
>    public static final String SHP_MIME_TYPE = "application/shp";
>
>    public Set<MediaType> getSupportedTypes(ParseContext context) {
>        return SUPPORTED_TYPES;
>    }
>
>    public void parse(
>            InputStream stream, ContentHandler handler,
>            Metadata metadata, ParseContext context)
>            throws IOException, SAXException, TikaException {
>
>        metadata.set(Metadata.CONTENT_TYPE, SHP_MIME_TYPE);
>        metadata.set("Hello", "World");
>
>        System.out.println("HELLO WORLD");
>        System.err.println("ERR Hello world");
>
>        XHTMLContentHandler xhtml =
>                new XHTMLContentHandler(handler, metadata);
>        xhtml.startDocument();
>        xhtml.endDocument();
>    }
> ...
> }
>
> And that's the result:
>
> Content-Length:  755072
> Content-Type:  application/shp
> resourceName:  comarques250.shp
>
> I don't know what exactly is failing, but I can't make it work.
>
> Greetings and thanks in advance for your help.
>     Arturo
>
>
> On 17/06/2010 18:25, Ken Krugler wrote:
>> Hi Arturo,
>>
>>> Some of you already know that I'm working on a new parser
>>> (https://issues.apache.org/jira/browse/TIKA-443). After spending all
>>> day trying to set up a workspace for Eclipse, I implemented the
>>> typical "hello world" class, in its Tika Parser version. My problem
>>> now is how to configure Tika to call my new parser when a file with
>>> a specific extension (e.g. *.shp) is found. I read something about a
>>> configuration file (tika-config.xml) but I couldn't find it in the
>>> source code.
>>
>> You first need to modify
>> tika-core/src/main/resources/tika-mimetypes.xml.
>>
>> E.g. something like this was done for mailbox files.
>>
>> <mime-type type="application/mbox">
>> <sub-class-of type="text/plain"/>
>> <glob pattern="*.mbox"/>
>> </mime-type>
>>
>> That maps the suffix to the mime-type.
>>
>> Then you define the SUPPORTED_TYPES static class field in your  
>> parser class that defines what mime-types it supports.
>>
>> E.g. for MboxParser:
>>
>> public class MboxParser implements Parser {
>>
>>    private static final Set<MediaType> SUPPORTED_TYPES =
>>        Collections.singleton(MediaType.application("mbox"));
>>
>>
>> -- Ken
>>
>> --------------------------------------------
>> <http://ken-blog.krugler.org>
>> +1 530-265-2225
>>
>> --------------------------------------------
>> Ken Krugler
>> +1 530-210-6378
>> http://bixolabs.com
>> e l a s t i c   w e b   m i n i n g
>>
>
> -- 
> Arturo Beltran Fonollosa
> Institute of New Imaging Technologies (INIT): http://www.init.uji.es
> Geographic Information research group: http://www.geoinfo.uji.es
> Universitat Jaume I, Avda. de Vicente Sos Baynat s/n
> E-12071, Castellón, Spain
> mailto: arturo.beltran@uji.es
>

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g