Hi Ken,
First of all, thanks for your quick response.
This is exactly what I'm doing, but even though Tika recognizes the new
MIME type, my new parser is not called.
I added this to tika-mimetypes.xml:
<mime-type type="application/shp">
  <!--sub-class-of type="application/octet-stream"/-->
  <glob pattern="*.shp"/>
</mime-type>
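To make explicit what the <glob> entry above is expected to do, here is a toy model in plain Java. It uses no Tika classes, and GlobTable is a hypothetical name. It maps a file name to a MIME type with java.nio's glob matcher and falls back to the generic application/octet-stream type. It only illustrates the idea; Tika's own matching rules may differ in details such as case handling and priority.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy model (NOT Tika's API) of what a <glob> entry in tika-mimetypes.xml
 * does: map a file name to a MIME type. GlobTable is a hypothetical name.
 */
public class GlobTable {
    // glob pattern -> MIME type, checked in insertion order
    private final Map<String, String> globs = new LinkedHashMap<>();

    public void add(String pattern, String mimeType) {
        globs.put(pattern, mimeType);
    }

    /** Returns the type of the first matching glob, else a generic fallback. */
    public String detect(String fileName) {
        for (Map.Entry<String, String> e : globs.entrySet()) {
            PathMatcher m = FileSystems.getDefault().getPathMatcher("glob:" + e.getKey());
            if (m.matches(Paths.get(fileName))) {
                return e.getValue();
            }
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        GlobTable table = new GlobTable();
        table.add("*.mbox", "application/mbox");
        table.add("*.shp", "application/shp");
        System.out.println(table.detect("comarques250.shp")); // application/shp
        System.out.println(table.detect("notes.txt"));        // application/octet-stream
    }
}
```

In this model, a successful lookup only tells you the type; it says nothing about which parser, if any, will handle that type.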
I created a new class GeoParser:
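import java.io.IOException;
import java.io.InputStream;
import java.util.Collections;
import java.util.Set;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.XHTMLContentHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

public class GeoParser implements Parser {

    // MIME types this parser claims to handle
    private static final Set<MediaType> SUPPORTED_TYPES =
        Collections.singleton(MediaType.application("shp"));

    public static final String SHP_MIME_TYPE = "application/shp";

    public Set<MediaType> getSupportedTypes(ParseContext context) {
        return SUPPORTED_TYPES;
    }

    public void parse(
            InputStream stream, ContentHandler handler,
            Metadata metadata, ParseContext context)
            throws IOException, SAXException, TikaException {
        // Debug markers: if this parser runs, they show up in the output
        metadata.set(Metadata.CONTENT_TYPE, SHP_MIME_TYPE);
        metadata.set("Hello", "World");
        System.out.println("HELLO WORLD");
        System.err.println("ERR Hello world");

        // Emit an empty XHTML document
        XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
        xhtml.startDocument();
        xhtml.endDocument();
    }

    ...
}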
And that's the result:
Content-Length: 755072
Content-Type: application/shp
resourceName: comarques250.shp
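One possible reading of this output: the Content-Type line shows that detection worked, but the "Hello: World" entry that GeoParser sets is missing, which suggests the parser was never dispatched to. The toy model below is plain Java, not Tika's API, and ToyDispatch and ToyParser are hypothetical names. It sketches that situation: a composite parser records the detected type, then looks up a parser registered for that type and silently falls back to a do-nothing parser when there is none. It deliberately leaves out how the real registry gets populated, which depends on the Tika version and configuration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy model (NOT Tika's API) of a composite parser: record the detected
 * type, then dispatch to whichever parser is registered for it, falling
 * back to a do-nothing parser. All names here are hypothetical.
 */
public class ToyDispatch {
    /** Simplified parser: receives the file name and a metadata map to fill. */
    interface ToyParser {
        void parse(String fileName, Map<String, String> metadata);
    }

    private final Map<String, ToyParser> registry = new HashMap<>();
    // Fallback used when no parser is registered: adds nothing
    private final ToyParser fallback = (name, md) -> { };

    void register(String mimeType, ToyParser parser) {
        registry.put(mimeType, parser);
    }

    Map<String, String> parse(String fileName, String detectedType) {
        Map<String, String> metadata = new LinkedHashMap<>();
        // The detection result is recorded even if no parser handles the type
        metadata.put("Content-Type", detectedType);
        metadata.put("resourceName", fileName);
        registry.getOrDefault(detectedType, fallback).parse(fileName, metadata);
        return metadata;
    }

    public static void main(String[] args) {
        // Stand-in for GeoParser: only adds the debug marker
        ToyParser geo = (name, md) -> md.put("Hello", "World");

        ToyDispatch unregistered = new ToyDispatch();
        System.out.println(unregistered.parse("comarques250.shp", "application/shp"));
        // {Content-Type=application/shp, resourceName=comarques250.shp}

        ToyDispatch registered = new ToyDispatch();
        registered.register("application/shp", geo);
        System.out.println(registered.parse("comarques250.shp", "application/shp"));
        // {Content-Type=application/shp, resourceName=comarques250.shp, Hello=World}
    }
}
```

In the model, mapping the extension to a type is not enough on its own: the parser also has to be present in the registry the dispatcher consults, otherwise the output looks exactly like detection-only metadata.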
I don't know what exactly is failing, but I can't get it to work.
Greetings and thanks in advance for your help.
Arturo
On 17/06/2010 18:25, Ken Krugler wrote:
> Hi Arturo,
>
>> Some of you already know that I'm working on a new parser
>> (https://issues.apache.org/jira/browse/TIKA-443). After a whole day
>> trying to set up an Eclipse workspace, I implemented the typical
>> "hello world" class as a Tika Parser. My problem now is
>> how to configure Tika so that it calls my new parser when a file with
>> a specific extension (e.g. *.shp) is found. I read something about a
>> configuration file (tika-config.xml) but I couldn't find it in the
>> source code.
>
> You first need to modify tika-core/src/main/resources/tika-mimetypes.xml.
>
> E.g. something like this was done for mailbox files.
>
> <mime-type type="application/mbox">
> <sub-class-of type="text/plain"/>
> <glob pattern="*.mbox"/>
> </mime-type>
>
> That maps the suffix to the mime-type.
>
> Then you define a SUPPORTED_TYPES static field in your parser
> class that lists the MIME types it supports.
>
> E.g. for MboxParser:
>
> public class MboxParser implements Parser {
>
> private static final Set<MediaType> SUPPORTED_TYPES =
> Collections.singleton(MediaType.application("mbox"));
>
>
> -- Ken
>
> --------------------------------------------
> <http://ken-blog.krugler.org>
> +1 530-265-2225
>
> --------------------------------------------
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> e l a s t i c w e b m i n i n g
--
Arturo Beltran Fonollosa
Institute of New Imaging Technologies (INIT): http://www.init.uji.es
Geographic Information research group: http://www.geoinfo.uji.es
Universitat Jaume I, Avda. de Vicente Sos Baynat s/n
E-12071, Castellón, Spain
mailto: arturo.beltran@uji.es