From dev-return-3489-apmail-tika-dev-archive=tika.apache.org@tika.apache.org Mon Jun 21 10:35:06 2010
Message-ID: <4C1F4034.90002@uji.es>
Date: Mon, 21 Jun 2010 12:34:28 +0200
From: Arturo Beltran
MIME-Version: 1.0
To: dev@tika.apache.org
Subject: Re: Getting started
References: <24727299.55741276784064157.JavaMail.jira@thor> <4C1A3395.8050404@uji.es> <2CDC18D5-1F95-430B-BFA3-58B8E5E86CFF@transpac.com>
In-Reply-To: <2CDC18D5-1F95-430B-BFA3-58B8E5E86CFF@transpac.com>
Content-Type: multipart/alternative; boundary="------------060502040102080206050903"

--------------060502040102080206050903
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit

Hi Ken,

First of all, thanks for your quick response.
This is exactly what I'm doing, but even though Tika recognizes the new
MIME type, my new parser is not called.

I added to tika-mimetypes.xml:

I created a new class GeoParser:

    public class GeoParser implements Parser {

        private static final Set<MediaType> SUPPORTED_TYPES =
            Collections.singleton(MediaType.application("shp"));
        public static final String SHP_MIME_TYPE = "application/shp";

        public Set<MediaType> getSupportedTypes(ParseContext context) {
            return SUPPORTED_TYPES;
        }

        public void parse(
                InputStream stream, ContentHandler handler,
                Metadata metadata, ParseContext context)
                throws IOException, SAXException, TikaException {
            metadata.set(Metadata.CONTENT_TYPE, SHP_MIME_TYPE);
            metadata.set("Hello", "World");
            System.out.println("HELLO WORLD");
            System.err.println("ERR Hello world");
            XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
            xhtml.startDocument();
            xhtml.endDocument();
        }
        ...
    }

And this is the result:

    Content-Length: 755072
    Content-Type: application/shp
    resourceName: comarques250.shp

I don't know what exactly is failing, but I can't make it work.

Greetings and thanks in advance for your help.

Arturo

On 17/06/2010 18:25, Ken Krugler wrote:
> Hi Arturo,
>
>> Some of you already know that I'm working on a new parser
>> (https://issues.apache.org/jira/browse/TIKA-443). After a whole day
>> trying to set up a workspace for Eclipse, I implemented the typical
>> "hello world" class, in the Tika Parser version. My problem now is
>> how to configure Tika in order to call my new parser when a file with
>> a specific extension (e.g. *.shp) is found. I read something about a
>> configuration file (tika-config.xml) but I couldn't find it in the
>> source code.
>
> You first need to modify tika-core/src/main/resources/tika-mimetypes.xml.
>
> E.g. something like this was done for mailbox files.
>
>
> That maps the suffix to the mime-type.
>
> Then you define the SUPPORTED_TYPES static class field in your parser
> class that defines what mime-types it supports.
>
> E.g. for MboxParser:
>
> public class MboxParser implements Parser {
>
>     private static final Set<MediaType> SUPPORTED_TYPES =
>         Collections.singleton(MediaType.application("mbox"));
>
>
> -- Ken
>
> --------------------------------------------
>
> +1 530-265-2225
>
> --------------------------------------------
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> e l a s t i c   w e b   m i n i n g
>

-- 
Arturo Beltran Fonollosa
Institute of New Imaging Technologies (INIT): http://www.init.uji.es
Geographic Information research group: http://www.geoinfo.uji.es
Universitat Jaume I, Avda. de Vicente Sos Baynat s/n
E-12071, Castellón, Spain
mailto: arturo.beltran@uji.es

--------------060502040102080206050903--