From: Michael Wechner
Date: Thu, 04 Sep 2008 11:31:10 +0200
To: tika-dev@incubator.apache.org
Subject: Re: Customzing TikaConfig or rather getParser
Message-ID: <48BFAADE.6010405@wyona.com>

Jukka Zitting wrote:
> Hi,
>
> On Mon, Aug 25, 2008 at 9:06 AM, Michael Wechner wrote:
>
>> I think this is where the problem is, I mean the getParser(String) method.
>>
>> I would like to overwrite this method by implementing my own chain of
>> responsibility.
>
> How about the following:
>
>     public class MyCustomParser extends CompositeParser {
>
>         public MyCustomParser() throws TikaException {
>             setConfig(TikaConfig.getDefaultConfig());
>             // or whatever config you want
>         }
>
>         protected Parser getParser(Metadata metadata) {
>             // Custom code to select an appropriate parser
>             // based on the input metadata (mime type,
>             // document path, whatever) passed by the client.
>             // Or fallback to:
>             return super.getParser(metadata);
>         }
>
>     }
>
> Your client code would then look like:
>
>     private Parser parser = new MyCustomParser();
>
>     Metadata metadata = new Metadata();
>     // contentType: the document's MIME type, supplied by the client
>     metadata.set(Metadata.CONTENT_TYPE, contentType);
>     // plus whatever other metadata you need in MyCustomParser
>
>     parser.parse(stream, handler, metadata);
>
> One of my design goals for the current Parser interface was that
> you can encapsulate this sort of functionality inside it.

This seems to work for our use case, but it seems to me that the actual
problem is just transferred one step further down. I think it would be
better to separate the actual parser selection (via chain of
responsibility) from passing in metadata.

Cheers

Michael

> BR,
>
> Jukka Zitting
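
To make the suggested separation concrete, here is a minimal, self-contained sketch of parser selection as a chain of responsibility that lives outside the parser itself. It is only an illustration under stated assumptions: DocParser, ParserSelector, SelectorChain and exampleChain are hypothetical names invented for this sketch, not Tika types, and the selection rules (a path suffix and a MIME type) are made-up examples.

```java
import java.util.List;
import java.util.Optional;

public class SelectorChainDemo {

    /** Hypothetical stand-in for a parser; not a Tika type. */
    interface DocParser {
        String name();
    }

    /** One link in the chain: returns a parser, or empty to pass the request on. */
    interface ParserSelector {
        Optional<DocParser> select(String mimeType, String path);
    }

    /** Asks each selector in order; the first one that answers wins. */
    static final class SelectorChain {
        private final List<ParserSelector> selectors;
        private final DocParser fallback;

        SelectorChain(List<ParserSelector> selectors, DocParser fallback) {
            this.selectors = List.copyOf(selectors);
            this.fallback = fallback;
        }

        DocParser select(String mimeType, String path) {
            for (ParserSelector selector : selectors) {
                Optional<DocParser> choice = selector.select(mimeType, path);
                if (choice.isPresent()) {
                    return choice.get();
                }
            }
            // No selector claimed the document, so use the default parser.
            return fallback;
        }
    }

    /** Builds an example chain: a path-based rule first, then a MIME-type rule. */
    static SelectorChain exampleChain() {
        DocParser xhtml = () -> "xhtml";
        DocParser xml = () -> "xml";
        DocParser fallback = () -> "default";

        ParserSelector byPath = (mime, path) ->
                path.endsWith(".xhtml") ? Optional.of(xhtml) : Optional.empty();
        ParserSelector byMime = (mime, path) ->
                "application/xml".equals(mime) ? Optional.of(xml) : Optional.empty();

        return new SelectorChain(List.of(byPath, byMime), fallback);
    }

    public static void main(String[] args) {
        SelectorChain chain = exampleChain();
        System.out.println(chain.select("application/xml", "/docs/index.xhtml").name());
        System.out.println(chain.select("application/xml", "/docs/feed.xml").name());
        System.out.println(chain.select("text/plain", "/docs/readme.txt").name());
    }
}
```

With a structure like this, the client would pick the parser through the chain and pass metadata to it separately, so adding or reordering selection rules would not require subclassing the parser.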