tika-dev mailing list archives

From Michael Wechner <michael.wech...@wyona.com>
Subject Re: Customzing TikaConfig or rather getParser
Date Mon, 25 Aug 2008 07:06:43 GMT
Thorsten Scherler wrote:
> On Tue, 2008-08-19 at 16:54 +0200, Michael Wechner wrote:
>   
>> Hi
>>
>>     
>
> Hi Michi,
>   

Hello Thorsten :-)
>   
>
> I would reuse the config and create a config file
> ("/PathTo/myConfig.xml") like the following. I asked whether doc type is a
> possibility since it would make configuration much easier.
>
> Instead of using the plain mime type I would use the doc type:
>   

What exactly do you mean by doc type?
> <parser name="parse-myDocType"
> class="org.apache.tika.parser.docType.MyDocTypeParser">
>   <mime>myDoctype</mime>
> </parser>
>
> and then from your code call 
> TikaConfig config = new TikaConfig("/PathTo/myConfig.xml");
> Parser parser = config.getParser("myDoctype");
>   

I think this is where the problem is, I mean the getParser(String) method.

I would like to override this method by implementing my own chain of 
responsibility.

Hence I think it would be nice to enhance this by introducing a new method

TikaConfig.getParser(ParserSelector)

(similar to 
http://java.sun.com/j2se/1.4.2/docs/api/java/io/File.html#listFiles(java.io.FileFilter))

and ParserSelector would be an interface

(similar to http://java.sun.com/j2se/1.4.2/docs/api/java/io/FileFilter.html)
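
A rough sketch of what I have in mind, to make the proposal concrete. This
is an illustration only, not existing Tika code: ParserSelector, its
accept(...) signature, SketchConfig and firstMatch are all hypothetical
names, and Parser here is a one-method stand-in rather than the real
org.apache.tika.parser.Parser. The selector is modelled on
FileFilter.accept(File), and the chain simply asks each selector in turn
until one of them matches.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ParserSelectorSketch {

    /** Stand-in for the real Tika Parser; a label is all this sketch needs. */
    interface Parser {
        String describe();
    }

    /** Hypothetical callback, modelled on java.io.FileFilter#accept(File). */
    interface ParserSelector {
        boolean accept(String mimeType, Parser parser);
    }

    /** Hypothetical stand-in for TikaConfig: parsers registered under a type key. */
    static final class SketchConfig {
        private final Map<String, Parser> parsers = new LinkedHashMap<>();

        void register(String mimeType, Parser parser) {
            parsers.put(mimeType, parser);
        }

        /** Lookup by exact key, in the style of getParser(String). */
        Parser getParser(String mimeType) {
            return parsers.get(mimeType);
        }

        /** Proposed style: first registered parser the selector accepts, or null. */
        Parser getParser(ParserSelector selector) {
            for (Map.Entry<String, Parser> entry : parsers.entrySet()) {
                if (selector.accept(entry.getKey(), entry.getValue())) {
                    return entry.getValue();
                }
            }
            return null;
        }
    }

    /** Chain of responsibility: try each selector in order, first match wins. */
    static Parser firstMatch(SketchConfig config, List<ParserSelector> chain) {
        for (ParserSelector selector : chain) {
            Parser parser = config.getParser(selector);
            if (parser != null) {
                return parser;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SketchConfig config = new SketchConfig();
        config.register("text/plain", () -> "PlainTextParser");
        config.register("myDoctype", () -> "MyDocTypeParser");

        // The first selector matches nothing, so the second one gets its turn.
        ParserSelector byExactType = (mime, parser) -> mime.equals("application/x-unknown");
        ParserSelector byPrefix = (mime, parser) -> mime.startsWith("my");

        Parser chosen = firstMatch(config, Arrays.asList(byExactType, byPrefix));
        System.out.println(chosen == null ? "no parser" : chosen.describe());
    }
}
```

The point of the callback is that the selection logic lives in the caller,
so nobody has to subclass the config just to change how a parser is picked.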

WDYT?

Thanks

Michael
> ...
>
> However this is to reuse the current code more than to find a definitive
> solution, but maybe somebody else has another idea.
>
> HTH
>
> salu2
>
>   
>> Thanks
>>
>> Michael
>>     

