tika-dev mailing list archives

From "Jukka Zitting" <jukka.zitt...@gmail.com>
Subject Re: RFE: adding a ParserFactory class
Date Fri, 24 Oct 2008 08:04:26 GMT

On Thu, Oct 23, 2008 at 5:32 PM, Stephane Bastian
<stephane_bastian@hotmail.com> wrote:
> However, a ParserFactory class (which doesn't exist yet) would really help
> us here and could provide public method(s) to do what's currently done
> internally by the class AutoDetectParser

You should be able to achieve this functionality by overriding the
getParser(Metadata) method in CompositeParser (which AutoDetectParser
extends).
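
As a rough illustration of the override approach, here is a minimal,
self-contained sketch. The Parser, Metadata and CompositeParser types
below are simplified stand-ins written for this example, not Tika's
real classes, and CustomHtmlCompositeParser is a hypothetical name; the
sketch only assumes that the composite parser picks a delegate from the
"Content-Type" metadata value.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in (assumption): a parser turns input text into output text.
interface Parser {
    String parse(String input, Metadata metadata);
}

// Simplified stand-in (assumption): metadata is a plain name/value map.
class Metadata {
    private final Map<String, String> values = new HashMap<>();

    void set(String name, String value) { values.put(name, value); }

    String get(String name) { return values.get(name); }
}

// Simplified stand-in (assumption): delegates to a parser chosen by content type.
class CompositeParser implements Parser {
    private final Map<String, Parser> parsers = new HashMap<>();
    private final Parser fallback = (input, metadata) -> "";

    void register(String mediaType, Parser parser) { parsers.put(mediaType, parser); }

    // The selection hook that a subclass can override.
    protected Parser getParser(Metadata metadata) {
        Parser parser = parsers.get(metadata.get("Content-Type"));
        return parser != null ? parser : fallback;
    }

    @Override
    public String parse(String input, Metadata metadata) {
        return getParser(metadata).parse(input, metadata);
    }
}

// Hypothetical subclass: routes HTML to a custom parser, everything else as before.
class CustomHtmlCompositeParser extends CompositeParser {
    private final Parser customHtmlParser;

    CustomHtmlCompositeParser(Parser customHtmlParser) {
        this.customHtmlParser = customHtmlParser;
    }

    @Override
    protected Parser getParser(Metadata metadata) {
        if ("text/html".equals(metadata.get("Content-Type"))) {
            return customHtmlParser;
        }
        return super.getParser(metadata);
    }
}
```

With this shape, callers keep using the composite parser as before, and
only the selection step changes for HTML documents.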

Alternatively, you could simply modify the Tika configuration and pass
the modified configuration to the AutoDetectParser instance.

More generally, is there a specific reason why you need custom
processing for HTML?


Jukka Zitting
