tika-dev mailing list archives

From "Jukka Zitting" <jukka.zitt...@gmail.com>
Subject Re: Customzing TikaConfig or rather getParser
Date Thu, 04 Sep 2008 11:21:44 GMT

On Thu, Sep 4, 2008 at 11:31 AM, Michael Wechner
<michael.wechner@wyona.com> wrote:
> this seems to work for our use case, but it seems to me that the actual
> problem is just transferred one step further down.

"There are few problems in computer science that can not be solved by
adding another level of indirection." -Tom Christiansen

> I think it would be better to separate the actual parser selection (via
> chain of responsibility) from passing in metadata.

The way I see it, an application should ideally only deal with a
single Parser instance that would be smart enough to select the
appropriate parsing mechanism for each incoming document based on the
associated metadata.
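
As a rough illustration of that idea, here is a minimal Java sketch. It
does not use Tika's actual classes: SimpleParser, DispatchingParser, the
"Content-Type" key and the "text/x-upper" type are all hypothetical names
made up for this example, and a plain Map stands in for the Metadata
object. The application only ever calls the one dispatching parser, which
picks a delegate from the metadata it is given.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class MetadataDispatchSketch {

    /** Hypothetical stand-in for a parser; NOT Tika's actual Parser interface. */
    interface SimpleParser {
        // The metadata map is both input (hints supplied by the caller)
        // and output (values the parser discovers while parsing).
        String parse(String content, Map<String, String> metadata);
    }

    /** The single parser the application talks to; selects a delegate by metadata. */
    static class DispatchingParser implements SimpleParser {
        private final Map<String, SimpleParser> byType = new HashMap<>();
        private final SimpleParser fallback;

        DispatchingParser(SimpleParser fallback) {
            this.fallback = fallback;
        }

        void register(String contentType, SimpleParser parser) {
            byType.put(contentType, parser);
        }

        @Override
        public String parse(String content, Map<String, String> metadata) {
            // Assumed key name; a missing or unknown type falls back to the default.
            String type = metadata.get("Content-Type");
            SimpleParser delegate = byType.getOrDefault(type, fallback);
            return delegate.parse(content, metadata);
        }
    }

    public static void main(String[] args) {
        // Fallback parser returns the content unchanged.
        DispatchingParser parser = new DispatchingParser((content, md) -> content);

        // A made-up parser that upper-cases text and records its length as output metadata.
        parser.register("text/x-upper", (content, md) -> {
            md.put("length", String.valueOf(content.length()));
            return content.toUpperCase(Locale.ROOT);
        });

        // The client feeds extra metadata in, then reads results back out of the same map.
        Map<String, String> metadata = new HashMap<>();
        metadata.put("Content-Type", "text/x-upper");
        String text = parser.parse("hello", metadata);
        System.out.println(text + " / length=" + metadata.get("length"));
    }
}
```

The point of the sketch is only that the selection logic lives behind one
parse() call, so the client code does not change when new delegates are
registered.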

The reason for making the Metadata object a modifiable input/output
parameter (instead of just a return value) of the parse() method was
that a client application could feed extra metadata to the parsing
process. In your use case that extra metadata would be the path of the


Jukka Zitting
