tika-dev mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: The case of the unexpected error
Date Wed, 16 Dec 2009 01:33:37 GMT
Hi,

On Wed, Dec 16, 2009 at 1:31 AM, Ken Krugler
<kkrugler_lists@transpac.com> wrote:
> But I ran into a problem, where the Tika auto-detect code was correctly
> identifying a file as being a Microsoft format, even though the server said
> it was text/plain. The Tika Microsoft parser would try to dynamically figure
> out which support code to call, and in the end it throws a
> NoSuchMethodError.
>
> Note that this is an Error, not an Exception. As such, it flies on past all
> of the Tika catch blocks, and my own code's catch blocks, and kills the
> Hadoop job in weird and wonderful ways.
>
> It seems like Errors shouldn't be thrown for situations where dynamic
> configuration could result in a class not existing, but before I started
> writing up an issue I wanted to get input from the community about this.
> It's a bit gray to me, since I essentially "did it to myself" by excluding
> jars.

As a general rule I think Tika should be more resilient to such issues.

The TikaConfig code that tries to load and instantiate the configured
parser classes was already made to catch and ignore any Throwables,
but I guess in this case the problem occurs outside TikaConfig when
the parse() method of the instantiated parser is called.
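To make the distinction concrete, here is a minimal sketch of load-time handling of this kind. It is not the actual TikaConfig code. The `Parser` interface, `UpperCaseParser` and `loadParsers` are hypothetical, simplified stand-ins. Each configured class name is instantiated reflectively, and any Throwable is swallowed so that an unavailable parser is simply skipped:

```java
import java.util.ArrayList;
import java.util.List;

public class ParserLoaderSketch {

    // Hypothetical, simplified stand-in for Tika's Parser interface.
    public interface Parser {
        String parse(String input) throws Exception;
    }

    // Hypothetical example parser, used only to show a successful load.
    public static class UpperCaseParser implements Parser {
        @Override
        public String parse(String input) {
            return input.toUpperCase();
        }
    }

    // Instantiates each configured class by name. Any Throwable raised while
    // loading or constructing it (ClassNotFoundException, NoClassDefFoundError,
    // other LinkageErrors, ...) is ignored and that parser is skipped.
    public static List<Parser> loadParsers(List<String> classNames) {
        List<Parser> parsers = new ArrayList<>();
        for (String name : classNames) {
            try {
                Object instance = Class.forName(name)
                        .getDeclaredConstructor()
                        .newInstance();
                if (instance instanceof Parser) {
                    parsers.add((Parser) instance);
                }
            } catch (Throwable t) {
                // Skip parsers that cannot be loaded in this deployment.
            }
        }
        return parsers;
    }
}
```

Note that the catch only covers loading and construction. Once a parser is in the list, an Error thrown later from its parse() method is not caught here, which is the situation described above.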

Catching Errors is a bit questionable, since they normally signal serious
problems that an application shouldn't try to recover from, but it sounds
like in this case we should do it. See the code in CompositeParser that
already catches any RuntimeExceptions and wraps them into TikaExceptions.
Perhaps we should add similar handling for Errors as well.
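A rough sketch of what that extra handling might look like follows. It is not the real CompositeParser code. `TikaException` and `Parser` are simplified, hypothetical stand-ins, and `safeParse` is a hypothetical helper. The parser call is wrapped so that both RuntimeExceptions and linkage errors reach callers as a checked exception:

```java
public class SafeParseSketch {

    // Hypothetical, simplified stand-in for Tika's TikaException.
    public static class TikaException extends Exception {
        public TikaException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Hypothetical, simplified stand-in for Tika's Parser interface.
    public interface Parser {
        String parse(String input) throws TikaException;
    }

    // Delegates to the chosen parser, converting unchecked failures into a
    // checked TikaException so callers' catch (Exception e) blocks see them.
    public static String safeParse(Parser parser, String input) throws TikaException {
        String name = parser.getClass().getName();
        try {
            return parser.parse(input);
        } catch (RuntimeException e) {
            // Corresponds to the existing RuntimeException wrapping.
            throw new TikaException("Unexpected RuntimeException from " + name, e);
        } catch (LinkageError e) {
            // Proposed addition: NoSuchMethodError, NoClassDefFoundError, etc.
            throw new TikaException("Missing or incompatible dependency in " + name, e);
        }
    }
}
```

This sketch narrows "Errors" to LinkageError, the common superclass of NoSuchMethodError and NoClassDefFoundError, so that VM failures such as OutOfMemoryError still propagate. Catching Error itself would be the broader option.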

BR,

Jukka Zitting
