tika-dev mailing list archives

From "Jukka Zitting" <jukka.zitt...@gmail.com>
Subject Re: [jira] Updated: (TIKA-33) Stateless parsers
Date Wed, 26 Sep 2007 15:35:40 GMT
Hi,

On 9/26/07, kbennett <kbennett@bbsinc.biz> wrote:
> 1) While you are modifying the Parser class, could we change getContents()
> to not swallow exceptions?
> [...]
> and modify the method declaration to throw a TikaException?

Sure, that makes sense.

> 2) In ParserFactory, we have:
>
> } catch (Exception e) {
>     logger.error("Unable to instantiate parser: " + className, e);
>     throw new TikaException(e.getMessage());
> }
>
> When we adapt an exception to a TikaException, would it make sense to wrap
> the entire exception, and not just its getMessage()?

+1
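
A minimal sketch of what wrapping the entire exception could look like. The TikaException below is a hypothetical stand-in with a (message, cause) constructor, and instantiate() is an illustrative helper, not the actual ParserFactory code:

```java
public class WrapDemo {

    /** Hypothetical stand-in for TikaException; keeps the original exception as its cause. */
    static class TikaException extends Exception {
        TikaException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Illustrative helper: instantiates a class by name and wraps any failure. */
    static Object instantiate(String className) throws TikaException {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (Exception e) {
            // Pass e itself, not just e.getMessage(), so the original
            // exception type and stack trace stay available to the caller.
            throw new TikaException("Unable to instantiate parser: " + className, e);
        }
    }

    public static void main(String[] args) {
        try {
            instantiate("no.such.Parser");
        } catch (TikaException e) {
            System.out.println(e.getMessage());
            System.out.println(e.getCause().getClass().getName());
        }
    }
}
```

With the cause attached, a caller can still reach the underlying exception through getCause(), which is lost when only the message string is copied.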

> 3) In Parser.getContents(), we could use Commons Lang StringUtils to make
> the code more nullsafe and a bit more concise by replacing:
>
> int length = Math.min(contentStr.length(), 500);
> String summary = contentStr.substring(0, length);
>
> --- with: ---
>
> String summary = StringUtils.left(contentStr, 500);

-1, I'm not sure that's worth the extra dependency on commons-lang.
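
If null safety is the main concern, a small local helper might cover it without the dependency. This is only a sketch: the class and method names are made up for illustration, and the null and negative-length handling is meant to mirror what StringUtils.left documents:

```java
public class SummaryDemo {

    /**
     * Returns at most maxLength leading characters of text.
     * Returns null for null input and "" for a negative maxLength.
     */
    static String left(String text, int maxLength) {
        if (text == null) {
            return null;
        }
        if (maxLength < 0) {
            return "";
        }
        return text.substring(0, Math.min(text.length(), maxLength));
    }

    public static void main(String[] args) {
        System.out.println(left("short text", 500));
        System.out.println(left("abcdef", 3));
        System.out.println(left(null, 500));
    }
}
```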

> It's too bad we can't have a custom object...then we could have a
> getSummary() method that would do this so we don't run the risk of the
> summary getting out of sync with respect to the fulltext content.

I don't think we have any cases where the fulltext or summary would
change after parsing.

> Same for getValue() always being getValues().get(0).

Good idea, though I'd really like to replace the whole Content object
stuff with a different metadata mechanism.
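
For reference, the derived-accessor idea from the quoted text could be sketched roughly as below. ParsedContent is a hypothetical holder, not the existing Content class, and returning null for an empty value list is an assumption of this sketch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class DerivedAccessorsDemo {

    /** Hypothetical immutable holder; summary and first value are computed on demand. */
    static class ParsedContent {
        private final String fulltext;
        private final List<String> values;

        ParsedContent(String fulltext, List<String> values) {
            this.fulltext = fulltext;
            // Defensive copy so later changes to the caller's list are not visible here.
            this.values = Collections.unmodifiableList(new ArrayList<String>(values));
        }

        String getFulltext() {
            return fulltext;
        }

        List<String> getValues() {
            return values;
        }

        /** Derived from the fulltext on each call, so the two cannot drift apart. */
        String getSummary() {
            if (fulltext == null) {
                return null;
            }
            return fulltext.substring(0, Math.min(fulltext.length(), 500));
        }

        /** First value, or null when there are none (an assumption of this sketch). */
        String getValue() {
            return values.isEmpty() ? null : values.get(0);
        }
    }

    public static void main(String[] args) {
        ParsedContent content = new ParsedContent("Hello Tika", Arrays.asList("a", "b"));
        System.out.println(content.getSummary());
        System.out.println(content.getValue());
    }
}
```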

BR,

Jukka Zitting
