tika-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] Reopened: (TIKA-126) Add Parser.parse(InputStream, Metadata) for metadata extraction
Date Sun, 14 Sep 2008 15:43:44 GMT

     [ https://issues.apache.org/jira/browse/TIKA-126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting reopened TIKA-126:

I'm having second thoughts about this feature. It sounds useful, but few parsers can easily
implement this without parsing the full document in any case, so the actual performance benefits
are questionable. The downside of this issue is that it adds extra complexity to the otherwise
clear and simple Parser interface.

I'm inclined to revert these changes for now, and perhaps revisit the issue when we have a
more pressing use case for an extra parsing mode like this.

> Add Parser.parse(InputStream, Metadata) for metadata extraction
> ---------------------------------------------------------------
>                 Key: TIKA-126
>                 URL: https://issues.apache.org/jira/browse/TIKA-126
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>             Fix For: 0.2-incubating
> In some cases a client is just interested in the parsed metadata and not the extracted
> text content. It is easy to ignore the text content by just passing a dummy DefaultHandler
> to the existing parse() method, but many parsers could avoid a lot of work if they knew in
> advance that the text content is not needed.
> Thus I want to add a parse(InputStream, Metadata) signature to the Parser interface.
> I'll also add an AbstractParser base class with a trivial implementation of that method:
>     public abstract class AbstractParser implements Parser {
>         public void parse(InputStream stream, Metadata metadata) {
>             parse(stream, new DefaultHandler(), metadata);
>         }
>     }
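
For illustration, the quoted proposal can be sketched as a self-contained program. This is only a sketch under stated assumptions: the Metadata class, the simplified Parser interface, FirstLineTitleParser and extractTitle are hypothetical stand-ins written for this example and are not Tika's actual classes; only DefaultHandler and ContentHandler are the real SAX types from the JDK. It also shows the concern raised in the comment above: with the trivial default, the parser still reads the whole document and only the text events are thrown away.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class MetadataOnlyDemo {

    /** Hypothetical stand-in for a metadata container: a plain string map. */
    static class Metadata {
        private final Map<String, String> values = new HashMap<>();
        void set(String name, String value) { values.put(name, value); }
        String get(String name) { return values.get(name); }
    }

    /** Simplified (assumed) Parser interface carrying both signatures. */
    interface Parser {
        void parse(InputStream stream, ContentHandler handler, Metadata metadata)
                throws IOException, SAXException;

        void parse(InputStream stream, Metadata metadata)
                throws IOException, SAXException;
    }

    /** Trivial default: run the full parse and discard all text events. */
    abstract static class AbstractParser implements Parser {
        @Override
        public void parse(InputStream stream, Metadata metadata)
                throws IOException, SAXException {
            // DefaultHandler ignores every SAX callback, so the text is dropped.
            parse(stream, new DefaultHandler(), metadata);
        }
    }

    /** Toy parser: the first line is the title, the remaining lines are body text. */
    static class FirstLineTitleParser extends AbstractParser {
        @Override
        public void parse(InputStream stream, ContentHandler handler, Metadata metadata)
                throws IOException, SAXException {
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(stream, StandardCharsets.UTF_8));
            String title = reader.readLine();
            metadata.set("title", title == null ? "" : title);
            // The body is still read in full, even when nobody wants the text.
            String line;
            while ((line = reader.readLine()) != null) {
                char[] chars = line.toCharArray();
                handler.characters(chars, 0, chars.length);
            }
        }
    }

    /** Metadata-only extraction through the two-argument parse method. */
    static String extractTitle(String document) {
        Metadata metadata = new Metadata();
        InputStream stream =
                new ByteArrayInputStream(document.getBytes(StandardCharsets.UTF_8));
        try {
            new FirstLineTitleParser().parse(stream, metadata);
        } catch (IOException | SAXException e) {
            throw new IllegalStateException(e);
        }
        return metadata.get("title");
    }

    public static void main(String[] args) {
        // prints "Report 2008"
        System.out.println(extractTitle("Report 2008\nBody text that gets discarded"));
    }
}
```

A parser that could genuinely skip the body would have to override parse(InputStream, Metadata) itself, which is the extra per-parser work the comment above weighs against the simpler interface.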

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
