Thank you for this example! Is there any chance this example could be
added to the Tika wiki?
On Fri, Jul 16, 2010 at 1:30 AM, Jukka Zitting <jukka.zitting@gmail.com> wrote:
> Hi,
>
> On Fri, Jul 16, 2010 at 2:43 AM, Paul Jakubik <paul@purediscovery.com> wrote:
> > On Thu, Jul 15, 2010 at 6:43 AM, Jukka Zitting <jukka.zitting@gmail.com> wrote:
> >> The way I recommend is to pass a custom Parser implementation through
> >> the ParseContext. This gives you detailed access to each component
> >> document.
> >
> > I looked at the code a little further, and I don't see exactly how I can
> > do this.
>
> Looks like you're approaching this from the wrong perspective. See the
> example below (or at http://pastebin.com/ZNfCQ9bk) for a recursive
> depth-first traversal that prints out the metadata of all the
> component documents.
>
>     public static void main(String[] args) throws Exception {
>         Parser parser = new RecursiveMetadataParser(new AutoDetectParser());
>         ParseContext context = new ParseContext();
>         context.set(Parser.class, parser);
>
>         ContentHandler handler = new DefaultHandler();
>         Metadata metadata = new Metadata();
>
>         InputStream stream = TikaInputStream.get(new File(args[0]));
>         try {
>             parser.parse(stream, handler, metadata, context);
>         } finally {
>             stream.close();
>         }
>     }
>
>     private static class RecursiveMetadataParser extends ParserDecorator {
>
>         public RecursiveMetadataParser(Parser parser) {
>             super(parser);
>         }
>
>         @Override
>         public void parse(
>                 InputStream stream, ContentHandler handler,
>                 Metadata metadata, ParseContext context)
>                 throws IOException, SAXException, TikaException {
>             super.parse(stream, handler, metadata, context);
>
>             System.out.println("----");
>             System.out.println(metadata);
>         }
>
>     }
>
> BR,
>
> Jukka Zitting
>