tika-dev mailing list archives

From Paul Jakubik <p...@purediscovery.com>
Subject Re: Packages and attributes
Date Fri, 16 Jul 2010 15:29:02 GMT
Thank you for this example! Is there any chance this example could be
added to the Tika wiki?

On Fri, Jul 16, 2010 at 1:30 AM, Jukka Zitting <jukka.zitting@gmail.com> wrote:

> Hi,
>
> On Fri, Jul 16, 2010 at 2:43 AM, Paul Jakubik <paul@purediscovery.com>
> wrote:
> > On Thu, Jul 15, 2010 at 6:43 AM, Jukka Zitting <jukka.zitting@gmail.com> wrote:
> >> The way I recommend is to pass a custom Parser implementation through
> >> the ParseContext. This gives you detailed access to each component
> >> document.
> >
> > I looked at the code a little further, and I don't see exactly how I
> > can do this.
>
> Looks like you're approaching this from the wrong perspective. See the
> example below (or at http://pastebin.com/ZNfCQ9bk) for a recursive
> depth-first traversal that prints out the metadata of all the
> component documents.
>
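> import java.io.File;
> import java.io.IOException;
> import java.io.InputStream;
>
> import org.apache.tika.exception.TikaException;
> import org.apache.tika.io.TikaInputStream;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.parser.AutoDetectParser;
> import org.apache.tika.parser.ParseContext;
> import org.apache.tika.parser.Parser;
> import org.apache.tika.parser.ParserDecorator;
> import org.xml.sax.ContentHandler;
> import org.xml.sax.SAXException;
> import org.xml.sax.helpers.DefaultHandler;
>
> // The enclosing class name is illustrative; any name works.
> public class RecursiveMetadataExample {
>
>     public static void main(String[] args) throws Exception {
>         Parser parser = new RecursiveMetadataParser(new AutoDetectParser());
>         // Register the decorator so that container parsers use it
>         // for each embedded document.
>         ParseContext context = new ParseContext();
>         context.set(Parser.class, parser);
>
>         // DefaultHandler discards the text content; only metadata is printed.
>         ContentHandler handler = new DefaultHandler();
>         Metadata metadata = new Metadata();
>
>         InputStream stream = TikaInputStream.get(new File(args[0]));
>         try {
>             parser.parse(stream, handler, metadata, context);
>         } finally {
>             stream.close();
>         }
>     }
>
>     private static class RecursiveMetadataParser extends ParserDecorator {
>
>         public RecursiveMetadataParser(Parser parser) {
>             super(parser);
>         }
>
>         @Override
>         public void parse(
>                 InputStream stream, ContentHandler handler,
>                 Metadata metadata, ParseContext context)
>                 throws IOException, SAXException, TikaException {
>             // Parse this document (and, recursively, its components) first.
>             super.parse(stream, handler, metadata, context);
>
>             // Then print the metadata collected for this document.
>             System.out.println("----");
>             System.out.println(metadata);
>         }
>
>     }
>
> }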
>
> BR,
>
> Jukka Zitting
>
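
For readers who want to see why the quoted example yields a depth-first traversal, here is a minimal sketch that models the same pattern with plain Java stand-ins instead of the real Tika classes. Everything in it (Doc, ContainerParser, RecordingParser, the file names) is hypothetical and exists only for illustration; it assumes, as the quoted example implies, that a container parser looks up the parser registered in the context and hands each embedded document to it. Because the decorator delegates before it records, nested documents are reported before the document that contains them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RecursiveOrderSketch {

    /** Hypothetical stand-in for a document that may embed other documents. */
    static class Doc {
        final String name;
        final List<Doc> children;

        Doc(String name, List<Doc> children) {
            this.name = name;
            this.children = children;
        }
    }

    /** Hypothetical stand-in for a parser interface; not Tika's real signature. */
    interface Parser {
        void parse(Doc doc, Map<Class<?>, Object> context);
    }

    /** Plays the role of a container parser: embedded documents go to the context's parser. */
    static class ContainerParser implements Parser {
        @Override
        public void parse(Doc doc, Map<Class<?>, Object> context) {
            Parser embedded = (Parser) context.get(Parser.class);
            for (Doc child : doc.children) {
                embedded.parse(child, context);
            }
        }
    }

    /** Plays the role of the decorator: delegate first, then record the document. */
    static class RecordingParser implements Parser {
        private final Parser delegate;
        private final List<String> seen;

        RecordingParser(Parser delegate, List<String> seen) {
            this.delegate = delegate;
            this.seen = seen;
        }

        @Override
        public void parse(Doc doc, Map<Class<?>, Object> context) {
            delegate.parse(doc, context); // recurses into embedded documents
            seen.add(doc.name);           // recorded only after all children are done
        }
    }

    /** Builds a three-level nesting and returns the order in which documents are recorded. */
    static List<String> run() {
        List<String> seen = new ArrayList<>();
        Parser parser = new RecordingParser(new ContainerParser(), seen);

        // Registering the decorator in the context is what makes the traversal recursive.
        Map<Class<?>, Object> context = new HashMap<>();
        context.put(Parser.class, parser);

        Doc inner = new Doc("inner.txt", Collections.<Doc>emptyList());
        Doc middle = new Doc("middle.zip", Collections.singletonList(inner));
        Doc outer = new Doc("outer.zip", Collections.singletonList(middle));

        parser.parse(outer, context);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [inner.txt, middle.zip, outer.zip]
    }
}
```

If the decorator were not registered in the context, the sketch would record only the outermost document, which is the main reason the quoted example calls context.set(Parser.class, parser).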
