tika-dev mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: Packages and attributes
Date Fri, 16 Jul 2010 07:30:41 GMT
Hi,

On Fri, Jul 16, 2010 at 2:43 AM, Paul Jakubik <paul@purediscovery.com> wrote:
> On Thu, Jul 15, 2010 at 6:43 AM, Jukka Zitting <jukka.zitting@gmail.com> wrote:
>> The way I recommend is to pass a custom Parser implementation through
>> the ParseContext. This gives you detailed access to each component
>> document.
>
> I looked at the code a little further, and I don't see exactly how I can do
> this.

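Looks like you're approaching this from the wrong perspective. You
don't need to drive the parsing of the component documents yourself:
the package parsers look up the Parser instance in the ParseContext
and call it for each component document they find. See the example
below (or at http://pastebin.com/ZNfCQ9bk) for a recursive depth-first
traversal that prints out the metadata of all the component documents.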

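    import java.io.File;
    import java.io.IOException;
    import java.io.InputStream;

    import org.apache.tika.exception.TikaException;
    import org.apache.tika.io.TikaInputStream;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.parser.Parser;
    import org.apache.tika.parser.ParserDecorator;
    import org.xml.sax.ContentHandler;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    // The name of the enclosing class is arbitrary.
    public class RecursiveMetadataExample {

        public static void main(String[] args) throws Exception {
            // Wrap the auto-detecting parser and register the wrapper in
            // the context, so it is also used for component documents.
            Parser parser = new RecursiveMetadataParser(new AutoDetectParser());
            ParseContext context = new ParseContext();
            context.set(Parser.class, parser);

            // The extracted text is not needed here; ignore all SAX events.
            ContentHandler handler = new DefaultHandler();
            Metadata metadata = new Metadata();

            InputStream stream = TikaInputStream.get(new File(args[0]));
            try {
                parser.parse(stream, handler, metadata, context);
            } finally {
                stream.close();
            }
        }

        private static class RecursiveMetadataParser extends ParserDecorator {

            public RecursiveMetadataParser(Parser parser) {
                super(parser);
            }

            @Override
            public void parse(
                    InputStream stream, ContentHandler handler,
                    Metadata metadata, ParseContext context)
                    throws IOException, SAXException, TikaException {
                // Delegate first: nested documents are parsed (and printed)
                // before the metadata of this document is printed.
                super.parse(stream, handler, metadata, context);

                System.out.println("----");
                System.out.println(metadata);
            }

        }

    }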
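
If the order of the output seems surprising, the following toy model
may help. It is only a sketch and does not use Tika at all: Doc,
ToyParser, ContainerParser and RecordingParser are hypothetical
stand-ins for a document, the Parser interface, a package parser and
the decorator above, and a plain Map plays the role of the
ParseContext. It assumes that a package parser simply hands each
component document to whatever parser it finds in the context.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RecursionSketch {

    // Hypothetical stand-in for a document with embedded documents.
    static class Doc {
        final String name;
        final List<Doc> children = new ArrayList<>();

        Doc(String name, Doc... kids) {
            this.name = name;
            for (Doc kid : kids) {
                children.add(kid);
            }
        }
    }

    // Hypothetical stand-in for the Parser interface.
    interface ToyParser {
        void parse(Doc doc, Map<Class<?>, Object> context);
    }

    // Plays the role of a package parser: each embedded document is
    // handed to whatever parser was registered in the context.
    static class ContainerParser implements ToyParser {
        public void parse(Doc doc, Map<Class<?>, Object> context) {
            ToyParser embedded = (ToyParser) context.get(ToyParser.class);
            for (Doc child : doc.children) {
                embedded.parse(child, context);
            }
        }
    }

    // Plays the role of the decorator: delegate first, record afterwards.
    static class RecordingParser implements ToyParser {
        private final ToyParser delegate;
        final List<String> visited = new ArrayList<>();

        RecordingParser(ToyParser delegate) {
            this.delegate = delegate;
        }

        public void parse(Doc doc, Map<Class<?>, Object> context) {
            delegate.parse(doc, context);
            visited.add(doc.name);
        }
    }

    // Registers the decorator in the context and parses the root document.
    static List<String> traverse(Doc root) {
        RecordingParser parser = new RecordingParser(new ContainerParser());
        Map<Class<?>, Object> context = new HashMap<>();
        context.put(ToyParser.class, parser);
        parser.parse(root, context);
        return parser.visited;
    }

    public static void main(String[] args) {
        Doc root = new Doc("archive.zip",
                new Doc("report.doc", new Doc("image.png")),
                new Doc("notes.txt"));
        // prints [image.png, report.doc, notes.txt, archive.zip]
        System.out.println(traverse(root));
    }
}
```

Because the decorator records a document only after delegating, the
innermost documents show up first and the outermost container last.
To get the container first, do the recording before the delegate call.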

BR,

Jukka Zitting
