tika-dev mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject HtmlMapper
Date Sun, 13 Dec 2009 21:13:54 GMT

See TIKA-347 for a nice alternative to the earlier TIKA-304 approach
to customizing the way Tika maps incoming HTML to XHTML.

You can now inject a custom mapping strategy through the parse
context, like this:

    Parser parser = ...;
    ParseContext context = new ParseContext();
    context.set(HtmlMapper.class, new MyCustomHtmlMapper());
    parser.parse(..., context);

The new HtmlMapper interface contains the same mapSafeElement() and
isDiscardElement() method signatures that we already used for the
overridable HtmlParser methods in TIKA-304. If a custom HtmlMapper
instance is not found in the parse context, then the existing TIKA-304
mechanism is used.
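
As a rough illustration, a custom mapper along the lines of MyCustomHtmlMapper might look like the sketch below. To keep it compilable on its own, it declares a local stand-in for the HtmlMapper interface. The parameter and return types, the convention that a null return means "no safe mapping", and the particular element names are all assumptions for the example, not something taken from TIKA-347.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for Tika's HtmlMapper so the sketch compiles alone.
// Only the two method names come from the message; the signatures are assumed.
interface HtmlMapper {
    String mapSafeElement(String name);
    boolean isDiscardElement(String name);
}

// Example strategy: keep a few elements, fold <strong> into <b>,
// and discard script and style elements.
class MyCustomHtmlMapper implements HtmlMapper {

    // Incoming element name -> XHTML element name to emit (illustrative choices).
    private static final Map<String, String> SAFE_ELEMENTS = Map.of(
            "p", "p",
            "div", "div",
            "b", "b",
            "strong", "b");

    // Elements to drop (illustrative choices).
    private static final Set<String> DISCARD_ELEMENTS = Set.of("script", "style");

    @Override
    public String mapSafeElement(String name) {
        // Assumption: returning null signals that the element has no safe mapping.
        return SAFE_ELEMENTS.get(name.toLowerCase(Locale.ROOT));
    }

    @Override
    public boolean isDiscardElement(String name) {
        return DISCARD_ELEMENTS.contains(name.toLowerCase(Locale.ROOT));
    }
}
```

With the real interface on the classpath, the local stand-in would be dropped and the mapper passed in through context.set(HtmlMapper.class, ...) as in the snippet earlier in this message.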


Jukka Zitting
