tika-dev mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: Question about XPath Matcher code & MatchingContentHandler
Date Mon, 03 Sep 2012 15:02:59 GMT

On Thu, Aug 30, 2012 at 7:35 PM, Ken Krugler
<kkrugler_lists@transpac.com> wrote:
> The issue is that BodyContentHandler uses MatchingContentHandler to find only
> text in nodes under the /html/body hierarchy.
> And this in turn winds up not matching the <html> element.

That's as intended, as the BodyContentHandler is only interested in
stuff inside the <body> element, not outside it.
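
To illustrate the behaviour described above, here is a minimal sketch using
only the JDK's SAX classes. It is not Tika's implementation: the class
BodyOnlySketch, the handler BodyTextHandler and the method extractBodyText
are hypothetical names made up for this example, and the path check is a
simplified stand-in for the XPath matching that MatchingContentHandler
performs. The handler sees the <html> and <head> events but only keeps text
once the element path starts with html/body.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; not part of Tika.
class BodyOnlySketch {

    // Keeps character data only for nodes located under /html/body.
    static class BodyTextHandler extends DefaultHandler {
        private final List<String> path = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        // True when the current element path begins with html/body.
        private boolean insideBody() {
            return path.size() >= 2
                    && path.get(0).equals("html")
                    && path.get(1).equals("body");
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            path.add(qName); // <html> itself is tracked but never matched
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            path.remove(path.size() - 1);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (insideBody()) {
                text.append(ch, start, length);
            }
        }

        String getText() {
            return text.toString();
        }
    }

    // Parses a well-formed XHTML-like string and returns the body text.
    static String extractBodyText(String xml) throws Exception {
        BodyTextHandler handler = new BodyTextHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getText();
    }

    public static void main(String[] args) throws Exception {
        String doc = "<html><head><title>T</title></head>"
                + "<body><p>Hello</p></body></html>";
        System.out.println(extractBodyText(doc)); // prints "Hello"
    }
}
```

The title text "T" is dropped because its path is html/head/title, which is
outside the body; the same reasoning explains why the <html> element itself
never matches.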


Jukka Zitting
