nutch-dev mailing list archives

From Julien Nioche <lists.digitalpeb...@gmail.com>
Subject Re: parsing a simple text node
Date Tue, 08 Feb 2011 10:15:43 GMT
Hi Jun,

Which version of Nutch are you using and which parser? parse-html or
parse-tika?

julien

On 8 February 2011 08:16, Jun Yang <juny78@gmail.com> wrote:

> Hi there,
>
> I am working on a plugin to fetch some structured information (e.g.,
> product price) from web pages, and I have a problem parsing the following
> simple node:
>
> <span class="product-price-amount">
>
>              $27.00</span>
>
> The parser first returned the Node for "span", which has a single child, a
> text Node. I would expect this text Node to have the value "$27.00", but
> when I call getNodeValue() the return value is empty. I also cast the child
> to Text and called getWholeText(), but that returned an empty value as well.
>
> Does anyone know what's going on? The text "$27.00" seems to be missing
> from the whole hierarchy.
>
> Jun
>
>
>
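
For reference, here is a minimal sketch of reading the text under that span
with the standard W3C DOM API. It makes several assumptions: the class name
SpanTextSketch and the helpers parseRoot and textOf are hypothetical, and the
snippet is parsed with the JDK's XML parser rather than with whatever parser
the Nutch plugin is using, so it only shows what the DOM calls are expected to
return on a well-formed tree. Note that getNodeValue() is defined to return
null for an Element, and that the text node's value includes the leading
newlines and spaces, so it needs trimming.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of Nutch.
public class SpanTextSketch {

    // Parse a well-formed XML snippet and return its root element.
    // Assumption: the JDK's built-in XML parser stands in for the HTML parser.
    public static Element parseRoot(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse snippet", e);
        }
    }

    // Concatenated text of the node and its descendants, whitespace trimmed.
    public static String textOf(Node node) {
        String text = node.getTextContent();
        return text == null ? "" : text.trim();
    }

    public static void main(String[] args) {
        String xml = "<span class=\"product-price-amount\">\n\n"
                   + "             $27.00</span>";
        Element span = parseRoot(xml);

        // An Element has no node value of its own; this prints "null".
        System.out.println("span nodeValue: " + span.getNodeValue());

        // Dump every child so that whitespace-only or missing text is visible.
        for (Node c = span.getFirstChild(); c != null; c = c.getNextSibling()) {
            System.out.println("type=" + c.getNodeType()
                + " value=[" + c.getNodeValue() + "]");
        }

        System.out.println("trimmed text: " + textOf(span));
    }
}
```

Dumping the node type and raw value of each child in the same way inside the
plugin may help show whether the text is really absent from the tree or is
attached to a different node than expected.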


-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
