nutch-dev mailing list archives

From Jun Yang <jun...@gmail.com>
Subject parsing a simple text node
Date Tue, 08 Feb 2011 08:16:50 GMT
Hi there,

I am working on a plugin to extract structured information (e.g., product
price) from web pages, and I have a problem parsing the following simple
node:

<span class="product-price-amount">
             $27.00</span>

The parser first returns the Node for "span", which has a single child, a
text Node. I would expect this text Node to have the value "$27.00", but
getNodeValue() returns an empty value. I also cast the child to Text and
called getWholeText(), but that returned an empty value as well.

Does anyone know what's going on? The text "$27.00" seems to be missing
from the whole hierarchy.
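
For comparison, here is a minimal sketch of the behaviour I expected. It uses
the JDK's built-in DOM parser on the snippet as a well-formed string, which is
an assumption: it is not the HTML parser my plugin receives its nodes from, so
it only shows what the standard DOM calls return for this markup. The class
and method names (SpanTextDemo, firstChildValue, trimmedText) are made up for
illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SpanTextDemo {

    // Returns the raw value of the root element's first child node,
    // including any leading newline and indentation, or null if there is no child.
    public static String firstChildValue(String xml) {
        Node child = parse(xml).getDocumentElement().getFirstChild();
        return child == null ? null : child.getNodeValue();
    }

    // Returns the concatenated text of the root element with surrounding
    // whitespace removed.
    public static String trimmedText(String xml) {
        return parse(xml).getDocumentElement().getTextContent().trim();
    }

    // Parses a well-formed snippet with the JDK parser (assumption: this stands
    // in for whatever parser actually produced the nodes in the plugin).
    private static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse snippet", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<span class=\"product-price-amount\">\n"
                   + "             $27.00</span>";
        // Brackets make leading whitespace in the raw value visible.
        System.out.println("[" + firstChildValue(xml) + "]");
        System.out.println("[" + trimmedText(xml) + "]");
    }
}
```

With this parser the text node's value starts with the line break and the
indentation, so the price only shows up cleanly after trim(). If the nodes in
the plugin behave differently, the difference is presumably in how they were
built rather than in these calls.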

Jun
