nutch-dev mailing list archives

From Jun Yang <>
Subject Re: parsing a simple text node
Date Wed, 09 Feb 2011 09:07:33 GMT
Hi Evert,

Thanks for the reply. I left out some details in my original email.

When I inspect the element in Firebug, it looks like this:
<span class="product-price-amount" style="visibility: visible; opacity: 1;">
   <cufon class="cufon cufon-canvas" alt="$27.00" style="width: 101px;
height: 20px;">
       <canvas width="126" height="25" style="width: 126px; height: 25px;
top: -3px; left: -2px;">

But when I look at it through "View Source", it is just:

<span class="product-price-amount">


When I parse it, it looks like I am parsing the second one (I cannot get
the <cufon> node at all).

Does this mean it's dynamically generated by JS?
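
A minimal Java sketch of the check, assuming well-formed markup and the standard JDK DOM and XPath classes rather than the HTML parser Nutch itself uses. The class and method names (SpanTextDemo, priceText) are hypothetical. It shows that an empty span, as seen in "View Source", yields an empty string, while the same lookup on a span that actually contains the text returns it:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only; not part of Nutch.
final class SpanTextDemo {

    // Returns the trimmed text under the first price span,
    // or null when the markup contains no such span.
    static String priceText(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            Node span = (Node) XPathFactory.newInstance().newXPath().evaluate(
                    "//span[@class='product-price-amount']",
                    doc, XPathConstants.NODE);
            // getTextContent() concatenates all descendant text nodes,
            // so an element without children gives an empty string.
            return span == null ? null : span.getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        // Span as it appears in "View Source": no child text at all.
        String asServed =
                "<div><span class=\"product-price-amount\"></span></div>";
        // Span as written in the original question.
        String withText =
                "<div><span class=\"product-price-amount\">\n   $27.00</span></div>";
        System.out.println("[" + priceText(asServed) + "]");
        System.out.println("[" + priceText(withText) + "]");
    }
}
```

If the first case matches what the plugin sees, the text is not in the served HTML, and a parser that does not execute JavaScript has nothing to return for that node.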


On Tue, Feb 8, 2011 at 3:57 AM, Evert Wagenaar <> wrote:

> Hi Jun,
> Could it be that the price is set by JavaScript at the moment of display in
> your browser? In that case the price is actually in some data source (XML) or
> a separate .js file. This is sometimes done when pages need to be displayed
> in several browsers, such as the iPhone's and regular desktop browsers.
> Did you try using an XPath expression? In your case it would be
> //span[@class='product-price-amount']. There are some good Firefox add-ons to
> test XPaths on HTML. I use XPather.
> Regards,
> Evert
> ------------------------------
> *From: *"Jun Yang" <>
> *To: *
> *Sent: *Tuesday, 8 February 2011 09:16:50
> *Subject: *parsing a simple text node
> Hi there,
> I am working on a plugin to fetch some structured information (e.g.,
> product price) from web pages, and I have a problem parsing the following
> simple node:
> <span class="product-price-amount">
>              $27.00</span>
> The parser first got the Node for "span", which has only one child node, a
> text Node. I would expect this text Node to have the value "$27.00", but when
> I called getNodeValue() the return value was empty. I cast this child node to
> a Text node and called getWholeText(), but still got an empty return value.
> Does anyone know what's going on? The text "$27.00" seems to be missing from
> the whole hierarchy.
> Jun
