cocoon-users mailing list archives

From Peter Flynn <>
Subject XHTML via Tidy not making it into XSLT
Date Fri, 23 Oct 2009 11:26:28 GMT
I have a resource in my sitemap which makes a web page available as XHTML:

> <map:match pattern="fetch/**">
>   <map:generate src="http://{1}" type="html"/>
>   <map:transform src="xsl/as-is.xsl"/>
>   <map:serialize type="xhtml"/>
> </map:match>

I call this from within another XSLT file so that I can screen-scrape the
document for a specific element type, having first made sure that it has
been Tidy'd to XHTML. The as-is.xsl is a plain identity transform matching
"*". Ugly, but useful (there must be a more elegant way, but I haven't
found it). In the second XSLT file I have a template matching an element
type which holds the desired URI in an attribute:

> <xsl:apply-templates 
>      select="document(concat('http://myserver/fetch/',@site))//
>              descendant::html:div[@class='foo']"/>
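
For reference, an identity transform of the kind as-is.xsl is described as
could look like the sketch below. This is an assumption: the post only says
the file is a plain identity transform matching "*", so the real file may
differ (this version also copies attributes, text, and comments).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical reconstruction of as-is.xsl; not the poster's actual file. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Copy every node and attribute through unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```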

Constructing the URI and requesting it by hand from the terminal with curl,
wget, dog, etc. works fine, and the resulting XHTML file is usable (tested
with lxgrep to ensure that the XPath extracts the right element), so I
know that part is OK.

When accessed from within the second stylesheet, the cocoon.log shows
Tidy successfully converting the remote page to XHTML, the same as when
tested from the terminal, but the data never reaches the template for
html:div (the namespace *is* specified in the stylesheet :-). In
cocoon.log there's a warning:

> WARN  (2009-10-23) 11:34.02:162 [sitemap.transformer.xslt] (/doc/test) TP-Processor9/TraxErrorListener:

but it doesn't say what it found wrong (not very helpful). Line 7 of
tools.xsl is the apply-templates shown above, and char 138 is the end of
that line.
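
To make the namespace point concrete, here is a minimal sketch of how the
relevant part of tools.xsl might be laid out. The element name "link" and
the body of the html:div template are hypothetical placeholders; only the
document() call, the @site attribute, and the html:div[@class='foo'] test
come from the message above. The html prefix is assumed to be bound to the
standard XHTML namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: "link" and the html:div template body are assumptions. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:html="http://www.w3.org/1999/xhtml">

  <!-- Hypothetical source element carrying the site in an attribute. -->
  <xsl:template match="link">
    <xsl:apply-templates
        select="document(concat('http://myserver/fetch/', @site))
                //html:div[@class='foo']"/>
  </xsl:template>

  <!-- Matches only if the fetched div is in the XHTML namespace. -->
  <xsl:template match="html:div">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

If the fetched document's elements were in no namespace, the html:div step
would select nothing, so the binding is worth confirming against the
serialized output.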

Testing it from the command line with Saxon, I get this:

> Recoverable error on line 7 of file:/xsl/tools.xsl:
>   FODC0005: Server returned HTTP response code: 503 for URL:

A 503 (Service Unavailable) normally means a temporary overload, but that
URI is retrievable with curl the instant before and after running Saxon.
And in any case, when going via Cocoon, surely the DTD would be cached (to
avoid overloading the W3C with a gazillion requests for the DTD URI)?

I'm missing a trick here, but I can't see what.

