nutch-dev mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: Antwort: Re: Why does TestNodeWalker keep failing?
Date Tue, 16 Jun 2009 11:13:16 GMT
marcel.schnippe@provinzial.com wrote:
> 
> Hi All,
> 
> According to W3C's Excessive DTD Traffic
> <http://www.w3.org/2005/06/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic>
> we should not download any DTD, because
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" denotes a
> namespace, not a resource, although it looks and works like a URI.
> 
>  > A while ago we put a system in place to monitor our servers for
>  > abusive request patterns and send 503 Service Unavailable responses
>  > with custom text depending on the nature of the abuse. Our hope was
>  > that the authors of misbehaving software and the administrators of
>  > sites who deployed it would notice these errors and make the
>  > necessary fixes to the software responsible.
> 
>  >> To read the DTD, one might be able to use an alternate URL based on
>  >> the public identifier. Unfortunately, catalogs are not in wide-spread
>  >> use, and W3C does nothing to promote them.

Thanks Marcel, this confirms my suspicion.

The proper fix is to use a local copy of DTDs, and set an 
XMLCatalogResolver on every XML parser to access these local copies. An 
interim workaround for TestNodeWalker is to turn off validation and turn 
off loading of external entities - I verified that the test passes then.
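
For illustration, here is a minimal sketch of the interim workaround, assuming 
a plain JAXP DocumentBuilderFactory backed by Xerces or the JDK's built-in 
Xerces-derived parser. The class name OfflineXhtmlParse and its methods are 
hypothetical and are not taken from the Nutch code; only the feature URIs are 
real Xerces/SAX identifiers.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Nutch: parses XHTML without ever
// requesting the DTD named in the DOCTYPE declaration.
public class OfflineXhtmlParse {

    static DocumentBuilder newOfflineBuilder() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Turn off validation, so the parser has no need for the DTD.
        factory.setValidating(false);
        factory.setNamespaceAware(true);
        // Xerces feature: do not load the external DTD when not validating.
        factory.setFeature(
            "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        // SAX features: do not load external general or parameter entities.
        factory.setFeature(
            "http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature(
            "http://xml.org/sax/features/external-parameter-entities", false);
        return factory.newDocumentBuilder();
    }

    static Document parse(String xml) throws Exception {
        return newOfflineBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String page =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
            + "<head><title>t</title></head><body><p>hi</p></body></html>";
        Document doc = parse(page);
        // Print the local name of the root element of the parsed document.
        System.out.println(doc.getDocumentElement().getLocalName());
    }
}
```

With these settings the DOCTYPE is still read, but no request goes to 
www.w3.org. The local-copy fix would instead install a resolver (for example 
the XMLCatalogResolver mentioned above) that maps the public identifier to a 
DTD file shipped with the code.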


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

