nutch-dev mailing list archives

From Massimo Miccoli <mmicc...@iltrovatore.it>
Subject howto skip hidden urls inside div tag?
Date Tue, 06 Sep 2005 09:46:06 GMT
Hi nutch dev,

After fetching about 100 million pages, I see many search engine spammers
who use a hidden div tag (placed off-screen with a negative position) to
include many URLs that users don't see when they access the page. These
links alter the boost (by inlink count), so I want to skip these URLs.
How can I do that?

Thanks,

Massimo
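
For illustration only, here is a minimal sketch of the kind of check the question describes: collect the links on a page while ignoring anchors nested inside a div whose inline style looks hidden. It is standalone Java and is not wired into Nutch. The class name HiddenDivLinkFilter, its methods, and the regex heuristic (negative left/top offset, display:none, visibility:hidden) are all assumptions made for this example, not part of any Nutch API. It only inspects inline style attributes, so divs hidden through an external stylesheet or script would not be caught.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Nutch class: a regex-based sketch, not a full HTML parser.
public class HiddenDivLinkFilter {
    // Opening or closing <div> tags, and <a> tags, with their raw attribute text.
    private static final Pattern TAG = Pattern.compile(
        "<(/?)(div|a)\\b([^>]*)>", Pattern.CASE_INSENSITIVE);
    private static final Pattern HREF = Pattern.compile(
        "href\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);
    private static final Pattern STYLE = Pattern.compile(
        "style\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);
    // Assumed heuristic for "hidden": negative offset, display:none, visibility:hidden.
    private static final Pattern HIDDEN = Pattern.compile(
        "(?:left|top)\\s*:\\s*-\\d|display\\s*:\\s*none|visibility\\s*:\\s*hidden");

    // True if the tag's inline style matches the hidden heuristic.
    static boolean looksHidden(String attrs) {
        Matcher m = STYLE.matcher(attrs);
        return m.find()
            && HIDDEN.matcher(m.group(1).toLowerCase(Locale.ROOT)).find();
    }

    // Returns href values of anchors that are not inside a hidden-looking div.
    static List<String> visibleLinks(String html) {
        List<String> links = new ArrayList<>();
        int depth = 0;        // current <div> nesting depth
        int hiddenDepth = -1; // depth of the outermost hidden div, or -1 if none
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            boolean closing = !m.group(1).isEmpty();
            String name = m.group(2).toLowerCase(Locale.ROOT);
            String attrs = m.group(3);
            if (name.equals("div")) {
                if (!closing) {
                    depth++;
                    if (hiddenDepth < 0 && looksHidden(attrs)) {
                        hiddenDepth = depth;
                    }
                } else {
                    if (depth == hiddenDepth) {
                        hiddenDepth = -1; // leaving the hidden div
                    }
                    if (depth > 0) {
                        depth--;
                    }
                }
            } else if (!closing && hiddenDepth < 0) {
                Matcher h = HREF.matcher(attrs);
                if (h.find()) {
                    links.add(h.group(1));
                }
            }
        }
        return links;
    }
}
```

Calling HiddenDivLinkFilter.visibleLinks(html) on a fetched page would give the outlinks that survive the heuristic. Where such a filter would plug into the parse or link-scoring steps is left open here.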

