nutch-dev mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject RE: Can Nutch process rel-tag likes rel="nofollow"?
Date Thu, 16 Aug 2012 08:20:43 GMT
Yes, this is supported in trunk and will still be supported when switching to Tika for outlink
extraction. Anchors with NOFOLLOW will simply be discarded.
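
To make "simply be discarded" concrete, here is a minimal sketch of that behaviour. It is not Nutch's or Tika's actual code. It uses only Python's standard html.parser, and the names OutlinkCollector and extract_outlinks are hypothetical, chosen for illustration. It assumes rel is treated as a space-separated, case-insensitive token list, so rel="external nofollow" is also skipped.

```python
from html.parser import HTMLParser


class OutlinkCollector(HTMLParser):
    """Collect href values from <a> tags, skipping anchors marked nofollow.

    Illustrative only: this is an assumption about the behaviour described
    in the thread, not the parser Nutch ships.
    """

    def __init__(self):
        super().__init__()
        self.outlinks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if not href:
            return
        # rel may be missing or valueless (None), so fall back to "".
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            return  # discard the anchor: it never becomes an outlink
        self.outlinks.append(href)


def extract_outlinks(html):
    """Return the hrefs of all anchors that are not rel="nofollow"."""
    parser = OutlinkCollector()
    parser.feed(html)
    parser.close()
    return parser.outlinks


if __name__ == "__main__":
    page = (
        '<a href="/a">A</a>'
        '<a rel="nofollow" href="/b">B</a>'
        '<a href="/d" rel="tag">D</a>'
    )
    print(extract_outlinks(page))
```

Only the anchors without a nofollow token survive, so a crawler built this way would never queue the discarded links for fetching.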
 
 
-----Original message-----
> From:Lewis John Mcgibbney <lewis.mcgibbney@gmail.com>
> Sent: Thu 16-Aug-2012 10:12
> To: dev@nutch.apache.org
> Subject: Re: Can Nutch process rel-tag likes rel="nofollow"?
> 
> Currently it looks like we don't have full support for such
> functionality. It is straightforward to grab the nofollow rel tag, but
> the post-processing is not currently implemented, so you would
> need to do this yourself.
> 
> Lewis
> 
> On Thu, Aug 16, 2012 at 5:27 AM, weishenyun <wlx198834@yahoo.com.cn> wrote:
> > I know Nutch crawls a website according to the robots protocol if you
> > configure it to do so, and it will not fetch and parse the links on a page
> > that contains <meta name="robots" content="nofollow">. But can Nutch also
> > process a rel-tag like rel="nofollow" in the tags  ......  on the page?
> >
> >
> >
> > --
> > View this message in context: http://lucene.472066.n3.nabble.com/Can-Nutch-process-rel-tag-likes-rel-nofollow-tp4001541.html
> > Sent from the Nutch - Dev mailing list archive at Nabble.com.
> 
> 
> 
> -- 
> Lewis
> 
