nutch-dev mailing list archives

From MyD <myd.ro...@googlemail.com>
Subject Nutch Topical / Focused Crawl
Date Thu, 02 Apr 2009 13:12:51 GMT
Hi @ all,

I'd like to turn Nutch into a focused / topical crawler. It's part of
my final-year thesis, and I'd also like others to be able to benefit
from my work. I started analyzing the code and think I have found the
right piece of it. I just wanted to know whether I am on the right
track. I think the right place to implement the decision whether to
fetch further is the output method of the Fetcher class, at each point
where it calls the collect method of the OutputCollector object.

private ParseStatus output(Text key, CrawlDatum datum, Content content,
ProtocolStatus pstatus, int status) {
...
output.collect(...);
...
}

Could you let me know the best way to turn this decision into a
plugin? I was thinking of taking an approach similar to the scoring
filters. Thanks in advance.
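
To make the idea more concrete, here is a rough sketch of the kind of
decision I mean. It is plain Java and does not use any Nutch classes:
the names TopicFilter, KeywordTopicFilter, shouldFollow and relevance
are hypothetical, and the keyword matching is only a placeholder for a
real topic classifier. How such a filter would be registered as a Nutch
extension point is exactly the part I am unsure about.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical extension point (NOT part of Nutch): decides whether the
// crawler should keep following links from a fetched page.
interface TopicFilter {
    boolean shouldFollow(String url, String pageText);
}

// Placeholder implementation: a page counts as on-topic if enough of the
// configured keywords occur in its text. A real focused crawler would
// plug in a proper classifier here.
final class KeywordTopicFilter implements TopicFilter {
    private final List<String> keywords = new ArrayList<>();
    private final double minFraction;

    KeywordTopicFilter(List<String> keywords, double minFraction) {
        for (String k : keywords) {
            this.keywords.add(k.toLowerCase(Locale.ROOT));
        }
        this.minFraction = minFraction;
    }

    // Fraction of the keywords that occur (case-insensitively) in the text.
    double relevance(String text) {
        if (text == null || keywords.isEmpty()) {
            return 0.0;
        }
        String lower = text.toLowerCase(Locale.ROOT);
        int hits = 0;
        for (String k : keywords) {
            if (lower.contains(k)) {
                hits++;
            }
        }
        return (double) hits / keywords.size();
    }

    @Override
    public boolean shouldFollow(String url, String pageText) {
        // The url is unused in this sketch; a real filter might also
        // inspect it (for example host or path patterns).
        return relevance(pageText) >= minFraction;
    }
}
```

The fetcher would then call something like shouldFollow(...) before
each collect(...) and skip pages (or their outlinks) that fall below the
threshold.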

Cheers,
MyD