nutch-dev mailing list archives

From: Frank McCown
Subject: Support for Sitemap Protocol and Canonical URLs
Date: Mon, 16 Feb 2009 17:28:55 GMT

I'm teaching a search engine course for CS undergrads, and we'd like
to make a contribution to Nutch.  It appears that Nutch does not
support the Sitemap Protocol (NUTCH-158).

So I wanted to check with you all and see if this is something you
think would make a good addition.  Also, do you think this would be a
good project for a team of 3 undergrad students who need to complete
it within 2-3 weeks?  Being only modestly familiar with the codebase
myself, I don't want to assign a project that would be too difficult
or overwhelming for undergraduates who are newbies and have only been
writing Java code for a few semesters.

Also, you may have heard of the new rel="canonical" attribute, which is
now supported by Google, Yahoo, and Live.

I'd like my students to modify Nutch to support this new attribute as well.

After I get some feedback, I'll submit a request to JIRA.  I was
wondering though, would it be better to submit it as an issue for 0.9,
1.0, or 1.1?


Frank McCown, Ph.D.
Assistant Professor of Computer Science
Harding University
