nutch-dev mailing list archives

From cihad güzel (JIRA) <>
Subject [jira] [Updated] (NUTCH-1741) Support of Sitemaps in Nutch 2.x
Date Thu, 20 Aug 2015 20:17:45 GMT


cihad güzel updated NUTCH-1741:
    Attachment: NUTCH-1741-v4.patch


I have uploaded a new patch, NUTCH-1741-v4.patch.
This is the last patch for my GSoC work for now. The patch contains the sitemap crawler code and
its test code. Please review it and share your opinion.

I am going to prepare the final documentation for the sitemap crawler and my GSoC work next
week. The document will explain how to run and use the sitemap crawler. [~talat] [~lewismc] Thanks

> Support of Sitemaps in Nutch 2.x
> --------------------------------
>                 Key: NUTCH-1741
>                 URL:
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher, generator
>            Reporter: Alparslan Avcı
>              Labels: gsoc2015
>             Fix For: 2.4
>         Attachments: NUTCH-1741-v2.patch, NUTCH-1741-v3.patch, NUTCH-1741-v4.patch, NUTCH-1741.patch, SitemapCrawlerLifeCycle.pdf, SitemapDevelopmentFor2x.pdf
> Sitemap support has to be implemented for the 2.x branch. It is being discussed in NUTCH-1465 for trunk.

This message was sent by Atlassian JIRA
