nutch-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tejas Patil (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch
Date Sat, 04 Jan 2014 07:44:51 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862237#comment-13862237
] 

Tejas Patil commented on NUTCH-1465:
------------------------------------

Hi [~wastl-nagel],
Yes. I think that it should be there too. I will be working on the patch this weekend and
update on the same. Thanks for your inputs and suggestions till now in, were super helpful
in chalking out the right specs for this feature.

> Support sitemaps in Nutch
> -------------------------
>
>                 Key: NUTCH-1465
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1465
>             Project: Nutch
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Lewis John McGibbney
>            Assignee: Tejas Patil
>             Fix For: 1.9
>
>         Attachments: NUTCH-1465-sitemapinjector-trunk-v1.patch, NUTCH-1465-trunk.v1.patch
>
>
> I recently came across this rather stagnant codebase[0] which is ASL v2.0 licensed and
appears to have been used successfully to parse sitemaps as per the discussion here[1].
> [0] http://sourceforge.net/projects/sitemap-parser/
> [1] http://lucene.472066.n3.nabble.com/Support-for-Sitemap-Protocol-and-Canonical-URLs-td630060.html



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Mime
View raw message