nutch-dev mailing list archives

From "Alfonso Nishikawa (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1741) Support of Sitemaps in Nutch 2.x
Date Sat, 08 Oct 2016 14:45:21 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15558096#comment-15558096 ]

Alfonso Nishikawa commented on NUTCH-1741:
------------------------------------------

I believe webpage.avsc is wrong in this patch. It should be:

{code}
    {
      "name": "sitemaps",
      "type": {
        "type": "map",
        "values": [
          "null",
          "string"
        ]
      },
      "doc": "Sitemap urls in robot.txt",
      "default": {} <---
    }, <-----------------
    {
      "name": "stmPriority",
      "type": "float",
      "doc": "",
      "default": 0
    }
{code}

In WebPage.SCHEMA$ it is correct.
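
A quick way to sanity-check a fragment like the one above is to parse it as JSON and confirm that every field carries a default. The following is only a minimal sketch in Python: the wrapping record and its name `WebPageFragment`, and the helper `fields_with_bad_defaults`, are made up for illustration and are not part of Nutch or Avro. It assumes the arrows in the snippet mark the `default` entry and the closing of the `sitemaps` field; the arrows themselves are left out so the text is valid JSON.

```python
import json

# Hypothetical fragment: the two fields from the comment, wrapped in a minimal
# record so that it parses as standalone JSON. The record name is invented.
FRAGMENT = """
{
  "type": "record",
  "name": "WebPageFragment",
  "fields": [
    {
      "name": "sitemaps",
      "type": {"type": "map", "values": ["null", "string"]},
      "doc": "Sitemap urls in robot.txt",
      "default": {}
    },
    {
      "name": "stmPriority",
      "type": "float",
      "doc": "",
      "default": 0
    }
  ]
}
"""


def fields_with_bad_defaults(schema_text):
    """Return names of fields whose default is missing or, for a map, not a JSON object.

    json.loads raises ValueError if the text is structurally broken,
    for example when a closing brace between two fields is missing.
    """
    schema = json.loads(schema_text)
    bad = []
    for field in schema["fields"]:
        if "default" not in field:
            bad.append(field["name"])
            continue
        ftype = field["type"]
        is_map = isinstance(ftype, dict) and ftype.get("type") == "map"
        if is_map and not isinstance(field["default"], dict):
            bad.append(field["name"])
    return bad


print(fields_with_bad_defaults(FRAGMENT))
```

This only checks the JSON structure and the presence of defaults; it does not replace compiling the schema with the Avro tooling that Nutch 2.x actually uses.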

> Support of Sitemaps in Nutch 2.x
> --------------------------------
>
>                 Key: NUTCH-1741
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1741
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher, generator
>            Reporter: Alparslan Avcı
>            Assignee: Cihad Guzel
>              Labels: gsoc2015
>             Fix For: 2.4
>
>         Attachments: NUTCH-1741-v2.patch, NUTCH-1741-v3.patch, NUTCH-1741-v4.patch, NUTCH-1741.patch, NUTCH-1741v5.patch, NUTCH-1741v6.patch, NUTCH-1741v7.patch, SitemapCrawlerLifeCycle.pdf, SitemapDevelopmentFor2x.pdf
>
>
> Sitemap support has to be implemented for 2.x branch. It is being discussed in NUTCH-1465 for trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
