nutch-dev mailing list archives

From "Alfonso Nishikawa (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (NUTCH-1741) Support of Sitemaps in Nutch 2.x
Date Sat, 08 Oct 2016 17:47:21 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15558401#comment-15558401 ]

Alfonso Nishikawa edited comment on NUTCH-1741 at 10/8/16 5:46 PM:
-------------------------------------------------------------------

Attached a proposed patch for webpage.avsc ([^NUTCH-1741-webpage-avsc.patch]). 

I suspect the author of the final patch pressed backspace or moved a bracket without noticing
just before creating NUTCH-1741v7.patch, since the persistent class's WebPage.SCHEMA$ has the
correct schema.

If you look at the schema in the version currently in the repository [1], near the end it
shows:
{code}
\"default\":{}},{\"name\":\"stmPriority\"
{code}

But the schema definition in webpage.avsc [2] lacks the brace that closes the preceding field:

{code}
      "default": {

      },
      {
        "name": "stmPriority",
{code}

The patch only fixes the schema file; no recompilation should be needed, since WebPage.SCHEMA$
already contains the correct schema.
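
A dropped bracket like this one is the kind of slip a plain JSON parse should catch. Below is a minimal sketch in Python, assuming heavily shortened stand-ins for the schema text: every field other than "stmPriority" is invented for illustration, and check_schema_text is a hypothetical helper, not part of Nutch or Gora. It only checks JSON well-formedness, not Avro schema validity.

```python
import json

# Hypothetical, shortened stand-in for a well-formed schema file.
# Only the name "stmPriority" comes from the real webpage.avsc.
GOOD_SCHEMA = """
{
  "type": "record",
  "name": "WebPage",
  "fields": [
    {"name": "exampleMap", "type": {"type": "map", "values": "string"}, "default": {}},
    {"name": "stmPriority", "type": "int", "default": 0}
  ]
}
"""

# Same text with the brace that closes the first field dropped,
# mimicking the slip described in the comment.
BROKEN_SCHEMA = GOOD_SCHEMA.replace('"default": {}},', '"default": {},', 1)


def check_schema_text(text):
    """Return None if text is well-formed JSON, else a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


if __name__ == "__main__":
    for label, text in (("good", GOOD_SCHEMA), ("broken", BROKEN_SCHEMA)):
        problem = check_schema_text(text)
        print(label, "->", "OK" if problem is None else problem)
```

To try the same check on the real file, the contents of src/gora/webpage.avsc could be read and passed to check_schema_text in place of the stand-in strings.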

I use HBase, but with a customized Nutch build that supports my own GORA-0.7-SNAPSHOT.

[1] - https://github.com/apache/nutch/blob/ffa04e1b4b11d17109e870e73ed34f64e9e2c2ef/src/java/org/apache/nutch/storage/WebPage.java#L31

[2] - https://github.com/apache/nutch/blob/ffa04e1b4b11d17109e870e73ed34f64e9e2c2ef/src/gora/webpage.avsc#L294


> Support of Sitemaps in Nutch 2.x
> --------------------------------
>
>                 Key: NUTCH-1741
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1741
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher, generator
>            Reporter: Alparslan Avcı
>            Assignee: Cihad Guzel
>              Labels: gsoc2015
>             Fix For: 2.4
>
>         Attachments: NUTCH-1741-v2.patch, NUTCH-1741-v3.patch, NUTCH-1741-v4.patch, NUTCH-1741-webpage-avsc.patch,
> NUTCH-1741.patch, NUTCH-1741v5.patch, NUTCH-1741v6.patch, NUTCH-1741v7.patch, SitemapCrawlerLifeCycle.pdf,
> SitemapDevelopmentFor2x.pdf
>
>
> Sitemap support has to be implemented for 2.x branch. It is being discussed in NUTCH-1465
> for trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
