nutch-dev mailing list archives

From "Andrzej Bialecki (JIRA)" <>
Subject [jira] Commented: (NUTCH-634) Patch - Nutch - Hadoop 0.17.0
Date Tue, 01 Jul 2008 00:44:45 GMT


Andrzej Bialecki  commented on NUTCH-634:

I ran a test crawl using the Hadoop 0.17.1 release, after applying this patch minus the
OutputFormat portions and setting the property as above. The crawl succeeded with no problems.

If there are no further objections, I'd like to commit this patch with these changes within
a day or two.

> Patch - Nutch - Hadoop 0.17.0
> -----------------------------
>                 Key: NUTCH-634
>                 URL:
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 0.9.0
>            Reporter: Michael Gottesman
>            Assignee: Andrzej Bialecki 
>             Fix For: 0.9.0
>         Attachments: diff, hadoop-0.17.patch, hadoop-0.17.patch
> This is a patch so that Nutch can be used with Hadoop 0.17.0. The patch is located at
> The patch compiles and passes all current Nutch unit tests.
> I have tested that the crawler side of Nutch (i.e. inject, generate, fetch, parse, merge w/crawldb) definitely works, but have not tested the Lucene indexing part. It might work, but it might not.
> *NOTE* - the two main bugs that had to be overcome were not noticed by any of the unit tests. The bugs only came up during actual testing. The bugs were:
> 1. Changes to the Hadoop Iterator
> 2. Addition of Serialization to MapReduce Framework

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
