nutch-dev mailing list archives

From Michael Gottesman <gotte...@reed.edu>
Subject Patch Nutch -> Hadoop .17
Date Tue, 27 May 2008 18:53:31 GMT
Hello. I am currently developing a patch so that Nutch can be used as a
job jar with Hadoop 0.17. The task turned out not to be very
complicated: it mostly involves replacing deprecated methods that were
removed in Hadoop 0.17 and parameterizing certain methods and classes,
so the diff is not long. I would much appreciate any advice or hints on
the following, since I could then finish the task and submit it to JIRA
as a patch:

Basically, the build compiles, but two unit tests still fail and we
cannot seem to find the cause. They are:

    * TestCrawlDbMerger.java
    * TestDeleteDuplicates.java

I have tracked down the failure in TestCrawlDbMerger to a difference in
fetchTimes in Url10 and Url20. The resulting value is consistently 10
seconds behind the expected one.

I have not yet had much opportunity to examine why
TestDeleteDuplicates fails.

The diff of my changes is at this address <http://pastie.caboo.se/204167>.

Thank you so much in advance,

Michael
