nutch-dev mailing list archives

From Ferdy Galema <ferdy.gal...@kalooga.com>
Subject Re: DbUpdateReducer could not mark it's batchid
Date Wed, 15 Aug 2012 12:12:23 GMT
Hi,

This bug was already reported a few posts ago on the mailing list, but
thanks anyway for reporting it.

I have created an issue to keep track of it:
https://issues.apache.org/jira/browse/NUTCH-1456

Ferdy.

On Wed, Aug 15, 2012 at 1:59 PM, lin weijian <linweijian8@gmail.com> wrote:

>         Hi,
>         I found a bug in Nutch 2.0 that prevents Mark.UPDATEDB_MARK from
> recording its batch id.
>
>         It is in the reduce function of
> org.apache.nutch.crawl.DbUpdateReducer.java:
>
>     Mark.GENERATE_MARK.removeMarkIfExist(page);
>     Mark.FETCH_MARK.removeMarkIfExist(page);
>     Utf8 mark = Mark.PARSE_MARK.removeMarkIfExist(page);
>     if (mark != null) {
>       Mark.UPDATEDB_MARK.putMark(page, mark);
>     }
>
>     This code clears the generate, fetch, and parse batch ids and is meant
>     to set the updatedb batch id,
>     but Mark.UPDATEDB_MARK.putMark(page, mark) never executes, because
>     mark is always null.
>
>     In Gora 0.2, the remove function of StatefulHashMap, which is called
>     by WebPage's markers, always returns null.
>
>
>     Thanks.
>
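
The failure mode described in the quoted report can be reproduced in isolation
with a minimal sketch. Everything here is an assumption made for illustration:
NullReturningMap is a hypothetical stand-in for a map whose remove() drops the
entry but always returns null (it is not Gora's StatefulHashMap), the marker
keys are made-up strings, and updateReadingFirst shows one possible workaround
rather than the fix tracked in NUTCH-1456.

```java
import java.util.HashMap;
import java.util.Map;

public class MarkerRemoveSketch {

    /** Hypothetical map that removes the entry but always returns null. */
    static class NullReturningMap<K, V> extends HashMap<K, V> {
        @Override
        public V remove(Object key) {
            super.remove(key);
            return null; // the previous value is lost, as in the report
        }
    }

    /** Builds markers as they might look after generate, fetch and parse. */
    static Map<String, String> newMarkers() {
        Map<String, String> markers = new NullReturningMap<>();
        markers.put("generate", "batch-1"); // hypothetical key names
        markers.put("fetch", "batch-1");
        markers.put("parse", "batch-1");
        return markers;
    }

    /** Same pattern as the quoted snippet: trusts remove() to return the old value. */
    static void updateRelyingOnRemove(Map<String, String> markers) {
        markers.remove("generate");
        markers.remove("fetch");
        String mark = markers.remove("parse");
        if (mark != null) {
            markers.put("updatedb", mark); // never reached with NullReturningMap
        }
    }

    /** Possible workaround: read the value first, then remove it. */
    static void updateReadingFirst(Map<String, String> markers) {
        markers.remove("generate");
        markers.remove("fetch");
        String mark = markers.get("parse");
        markers.remove("parse");
        if (mark != null) {
            markers.put("updatedb", mark);
        }
    }

    public static void main(String[] args) {
        Map<String, String> a = newMarkers();
        updateRelyingOnRemove(a);
        System.out.println(a.containsKey("updatedb")); // prints "false"

        Map<String, String> b = newMarkers();
        updateReadingFirst(b);
        System.out.println(b.get("updatedb")); // prints "batch-1"
    }
}
```

With a map that honors the usual java.util.Map contract for remove(), both
methods would behave the same, so the difference only shows up when the return
value of remove() cannot be trusted.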
