nutch-dev mailing list archives

From Alparslan Avcı (JIRA) <>
Subject [jira] [Updated] (NUTCH-1714) Nutch 2.x upgrade to Gora 0.4
Date Tue, 13 May 2014 06:26:15 GMT


Alparslan Avcı updated NUTCH-1714:

    Attachment: NUTCH-1714v6.patch

Hi [~jnioche],

First, sorry for the late answer, and thanks for your comments!

bq. The code has changed since the last patch and we are now getting :
bq. This is due to status.getArgs() returning null.

I have fixed these in the [new patch|^NUTCH-1714v6.patch] I added.

bq. I presume you added the methods mentioned in NUTCH-1709 by hand after generating the classes
Yes, as you guessed, I changed them by hand after generating the classes automatically.

bq. WebTableReader should also remove the dirty field in processDumpJob
I have also fixed this in the new patch I added.
* the Generator marks 50K entries with GENERATE_MARK but the Fetcher shows only 49,461 as
Map Input Records (and the same number as Reduce input records) => looks like we are not
getting all the records we should be getting. I dumped the content of the table pre-fetching
and it contains the right number of entries i.e. 50K
* The Generator displayed 'generated batch id: 1399626659-15643 containing 0 URLs' but as
I just explained it marked 50K entries correctly
* The dump of the webtable contains 'markers:	org.apache.gora.persistency.impl.DirtyMapWrapper@eb173c'.
It should display the values correctly.
I will have a look into these issues as soon as possible. Thanks again!
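
For reference, the `DirtyMapWrapper@eb173c` text in the dump has the `ClassName@hash` shape that Java's default `Object.toString()` produces, which suggests the map is being printed directly. Below is a minimal sketch of printing the entries instead. It assumes the markers can be read as a plain `java.util.Map`; the class name `MarkerDump`, the helper `formatMarkers`, and the `GENERATE_MARK` key string are illustrative placeholders, not code from the patch or from Gora.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MarkerDump {

    // Hypothetical helper: builds "{key=value, ...}" by iterating the entries,
    // so the output does not depend on the map class overriding toString().
    static String formatMarkers(Map<?, ?> markers) {
        StringBuilder sb = new StringBuilder("{");
        String sep = "";
        for (Map.Entry<?, ?> e : markers.entrySet()) {
            sb.append(sep).append(e.getKey()).append('=').append(e.getValue());
            sep = ", ";
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        // Placeholder data: the key name is an assumption; the value is the
        // batch id quoted in the Generator output above.
        Map<String, String> markers = new LinkedHashMap<>();
        markers.put("GENERATE_MARK", "1399626659-15643");
        System.out.println("markers:\t" + formatMarkers(markers));
    }
}
```

Iterating `entrySet()` works for any `Map` implementation, so a helper like this would not need to know about the wrapper class at all.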

> Nutch 2.x upgrade to Gora 0.4
> -----------------------------
>                 Key: NUTCH-1714
>                 URL:
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Alparslan Avcı
>            Assignee: Alparslan Avcı
>             Fix For: 2.3
>         Attachments: NUTCH-1714.patch, NUTCH-1714_NUTCH-1714_v2_v3.patch, NUTCH-1714v2.patch, NUTCH-1714v4.patch, NUTCH-1714v5.patch, NUTCH-1714v6.patch
> The Nutch upgrade for the GORA_94 branch has to be implemented. We can discuss the details in this issue.

This message was sent by Atlassian JIRA
