nutch-dev mailing list archives

From Alparslan Avcı (JIRA) <>
Subject [jira] [Commented] (NUTCH-1714) Nutch 2.x upgrade to Gora 0.4
Date Fri, 02 May 2014 10:05:16 GMT


Alparslan Avcı commented on NUTCH-1714:

Hi [~jnioche],

I do not know whether you have tested the patch, but with the last update it fixes the problem. The
_readdb_ problem occurs because it tries to get all fields from the _webpage_ table, and
it uses the _WebPage._ALL_FIELDS_ array to do so. However, this array also contains the ___g__dirty_
field, which is used to track the dirty fields of the persistent class. This field is not stored
in the database, so when the db is queried with this field, no results are returned.
In the patch I have removed the ___g__dirty_ field directly from the fields sent to the query,
since it is always at the first position of the __ALL_FIELDS_ array. This fixes the problem.
However, I will also send a mail to dev@gora to discuss whether we should remove this field from
the persistent class's __ALL_FIELDS_ array. Then, we can use _WebPage._ALL_FIELDS_ directly in
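
A minimal sketch of the workaround described above, assuming the dirty-state field sits at index 0 of the array as stated in this comment. The class name, the helper queryFields, and the sample field names are hypothetical stand-ins; the real array is _WebPage._ALL_FIELDS_ generated by Gora, and the actual patch may differ in detail:

```java
import java.util.Arrays;

final class QueryFieldsSketch {
    // Hypothetical stand-in for WebPage._ALL_FIELDS. The field names after
    // the first entry are illustrative only.
    static final String[] ALL_FIELDS = {"__g__dirty", "baseUrl", "status", "fetchTime"};

    // Returns a copy of the array without its first element, which is assumed
    // to be the dirty-state field that is not stored in the database.
    static String[] queryFields(String[] allFields) {
        if (allFields.length == 0) {
            return new String[0];
        }
        return Arrays.copyOfRange(allFields, 1, allFields.length);
    }
}
```

The result of queryFields(ALL_FIELDS) would then be passed to the query in place of the full array, so that only fields stored in the database are requested.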

Thanks for your comments.

> Nutch 2.x upgrade to Gora 0.4
> -----------------------------
>                 Key: NUTCH-1714
>                 URL:
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Alparslan Avcı
>            Assignee: Alparslan Avcı
>             Fix For: 2.3
>         Attachments: NUTCH-1714.patch, NUTCH-1714_NUTCH-1714_v2_v3.patch, NUTCH-1714v2.patch,
> NUTCH-1714v4.patch, NUTCH-1714v5.patch
>
> Nutch upgrade for GORA_94 branch has to be implemented. We can discuss the details in
> this issue.

This message was sent by Atlassian JIRA
