nutch-dev mailing list archives

From "Julien Nioche (JIRA)" <>
Subject [jira] [Commented] (NUTCH-1714) Nutch 2.x upgrade to Gora 0.4
Date Fri, 02 May 2014 11:23:14 GMT


Julien Nioche commented on NUTCH-1714:

bq. I do not know if you have tested the patch, but it fixes the problem with last update.

I did test it (hence my assertion that it did not work), but I must have done something wrong,
which is not surprising given that I had various other patches applied to the code. I tried again
from a clean copy of the repo, and it does indeed solve the issue. Thanks.

bq.  The reason for the readdb problem is that it tries to get all fields from webpage table,
and it uses WebPage._ALL_FIELDS array to achieve this. However, this array also contains __gdirty
field which is used to save dirty fields of the persistent class. This field is not stored
in database. Thus, when db is queried with this field, no results will be returned. 

Thanks for the explanation

bq. In the patch I have removed __gdirty field directly from the fields sent to the query,
since it is always at the first position of the _ALL_FIELDS array. This will fix the problem.
However, I will also send a mail to dev@gora and discuss if we should remove this field from
persistent class' _ALL_FIELDS array. Then, we can use WebPage._ALL_FIELDS directly in here.

Good idea.
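
For readers following along, the workaround quoted above can be sketched roughly as follows. This is only an illustration and not the code from the attached patch: the class name, the helper method, and the ALL_FIELDS array are hypothetical stand-ins for WebPage._ALL_FIELDS, and the sketch assumes, as the quoted comment states, that the __gdirty entry is always at index 0.

```java
import java.util.Arrays;

// Hypothetical sketch; not the actual Nutch or Gora code.
final class FieldFilterSketch {

    // Stand-in for WebPage._ALL_FIELDS. The first entry mimics the
    // bookkeeping field that is not stored in the database; the other
    // names are illustrative only.
    static final String[] ALL_FIELDS = {"__gdirty", "baseUrl", "status", "fetchTime"};

    // Returns a copy of the array without its first element, so that only
    // fields that are actually persisted are passed on to a query.
    // Assumes the bookkeeping field is always at index 0.
    static String[] queryableFields(String[] allFields) {
        if (allFields.length == 0) {
            return new String[0];
        }
        return Arrays.copyOfRange(allFields, 1, allFields.length);
    }

    public static void main(String[] args) {
        // Prints the field names that would be sent to the query.
        System.out.println(Arrays.toString(queryableFields(ALL_FIELDS)));
    }
}
```

Skipping by position is brittle if the generated array layout ever changes, which is presumably why removing the entry from _ALL_FIELDS on the Gora side is the cleaner long-term option.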

I will comment on the filtering in NUTCH-1674 and do more testing before I commit this.

Thanks for your work!


> Nutch 2.x upgrade to Gora 0.4
> -----------------------------
>                 Key: NUTCH-1714
>                 URL:
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Alparslan Avcı
>            Assignee: Alparslan Avcı
>             Fix For: 2.3
>         Attachments: NUTCH-1714.patch, NUTCH-1714_NUTCH-1714_v2_v3.patch, NUTCH-1714v2.patch,
> NUTCH-1714v4.patch, NUTCH-1714v5.patch
>
> Nutch upgrade for GORA_94 branch has to be implemented. We can discuss the details in
> this issue.

This message was sent by Atlassian JIRA
