nutch-dev mailing list archives

From "Jonathan Amir (JIRA)" <j...@apache.org>
Subject [jira] Commented: (NUTCH-413) Fetcher ignores -noParsing command line option
Date Fri, 08 Dec 2006 15:12:22 GMT
    [ http://issues.apache.org/jira/browse/NUTCH-413?page=comments#action_12456870 ] 
            
Jonathan Amir commented on NUTCH-413:
-------------------------------------

I didn't check out the trunk; I checked out the 0.8.1 tag because I wanted stability. If
it is fixed in the trunk, then I guess you can close this issue.
By the way, I wouldn't assume that nutch-site overrides command-line options. If it does,
then that is wrong: command-line options should override nutch-site.
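
To illustrate the precedence argued for above, here is a minimal sketch in plain Java.
It does not use the Nutch or Hadoop classes; the class OptionPrecedenceSketch, its
methods, and the use of "fetcher.parse" as the property key are assumptions made only
for illustration. Layers are applied from lowest to highest priority, so a value given
on the command line wins over the site file, which in turn wins over the defaults.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OptionPrecedenceSketch {

    /**
     * Merges three layers of settings. Later layers overwrite earlier ones,
     * so the command line has the final say. (Hypothetical helper, not Nutch API.)
     */
    static Map<String, String> resolve(Map<String, String> defaults,
                                       Map<String, String> site,
                                       Map<String, String> commandLine) {
        Map<String, String> effective = new LinkedHashMap<>();
        effective.putAll(defaults);     // lowest priority
        effective.putAll(site);         // overrides defaults
        effective.putAll(commandLine);  // overrides everything else
        return effective;
    }

    /** Builds a small example and returns the effective value of the parse flag. */
    static String demo() {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("fetcher.parse", "true");   // assumed key name

        Map<String, String> site = new LinkedHashMap<>();
        site.put("fetcher.parse", "true");       // site file asks for parsing

        Map<String, String> commandLine = new LinkedHashMap<>();
        commandLine.put("fetcher.parse", "false"); // stands in for -noParsing

        return resolve(defaults, site, commandLine).get("fetcher.parse");
    }

    public static void main(String[] args) {
        System.out.println("effective fetcher.parse = " + demo());
    }
}
```

Keeping the merge order in one place makes the precedence rule easy to check: swapping
the last two putAll calls would give the behaviour objected to above.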

> Fetcher ignores -noParsing command line option
> ----------------------------------------------
>
>                 Key: NUTCH-413
>                 URL: http://issues.apache.org/jira/browse/NUTCH-413
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.8.1
>         Environment: Fedora Core 6, nutch 0.8.1
>            Reporter: Jonathan Amir
>
> I believe that the patch applied in NUTCH-337 broke the fetcher. The fetcher now ignores
> the -noParsing command-line option: parsing occurs anyway.
> To the best of my understanding of Nutch, I traced the problem in the code as follows.
> In the Fetcher class, at line 473, -noParsing is evaluated properly and placed into a Configuration
> created by NutchConfiguration.create(). So far so good.
> In the same file, at line 280, the decision whether to parse depends on the local
> field "parsing". During execution, this field's value is true instead of false. The field
> is set to true by the method "configure", at line 357. The problem is that "configure"
> accepts a JobConf as a parameter, but the JobConf object actually passed to it is not
> the one used previously at line 473.
> The one that is actually passed to configure is a different object. I think it is created
> at line 422, but I am not sure about it.
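
The failure mode described in the quoted report can be sketched in plain Java, without
the real Fetcher, JobConf, or NutchConfiguration classes. Everything below (the Conf and
Worker classes, the method names, and "fetcher.parse" as the key) is a hypothetical
stand-in, not the actual Nutch code. The point is only that a flag written to one
configuration object is invisible to a component that is configured from a different
object, which then falls back to its default.

```java
import java.util.HashMap;
import java.util.Map;

public class ParsingFlagSketch {

    /** Hypothetical stand-in for a Hadoop-style configuration object. */
    static class Conf {
        private final Map<String, String> values = new HashMap<>();

        void setBoolean(String key, boolean value) {
            values.put(key, Boolean.toString(value));
        }

        boolean getBoolean(String key, boolean defaultValue) {
            String v = values.get(key);
            return v == null ? defaultValue : Boolean.parseBoolean(v);
        }
    }

    /** Hypothetical stand-in for the component that decides whether to parse. */
    static class Worker {
        private boolean parsing;

        void configure(Conf job) {
            // Falls back to true when the key is absent from the object it receives.
            parsing = job.getBoolean("fetcher.parse", true);
        }

        boolean isParsing() {
            return parsing;
        }
    }

    /** The flag is set on one object, but the worker is configured from another. */
    static boolean parsingWithDifferentConf() {
        Conf fromCommandLine = new Conf();
        fromCommandLine.setBoolean("fetcher.parse", false); // stands in for -noParsing

        Conf freshJobConf = new Conf(); // a different object; the flag is lost
        Worker worker = new Worker();
        worker.configure(freshJobConf);
        return worker.isParsing();
    }

    /** The same object carries the flag all the way to configure(). */
    static boolean parsingWithSameConf() {
        Conf conf = new Conf();
        conf.setBoolean("fetcher.parse", false);

        Worker worker = new Worker();
        worker.configure(conf);
        return worker.isParsing();
    }

    public static void main(String[] args) {
        System.out.println("different conf object, parsing = " + parsingWithDifferentConf());
        System.out.println("same conf object, parsing = " + parsingWithSameConf());
    }
}
```

In this sketch the fix is to pass the same configuration object (or a copy that carries
its values) to the worker; whether that matches the actual fix in the trunk is not
stated in this thread.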

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
