nutch-dev mailing list archives

From "David Hosking (JIRA)" <j...@apache.org>
Subject [jira] [Created] (NUTCH-1760) Crawl script fails to find job file if called from outside bin dir
Date Thu, 17 Apr 2014 08:00:36 GMT
David Hosking created NUTCH-1760:
------------------------------------

             Summary: Crawl script fails to find job file if called from outside bin dir
                 Key: NUTCH-1760
                 URL: https://issues.apache.org/jira/browse/NUTCH-1760
             Project: Nutch
          Issue Type: Bug
    Affects Versions: 2.2.1, 1.8
         Environment: Ubuntu 13.10 Server
            Reporter: David Hosking
            Priority: Minor


The crawl script that comes with all the versions of Nutch I have checked sets the local/distributed
operating mode by looking for the job file at a relative path (i.e. "../*nutch-*.job").

Bash seems to be taking this as relative to the directory that the crawl script was called
from, not the script's actual location.

The result is that the script thinks it is in local mode because it cannot find the job file.
When a crawl is run, jobs are submitted to Hadoop properly, but the if statements that test
for local (or not) mode fail, giving strange results or causing crashes.

Using the first bash snippet from [here|https://stackoverflow.com/a/246128] I have modified
the crawl script to look for a job file relative to the script location on disk.
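
For illustration, the approach can be sketched roughly as follows. This is a minimal sketch and not the attached patch: the names SCRIPT_DIR and detect_mode are hypothetical, and it assumes the job file sits one directory above the script, as the existing "../*nutch-*.job" pattern implies.

```bash
#!/usr/bin/env bash

# Resolve the directory that contains this script, independent of the
# directory the caller happens to be in (idiom from the linked answer).
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

# Hypothetical helper: prints "distributed" if a Nutch job file exists in
# the parent of the given directory, otherwise prints "local".
detect_mode() {
  local dir="$1"
  local f
  # If the glob matches nothing, the loop runs once with the literal
  # pattern, which fails the -f test and falls through to "local".
  for f in "$dir"/../*nutch-*.job; do
    if [ -f "$f" ]; then
      echo "distributed"
      return 0
    fi
  done
  echo "local"
}

mode="$(detect_mode "$SCRIPT_DIR")"
echo "mode=$mode"
```

Because the glob is anchored at the script's own directory rather than the working directory, the detected mode should not change depending on where the script is invoked from.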

I have attached a patch with my modifications.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
