nutch-dev mailing list archives

From "Sami Siren (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (NUTCH-428) NullPointerException
Date Fri, 12 Jan 2007 22:16:27 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sami Siren resolved NUTCH-428.
------------------------------

       Resolution: Fixed
    Fix Version/s: 0.9.0

Most probably you don't have the agent name configured in nutch-site.xml. In trunk I changed
this case to throw a RuntimeException instead, so it's easier to diagnose.
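
For illustration, a minimal sketch of a nutch-site.xml that sets an agent name. This assumes the
property in question is http.agent.name, as listed in nutch-default.xml; the value "MyCrawler" is
only a placeholder and should be replaced with a name identifying your own crawler.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumption: http.agent.name is the property the fetcher needs to have set. -->
  <property>
    <name>http.agent.name</name>
    <!-- Placeholder value; choose a name that identifies your crawler. -->
    <value>MyCrawler</value>
  </property>
</configuration>
```

Settings in nutch-site.xml override the defaults in nutch-default.xml, so the entry belongs in the
site file rather than in the defaults file.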

> NullPointerException
> --------------------
>
>                 Key: NUTCH-428
>                 URL: https://issues.apache.org/jira/browse/NUTCH-428
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.8.1
>         Environment: Windows XP
>            Reporter: Piyush
>             Fix For: 0.9.0
>
>
> I am using the NUTCH.Bat provided in one of the threads (I am not using Cygwin). Whenever
> I try to fetch the item, the fetch fails with a "NullPointerException".
> I have a URL directory, which has a urls.txt file. There is only one entry in the file,
> which is http://www.winzip.com/land_about.htm.
> I have updated crawl-urlfilter.txt with +^http://www.winzip.com/.
> Are there any other settings I am missing? Any help is greatly appreciated.
> The command I used to start the crawl is
> nutch  crawl urls -dir crawlResults -depth 1
> Here is my log 
> crawl started in: crawlResult
> rootUrlDir = urls
> threads = 10
> depth = 1
> Injector: starting
> Injector: crawlDb: crawlResult/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: done
> Generator: starting
> Generator: segment: crawlResult/segments/20070110085314
> Generator: Selecting best-scoring urls due for fetch.
> Generator: Partitioning selected urls by host, for politeness.
> Generator: done.
> Fetcher: starting
> Fetcher: segment: crawlResult/segments/20070110085314
> Fetcher: threads: 10
> fetching http://www.winzip.com/land_about.htm
> fetch of http://www.winzip.com/land_about.htm failed with: java.lang.NullPointerException
> Fetcher: done
> CrawlDb update: starting
> CrawlDb update: db: crawlResult/crawldb
> CrawlDb update: segment: crawlResult/segments/20070110085314
> CrawlDb update: Merging segment data into db.
> CrawlDb update: done
> LinkDb: starting
> LinkDb: linkdb: crawlResult/linkdb
> LinkDb: adding segment: crawlResult/segments/20070110085314
> LinkDb: done
> Indexer: starting
> Indexer: linkdb: crawlResult/linkdb
> Indexer: adding segment: crawlResult/segments/20070110085314
> Optimizing index.
> Indexer: done
> Dedup: starting
> Dedup: adding indexes in: crawlResult/indexes
> Dedup: done
> Adding crawlResult/indexes/part-00000
> crawl finished: crawlResult
>  

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
