nutch-dev mailing list archives

From Christophe Noel <christophe.n...@cetic.be>
Subject urgent please help - bug
Date Fri, 04 Mar 2005 10:26:32 GMT
Hello,

All suggestions are welcome!
I need a fresh crawl to show on Monday, but for a while now I have always
gotten this error after about one hour of crawling with:

bin/nutch crawl urls -dir dir -depth 50 -threads 50

050302 183540 Updating /nutch-0.6/agoria.2mar/db
050302 183540 Updating for  /nutch-0.6/agoria.2mar/segments/20050302183116
050302 183540 Processing document 0
050302 183541 Finishing update
050302 183542 Processing pagesByURL: Sorted 2931 instructions in 0.915 seconds.
050302 183542 Processing pagesByURL: Sorted 3203.27868852459 instructions/second
Exception in thread "main" java.io.IOException: already exists: /nutch-0.6/agoria.2mar/db/webdb.new/pagesByURL
       at net.nutch.io.MapFile$Writer.<init>(MapFile.java:67)
       at net.nutch.db.WebDBWriter$CloseProcessor.closeDown(WebDBWriter.java:536)
       at net.nutch.db.WebDBWriter.close(WebDBWriter.java:1531)
       at net.nutch.tools.UpdateDatabaseTool.close(UpdateDatabaseTool.java:301)
       at net.nutch.tools.UpdateDatabaseTool.main(UpdateDatabaseTool.java:351)
       at net.nutch.tools.CrawlTool.main(CrawlTool.java:128)

Thanks.

Christophe.
