nutch-dev mailing list archives

From "AJ Chen" <cano...@gmail.com>
Subject need help to speed up map-reduce
Date Mon, 06 Nov 2006 21:34:56 GMT
Sorry for repeating this question, but I have to find a solution; otherwise
the crawling is too slow to be practical. I'm using Nutch 0.9-dev on one
Linux server to crawl millions of pages. The fetching itself is reasonable,
but the map-reduce operations are killing the performance. For example,
fetching takes 10 hours and map-reduce also takes 10 hours, which makes the
overall crawl very slow. Can anyone share experience on how to speed
up map-reduce for single-server crawling? A single server uses the local file
system, so it should spend very little time doing map and reduce, shouldn't
it?

Thanks,
-- 
AJ Chen, PhD
http://web2express.org
