nutch-dev mailing list archives

From "Withanage, Dulip" <withan...@asia-europe.uni-heidelberg.de>
Subject Java Heap Limit Exceeded
Date Mon, 25 Jan 2010 08:32:57 GMT
Dear developers,

I have installed Nutch on a Linux enterprise server with 8 GB of RAM.
The Java VM has 4 GB of memory available when Nutch starts.

I have configured a web crawler to scan PDF documents (about 3000) on our intranet.
After about 100 PDF documents, there is always an OutOfMemory exception.

I tried the following trick.

In index.html, I generate links to a set of HTML pages (link1.html, link2.html, etc.).
Each of these pages links to 20 PDFs. But this trick also fails.
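
For reference, the page layout described above can be sketched roughly as follows. This is only an illustration in Python: the function name write_link_pages, the output directory, and the example URLs are hypothetical and do not reflect the actual setup.

```python
from html import escape
from pathlib import Path


def write_link_pages(pdf_urls, out_dir, batch_size=20):
    """Write index.html plus link1.html, link2.html, ... into out_dir.

    Each linkN.html holds at most batch_size PDF links, and index.html
    links to every linkN.html. Returns the list of linkN.html file names.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    page_names = []

    # Split the PDF URLs into consecutive batches of batch_size.
    for start in range(0, len(pdf_urls), batch_size):
        batch = pdf_urls[start:start + batch_size]
        name = f"link{start // batch_size + 1}.html"
        links = "\n".join(
            f'<a href="{escape(url)}">{escape(url)}</a><br>' for url in batch
        )
        (out / name).write_text(
            f"<html><body>\n{links}\n</body></html>\n", encoding="utf-8"
        )
        page_names.append(name)

    # The index page only links to the batch pages, never to PDFs directly.
    index_links = "\n".join(
        f'<a href="{name}">{name}</a><br>' for name in page_names
    )
    (out / "index.html").write_text(
        f"<html><body>\n{index_links}\n</body></html>\n", encoding="utf-8"
    )
    return page_names


if __name__ == "__main__":
    # Hypothetical example: 45 PDFs yield three batch pages.
    urls = [f"http://intranet.example/docs/doc{i}.pdf" for i in range(45)]
    print(write_link_pages(urls, "crawl_seed_pages"))
```

The crawler is then pointed at index.html as its only seed.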

Can someone give me some ideas or point me to something to read?


Best regards,

Dulip Withanage, M.Sc 


Cluster of Excellence 
Karl Jaspers Centre
Heidelberg

Fax: +49-6221 - 54 4012
e-mail: withanage@asia-europe.uni-heidelberg.de


