nutch-dev mailing list archives

From "congliu (JIRA)" <>
Subject [jira] Created: (NUTCH-925) plugins stored in weakhashmap lead memory leak
Date Sat, 23 Oct 2010 09:36:19 GMT
plugins stored in weakhashmap lead memory leak

                 Key: NUTCH-925
             Project: Nutch
          Issue Type: Bug
    Affects Versions: 1.2
            Reporter: congliu

I am seeing a serious memory leak with Nutch 1.2 during a very deep crawl. The error looks like this:

Exception in thread "Thread-113544" java.lang.OutOfMemoryError: PermGen space
	at java.lang.Throwable.getStackTraceElement(Native Method)
	at java.lang.Throwable.getOurStackTrace(
	at java.lang.Throwable.printStackTrace(
	at org.apache.log4j.spi.ThrowableInformation.getThrowableStrRep(
	at org.apache.log4j.spi.LoggingEvent.getThrowableStrRep(
	at org.apache.log4j.WriterAppender.subAppend(
	at org.apache.log4j.DailyRollingFileAppender.subAppend(
	at org.apache.log4j.WriterAppender.append(
	at org.apache.log4j.AppenderSkeleton.doAppend(
	at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(
	at org.apache.log4j.Category.callAppenders(
	at org.apache.log4j.Category.forcedLog(
	at org.apache.log4j.Category.log(
	at org.slf4j.impl.Log4jLoggerAdapter.log(
	at org.apache.commons.logging.impl.SLF4JLocationAwareLog.warn(
	at org.apache.hadoop.mapred.LocalJobRunner$
Exception in thread "main" Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(
	at org.apache.nutch.fetcher.Fetcher.fetch(
	at org.apache.nutch.crawl.Crawl.main(

My guess is that the plugin repository cache is causing the memory leak.

As you know, plugins are stored in a WeakHashMap<conf, plugins>, and a new class loader
is created when the plugins are needed.
Normally the entries of a WeakHashMap can be garbage collected, but classes and class loaders
are stored in PermGen, not on the ordinary heap, and as far as I can tell the GC does not
reclaim that space, so java.lang.OutOfMemoryError: PermGen space occurs. Has any existing
Nutch issue dealt with this problem, or is there a known solution?
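
To illustrate the pattern I suspect, here is a minimal sketch. It is not Nutch's actual code: Conf, loaderFor and loadersCreated are hypothetical names, and the empty URLClassLoader only stands in for a real plugin class loader. It assumes the cache is keyed by the configuration instance, so every new configuration object misses the cache and gets a fresh class loader.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.WeakHashMap;

public class PluginCacheSketch {
    /** Hypothetical stand-in for a configuration object (identity equals/hashCode). */
    static final class Conf {}

    // Assumption: the cache is keyed by the configuration instance.
    private static final Map<Conf, ClassLoader> CACHE = new WeakHashMap<>();

    // Counts how many class loaders have been created so far.
    static int loadersCreated = 0;

    static synchronized ClassLoader loaderFor(Conf conf) {
        ClassLoader loader = CACHE.get(conf);
        if (loader == null) {
            // Cache miss: build a new loader (empty here, purely illustrative).
            loader = new URLClassLoader(new URL[0],
                    PluginCacheSketch.class.getClassLoader());
            loadersCreated++;
            CACHE.put(conf, loader);
        }
        return loader;
    }

    public static void main(String[] args) {
        Conf shared = new Conf();
        loaderFor(shared);
        loaderFor(shared);                  // same key: the cached loader is reused
        System.out.println(loadersCreated); // prints 1

        for (int i = 0; i < 3; i++) {
            loaderFor(new Conf());          // new key each time: new loader each time
        }
        System.out.println(loadersCreated); // prints 4
    }
}
```

If a long crawl creates a new configuration object for every job, the number of class loaders grows in the same way, and each one that really loads plugin classes takes up PermGen until it is unloaded.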

Might NUTCH-356 help?

