nutch-dev mailing list archives

From "Greg Padiasek (JIRA)" <>
Subject [jira] [Created] (NUTCH-1746) OutOfMemory Error in Mappers
Date Thu, 03 Apr 2014 13:58:15 GMT
Greg Padiasek created NUTCH-1746:

             Summary: OutOfMemory Error in Mappers
                 Key: NUTCH-1746
             Project: Nutch
          Issue Type: Bug
          Components: generator, injector
    Affects Versions: 1.7
         Environment: Nutch running in local mode with 4M+ domains in domain-urlfilter.txt
            Reporter: Greg Padiasek

Initially I found that Generator threw an OutOfMemoryError no matter how much
RAM I allocated to the JVM. I fixed the problem by moving URLFilters, URLNormalizers and ScoringFilters
to a top-level class as singletons and reusing them in all Generator mapper instances.
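
A minimal sketch of the singleton idea described above, not the actual patch. The classes ExpensiveFilters, FilterHolder and SketchMapper are hypothetical stand-ins (they are not Nutch classes), and configure() only loosely imitates a mapper setup hook. The point it illustrates is that the large filter data is built once per JVM and shared, rather than once per mapper instance.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedFilterSketch {

    /** Hypothetical stand-in for a costly object such as a filter holding 4M+ domains. */
    static final class ExpensiveFilters {
        /** Counts how many times the expensive data set was built. */
        static final AtomicInteger LOADS = new AtomicInteger();
        private final Set<String> domains;

        ExpensiveFilters(List<String> domains) {
            LOADS.incrementAndGet();
            this.domains = new HashSet<>(domains);
        }

        boolean accept(String domain) {
            return domains.contains(domain);
        }
    }

    /** Top-level holder: builds the filters on first use and reuses them afterwards. */
    static final class FilterHolder {
        private static ExpensiveFilters instance;

        static synchronized ExpensiveFilters get(List<String> domains) {
            // Assumption: every mapper in the JVM uses the same configuration,
            // so later calls can safely ignore their argument.
            if (instance == null) {
                instance = new ExpensiveFilters(domains);
            }
            return instance;
        }
    }

    /** Hypothetical mapper: it borrows the shared filters instead of creating its own. */
    static final class SketchMapper {
        private ExpensiveFilters filters;

        void configure(List<String> domains) {
            filters = FilterHolder.get(domains);
        }

        boolean map(String domain) {
            return filters.accept(domain);
        }
    }

    public static void main(String[] args) {
        List<String> domains = Arrays.asList("example.org", "apache.org");
        SketchMapper first = new SketchMapper();
        SketchMapper second = new SketchMapper();
        first.configure(domains);
        second.configure(domains);
        System.out.println("filter sets built: " + ExpensiveFilters.LOADS.get());
        System.out.println("apache.org accepted: " + second.map("apache.org"));
    }
}
```

The trade-off in this sketch is that the shared instance ignores any configuration passed after the first call, which is only acceptable when all mappers in the JVM run with identical settings, as in local mode.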

Then I found the same problem in Injector and applied an analogous fix.

It now seems that this problem may be common to all Nutch Mapper implementations.

I was wondering if it would be possible to integrate this kind of change
into the upstream code base and potentially update all vulnerable Mapper classes.

