nutch-dev mailing list archives

From "Lewis John McGibbney (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (NUTCH-314) Multiple language identifier instances
Date Sat, 12 Jan 2013 19:40:12 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney resolved NUTCH-314.
----------------------------------------

    Resolution: Won't Fix

Closing as a legacy issue.
                
> Multiple language identifier instances
> --------------------------------------
>
>                 Key: NUTCH-314
>                 URL: https://issues.apache.org/jira/browse/NUTCH-314
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 0.8
>         Environment: OS: Linux RHEL 4
> JDK: 1.5_07
>            Reporter: Enrico Triolo
>
> In my application I often need to run the inject -> generate -> .. -> index loop
> multiple times, since users can 'suggest' new web pages to be crawled and indexed.
> I also need to enable the language identifier plugin.
> Everything seems to work correctly, but after some time I get an OutOfMemoryError.
> Elapsed time is not actually the trigger: I noticed that the problem arises when
> users submit many URLs (~100). As I said, each submitted URL starts a new loop
> (similar to the one in the Crawl.main method).
> Using a profiler (specifically, the NetBeans profiler) I found that a new
> LanguageIdentifier instance is created for each submitted URL and is never released.
> With the memory inspector tool I can see as many instances of LanguageIdentifier and
> NGramProfile$NGramEntry as the number of fetched pages, each of them occupying
> about 180 KB. Forcing garbage collection doesn't release much memory.
> Maybe we should cache the instance in the conf, as we do for many other objects in Nutch.
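
The caching idea suggested in the report can be illustrated with a minimal sketch. It assumes nothing about Nutch's real classes: PerConfCache, its type parameters, and the factory argument are hypothetical names introduced here, and the configuration object is whatever key the caller uses. The point is only that one expensive object is built per configuration and reused on every later lookup, rather than rebuilt for each submitted URL.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/**
 * Hypothetical per-configuration cache (not Nutch's actual API).
 * C is the configuration type used as the key, T the expensive object to reuse.
 */
final class PerConfCache<C, T> {
    // Weak keys: an entry becomes collectable once its configuration is unreachable.
    private final Map<C, T> cache = new WeakHashMap<>();
    // Builds the expensive object the first time a configuration is seen.
    private final Function<C, T> factory;

    PerConfCache(Function<C, T> factory) {
        this.factory = factory;
    }

    /** Returns the instance cached for conf, creating it on first use. */
    synchronized T get(C conf) {
        T value = cache.get(conf);
        if (value == null) {
            value = factory.apply(conf);
            cache.put(conf, value);
        }
        return value;
    }
}
```

With a cache like this, repeated crawl loops that share one configuration would share one identifier instance. One caveat of the weak-key design: if the cached object holds a strong reference back to its configuration, the entry is never collected, so a real fix would need to check that.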

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
