nutch-dev mailing list archives

From "Ian H. (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1640) OOM in ParseSegment Phase
Date Fri, 01 Nov 2013 10:54:19 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811170#comment-13811170 ]

Ian H. commented on NUTCH-1640:
-------------------------------

Thanks for the patch! Is it going to be merged back to the 1.7 branch? The 1.7 release is
considered stable, so it would be great to fix this bug there as well (currently, one has to
patch it manually using the changes from trunk).

> OOM in ParseSegment Phase
> -------------------------
>
>                 Key: NUTCH-1640
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1640
>             Project: Nutch
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.7
>         Environment: RHEL 6.2 x86_64
>            Reporter: Mitesh Singh Jat
>         Attachments: NUTCH-1640.patch
>
>
> The Nutch ParseSegment phase fails after 2 runs on the same TaskTracker, with the following exception:
> {noformat}
> Exception in thread "main" org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.OutOfMemoryError: unable to create new native thread
> 	at java.lang.Thread.start0(Native Method)
> 	at java.lang.Thread.start(Thread.java:640)
> 	at org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.kill(JvmManager.java:553)
> 	at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.killJvmRunner(JvmManager.java:317)
> 	at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.killJvm(JvmManager.java:297)
> 	at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.taskKilled(JvmManager.java:289)
> 	at org.apache.hadoop.mapred.JvmManager.taskKilled(JvmManager.java:158)
> 	at org.apache.hadoop.mapred.TaskRunner.kill(TaskRunner.java:802)
> 	at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:3315)
> 	at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.jobHasFinished(TaskTracker.java:3287)
> 	at org.apache.hadoop.mapred.TaskTracker.purgeTask(TaskTracker.java:2316)
> 	at org.apache.hadoop.mapred.TaskTracker.fatalError(TaskTracker.java:3710)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1444)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1440)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1438)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1118)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
> 	at $Proxy1.fatalError(Unknown Source)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:310)
> {noformat}
> In contrast, similar parsing done in the Nutch Fetcher phase (fetcher.parse=true, fetcher.store.content=false) does not cause this issue.
> Hence, on analysing the code of Fetcher and ParseSegment, it seems the issue
> is related to the creation of parseResult for each URL in ParseSegment.java.
> {code}
>     ParseResult parseResult = null;
>     try {
>       // A new ParseUtil is constructed here on every call, i.e. once per URL
>       parseResult = new ParseUtil(getConf()).parse(content);
>     } catch (Exception e) {
>       LOG.warn("Error parsing: " + key + ": " + StringUtils.stringifyException(e));
>       return;
>     }
> {code}
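
For illustration only, here is a minimal, self-contained sketch of the pattern the report points at, contrasted with reusing a single instance. It is not the attached NUTCH-1640.patch and does not use any Nutch or Hadoop classes. `PooledParser` is a hypothetical stand-in, and the sketch assumes that the real helper owns a thread pool, so that building one per URL keeps starting threads that are never shut down. All class and method names below are made up for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class ParserReuseSketch {

  /** Hypothetical stand-in for a parser helper that owns a thread pool (assumption). */
  static final class PooledParser {
    static final AtomicInteger INSTANCES = new AtomicInteger();

    // Daemon threads, so leaked pools in this demo do not keep the JVM alive.
    private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
      Thread t = new Thread(r);
      t.setDaemon(true);
      return t;
    });

    PooledParser() {
      INSTANCES.incrementAndGet();
    }

    /** "Parses" by trimming the input on a pool thread. */
    String parse(String content) {
      Callable<String> task = () -> content.trim();
      try {
        return pool.submit(task).get();
      } catch (Exception e) {
        throw new RuntimeException(e);
      }
    }

    void shutdown() {
      pool.shutdown();
    }
  }

  /** Pattern from the report: a new parser (and pool) per record, never shut down. */
  static int parseCreatingPerRecord(String[] docs) {
    int before = PooledParser.INSTANCES.get();
    for (String doc : docs) {
      new PooledParser().parse(doc);
    }
    return PooledParser.INSTANCES.get() - before; // number of parsers created
  }

  // Alternative: one lazily created parser, reused for every record.
  private PooledParser parser;

  String parse(String content) {
    if (parser == null) {
      parser = new PooledParser();
    }
    return parser.parse(content);
  }

  void close() {
    if (parser != null) {
      parser.shutdown();
    }
  }

  static int parseReusing(String[] docs) {
    int before = PooledParser.INSTANCES.get();
    ParserReuseSketch mapper = new ParserReuseSketch();
    for (String doc : docs) {
      mapper.parse(doc);
    }
    mapper.close();
    return PooledParser.INSTANCES.get() - before; // number of parsers created
  }

  public static void main(String[] args) {
    String[] docs = {" a ", "b ", " c"};
    System.out.println("per-record parsers created: " + parseCreatingPerRecord(docs));
    System.out.println("reused parsers created:     " + parseReusing(docs));
  }
}
```

In the per-record variant the number of pools grows with the number of URLs, which would be consistent with the "unable to create new native thread" error if the assumption about the helper holds. The reuse variant keeps that number constant regardless of segment size.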



--
This message was sent by Atlassian JIRA
(v6.1#6144)
