hadoop-common-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-10146) Workaround JDK7 Process fd close bug
Date Wed, 18 Dec 2013 21:17:09 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-10146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852169#comment-13852169 ]

Daryn Sharp commented on HADOOP-10146:
--------------------------------------

Yes, we were losing tens of NMs per day to OOMs caused by this bug.  After the patch,
no OOMs.

The referenced openjdk bug is indeed the same problem.

Do I have a +1 to commit?

> Workaround JDK7 Process fd close bug
> ------------------------------------
>
>                 Key: HADOOP-10146
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10146
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HADOOP-10129.branch-23.patch, HADOOP-10129.patch
>
>
> JDK7's {{Process}} output streams have an async fd-close race bug.  This manifests as
> commands run via o.a.h.u.Shell causing threads to hang, OOM, or otherwise misbehave.
> The NM is likely to encounter the bug under heavy load.
> Specifically, {{ProcessBuilder}}'s {{UNIXProcess}} starts a thread to reap the process
> and drain stdout/stderr to avoid a lingering zombie process.  A race occurs if the thread
> using the stream closes it and the underlying fd is recycled/reopened while the reaper is
> still draining it.  {{ProcessPipeInputStream.drainInputStream}} will OOM allocating an array
> if {{in.available()}} returns a huge number, or may wreak havoc by incorrectly draining the fd.
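
To illustrate the allocation hazard described above, here is a minimal sketch, not the HADOOP-10146 patch and not the JDK's code. It assumes a drain loop that trusts {{in.available()}} to size its buffer, and shows one way to bound the allocation instead. The names BoundedDrain, LyingStream, drainBounded and the MAX_CHUNK value are hypothetical, chosen only for this example; LyingStream merely simulates a stream whose available() reports a bogus, huge count.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class BoundedDrain {
    /** Hypothetical cap on a single read; not a value taken from the JDK or the patch. */
    static final int MAX_CHUNK = 8192;

    /** Stand-in for a stream on a recycled fd: available() reports a huge count. */
    static class LyingStream extends ByteArrayInputStream {
        LyingStream(byte[] data) {
            super(data);
        }

        @Override
        public synchronized int available() {
            return super.available() > 0 ? Integer.MAX_VALUE : 0;
        }
    }

    /** Drains what is reported available without ever allocating available() bytes at once. */
    static byte[] drainBounded(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[MAX_CHUNK];
        int avail;
        while ((avail = in.available()) > 0) {
            // Read at most one fixed-size chunk, however large avail claims to be.
            int n = in.read(buf, 0, Math.min(avail, buf.length));
            if (n < 0) {
                break; // end of stream even though available() was positive
            }
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] drained = drainBounded(new LyingStream(data));
        System.out.println(new String(drained, StandardCharsets.UTF_8));
    }
}
```

Bounding the buffer only addresses the OOM symptom. It does nothing about the race itself, where the reaper may still be draining an fd that has since been recycled for something else.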



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
