hadoop-common-issues mailing list archives

From "Liang Xie (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11604) Reach xceiver limit once the watcherThread die
Date Thu, 19 Feb 2015 05:13:13 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327005#comment-14327005 ]

Liang Xie commented on HADOOP-11604:

Thanks for all the valuable comments. After checking the out file, I saw that a ConcurrentModificationException
is thrown inside the finally block:
        for (Entry entry : entries.values()) {      <<<< HERE
          sendCallback("close", entries, fdSet, entry.getDomainSocket().fd);

The log looks something like this:
Exception in thread "Thread-25" java.util.ConcurrentModificationException
        at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
        at java.util.TreeMap$ValueIterator.next(TreeMap.java:1145)
        at org.apache.hadoop.net.unix.DomainSocketWatcher$1.run(DomainSocketWatcher.java:484)
        at java.lang.Thread.run(Thread.java:662)

So the root cause in our case appears to be the unsafe pattern of removing entries from the TreeMap while a for-each loop is iterating over it: foreach {treemap.remove}.
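
To illustrate the pattern, here is a minimal, self-contained sketch. It is not the DomainSocketWatcher code and not the attached patch: the class name, the helper methods, and the Integer-to-String map standing in for the watcher's entries are all hypothetical. It only shows that removing from a TreeMap inside a for-each over its values trips the fail-fast iterator, and that removing through the iterator itself is one way to avoid that.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; names here are illustrative, not from Hadoop.
public class TreeMapRemovalSketch {

    // Small map standing in for the watcher's fd -> entry map (assumption).
    static TreeMap<Integer, String> sampleEntries() {
        TreeMap<Integer, String> entries = new TreeMap<>();
        entries.put(3, "a");
        entries.put(5, "b");
        entries.put(8, "c");
        return entries;
    }

    // Unsafe: the loop body removes from the map that the for-each is
    // iterating over. Returns true if the iterator detected the change.
    static boolean unsafeCloseAll(TreeMap<Integer, String> entries) {
        try {
            for (String ignored : entries.values()) {
                // Stand-in for a callback that removes the entry as a side effect.
                entries.remove(entries.firstKey());
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Safer: walk an explicit iterator and remove through it, so the
    // iterator's own bookkeeping stays consistent with the map.
    static void safeCloseAll(TreeMap<Integer, String> entries) {
        Iterator<Map.Entry<Integer, String>> it = entries.entrySet().iterator();
        while (it.hasNext()) {
            it.next();   // per-entry cleanup would go here
            it.remove(); // structural change made via the iterator
        }
    }

    public static void main(String[] args) {
        System.out.println("unsafe loop threw CME: " + unsafeCloseAll(sampleEntries()));
        TreeMap<Integer, String> entries = sampleEntries();
        safeCloseAll(entries);
        System.out.println("entries left after safe loop: " + entries.size());
    }
}
```

Note that this exception can occur in a single thread; no concurrent access is needed. Whether Iterator.remove() fits the real code depends on how the callback performs the removal, which this sketch does not model.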

> Reach xceiver limit once the watcherThread die
> ----------------------------------------------
>                 Key: HADOOP-11604
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11604
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>            Priority: Critical
>         Attachments: HADOOP-11604-001.txt, HADOOP-11604-002.txt
> Our production cluster hit the xceiver limit even with HADOOP-10404 and HADOOP-11333. I
> found it was caused by DomainSocketWatcher.watcherThread being gone. Attached is a possible fix;
> please review. Thanks.

