hadoop-common-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-7156) getpwuid_r is not thread-safe on RHEL6
Date Mon, 07 Mar 2011 05:55:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003271#comment-13003271 ]

Todd Lipcon commented on HADOOP-7156:

bq. But for now, I'd rather just advertise "RHEL 6.0 is broken; don't use it" just like we
do for JREs.

Unfortunately, many of us are not in a position to do this - RHEL 6.0 is a must for us, regardless
of some bugs it might have. The same goes for Vintela Authentication Services (VAS), which
has a similar bug. Asking users to switch their entire OS or auth system is not an option.

I think there are three workable options here from my perspective:

1) Always lock around getpwuid_r. Devaraj is adding a cache for this function as part of another
JIRA, so it shouldn't be a big performance issue.
2) Add a compile-time macro like LOCK_AROUND_PWUID, which, when set, will add the monitor
lock around these calls.
3) Add a runtime Hadoop configuration option like hadoop.workaround.broken.getpwuid, which
when enabled adds the lock.

Which, if any, of these seem acceptable to you?

Since we've found that this isn't RHEL6 specific, but in fact occurs with other broken pieces
of software as well, I'm leaning towards option #1 or #3.
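
To make options #1 and #2 concrete, here is a rough sketch of what the lock could look like on the native side. It is only an illustration, not the attached patch: it uses a plain pthread mutex rather than the JVM monitor lock mentioned above, and the names pwuid_lock, locked_getpwuid_r and lookup_username are made up for this example. Only LOCK_AROUND_PWUID comes from the list above. Removing the #ifdef guards gives option #1 (always lock).

```c
#include <pthread.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>

#ifdef LOCK_AROUND_PWUID
/* Hypothetical process-wide lock; the name is illustrative only. */
static pthread_mutex_t pwuid_lock = PTHREAD_MUTEX_INITIALIZER;
#endif

/*
 * Hypothetical wrapper with the same contract as getpwuid_r. When
 * LOCK_AROUND_PWUID is defined, calls are serialized so that a
 * non-thread-safe NSS backend never runs concurrently.
 */
static int locked_getpwuid_r(uid_t uid, struct passwd *pwd,
                             char *buf, size_t buflen,
                             struct passwd **result)
{
    int rc;
#ifdef LOCK_AROUND_PWUID
    pthread_mutex_lock(&pwuid_lock);
#endif
    rc = getpwuid_r(uid, pwd, buf, buflen, result);
#ifdef LOCK_AROUND_PWUID
    pthread_mutex_unlock(&pwuid_lock);
#endif
    return rc;
}

/*
 * Usage sketch: copy the login name for uid into out.
 * Returns 0 on success, -1 on error or if no entry exists.
 */
static int lookup_username(uid_t uid, char *out, size_t outlen)
{
    struct passwd pwd;
    struct passwd *result = NULL;
    char buf[4096]; /* fixed size for brevity; real code would retry on ERANGE */

    if (locked_getpwuid_r(uid, &pwd, buf, sizeof buf, &result) != 0 ||
        result == NULL)
        return -1;
    snprintf(out, outlen, "%s", result->pw_name);
    return 0;
}
```

Building with -DLOCK_AROUND_PWUID (and -pthread) enables the lock. A runtime switch as in option #3 would replace the #ifdef with a flag read from configuration at startup.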

> getpwuid_r is not thread-safe on RHEL6
> --------------------------------------
>                 Key: HADOOP-7156
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7156
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>         Attachments: hadoop-7156.txt
> Due to the following bug in SSSD, functions like getpwuid_r are not thread-safe in RHEL
> 6.0 if sssd is specified in /etc/nsswitch.conf (as it is by default):
> https://fedorahosted.org/sssd/ticket/640
> This causes many fetch failures in the case that the native libraries are available,
> since the SecureIO functions call getpwuid_r as part of fstat. By enabling -Xcheck:jni I get
> the following trace on JVM crash:
> *** glibc detected *** /mnt/toolchain/JDK6u20-64bit/bin/java: free(): invalid pointer: 0x0000003575741d23 ***
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3575675676]
> /lib64/libnss_sss.so.2(_nss_sss_getpwuid_r+0x11b)[0x7fe716cb42cb]
> /lib64/libc.so.6(getpwuid_r+0xdd)[0x35756a5dfd]
