hadoop-common-issues mailing list archives

From "john lilley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13223) winutils.exe is a bug nexus and should be killed with an axe.
Date Sat, 16 Feb 2019 17:21:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16770156#comment-16770156 ]

john lilley commented on HADOOP-13223:

One more comment. We hit this issue at a customer site, and it took a while to diagnose.
Winutils.exe depends on msvcr110.dll (the Visual C++ 2012 redist). This was once so common
that we never had any issue; it always just happened to be installed on the system.
But fast-forward a few years and VC++ 2012 may no longer be the common redist it once was, so
we anticipate needing to install it as part of our solution. We also moved on from VC++
2012 a while ago, so our app no longer includes it as a matter of course.

I do not recommend moving this to a DLL, because as many commenters have pointed out, many
of the same issues exist there as well. Rather, use the Windows ACL support built into Java
NIO.  See
Not that this is simple, but neither is winutils C code.
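
As a rough illustration of the NIO route, here is a minimal sketch that reads and extends a file's ACL through java.nio.file.attribute.AclFileAttributeView. The class and method names (NioAclSketch, readOnlyEntry, grantOwnerRead) are made up for this example and are not Hadoop code, and the choice of permissions is only an assumption about what a "read-only" grant might look like. On file systems that expose no ACL view (most Linux setups), getFileAttributeView returns null and the sketch does nothing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.AclEntry;
import java.nio.file.attribute.AclEntryPermission;
import java.nio.file.attribute.AclEntryType;
import java.nio.file.attribute.AclFileAttributeView;
import java.nio.file.attribute.UserPrincipal;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

// Hypothetical helper, not part of Hadoop.
class NioAclSketch {

    /** Builds an ALLOW entry granting read-only access to the given principal. */
    static AclEntry readOnlyEntry(UserPrincipal principal) {
        return AclEntry.newBuilder()
                .setType(AclEntryType.ALLOW)
                .setPrincipal(principal)
                .setPermissions(EnumSet.of(
                        AclEntryPermission.READ_DATA,
                        AclEntryPermission.READ_ATTRIBUTES,
                        AclEntryPermission.READ_ACL))
                .build();
    }

    /**
     * Prepends a read-only ALLOW entry for the file's owner to the file's ACL.
     * Returns false when the file system exposes no ACL view.
     */
    static boolean grantOwnerRead(Path file) throws IOException {
        AclFileAttributeView view =
                Files.getFileAttributeView(file, AclFileAttributeView.class);
        if (view == null) {
            return false; // ACLs are not supported on this file system
        }
        UserPrincipal owner = Files.getOwner(file);
        List<AclEntry> acl = new ArrayList<>(view.getAcl());
        acl.add(0, readOnlyEntry(owner)); // entries are evaluated in order
        view.setAcl(acl);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("acl-sketch", ".txt");
        try {
            System.out.println("ACL view available: " + grantOwnerRead(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Everything here runs inside the JVM, so no external process or C runtime redistributable is involved.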

> winutils.exe is a bug nexus and should be killed with an axe.
> -------------------------------------------------------------
>                 Key: HADOOP-13223
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13223
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: bin
>    Affects Versions: 2.6.0
>         Environment: Microsoft Windows, all versions
>            Reporter: john lilley
>            Priority: Major
> winutils.exe was apparently created as a stopgap measure to allow Hadoop to "work" on
Windows platforms, because the NativeIO libraries aren't implemented there (edit: even NativeIO
probably doesn't cover the operations that winutils.exe is used for).  Rather than building
a DLL that makes native OS calls, the creators of winutils.exe must have decided that it would
be more expedient to create an EXE to carry out file system operations in a Linux-like fashion.
 Unfortunately, like many stopgap measures in software, this one has persisted well beyond
its expected lifetime and usefulness.  My team creates software that runs on Windows and Linux,
and winutils.exe is probably responsible for 20% of all issues we encounter, both during development
and in the field.
> Problem #1 with winutils.exe is that it is simply missing from many popular distros and/or
the client-side software installation for said distros, when supplied, fails to install winutils.exe.
 Thus, as software developers, we are forced to pick one version and distribute and install
it with our software.
> Which leads to problem #2: winutils.exe builds are not always compatible.  In particular, MapR
MUST have its winutils.exe in the system path, but doing so breaks the Hadoop distro for every
other Hadoop vendor.  This makes creating and maintaining test environments that work with
all of the Hadoop distros we want to test unnecessarily tedious and error-prone.
> Problem #3 is that the mechanism by which you inform the Hadoop client software where
to find winutils.exe is poorly documented and fragile.  First, it can be in the PATH.  If
it is in the PATH, that is where it is found.  However, the documentation, such as it is,
makes no mention of this, and instead says that you should set the HADOOP_HOME environment
variable, which does NOT override the winutils.exe found in your system PATH.
> Which leads to problem #4: There is no logging that says where winutils.exe was actually
found and loaded.  Because of this, diagnosing cases where the wrong winutils.exe was found is
extremely difficult.
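
In the absence of such logging, a small stand-alone diagnostic can at least list the candidates. The sketch below does not reproduce Hadoop's own lookup logic; it only reports every winutils.exe found in the PATH directories and checks the copy under HADOOP_HOME. The class name WinutilsLocator and the method findOnPath are hypothetical, and the bin subdirectory under HADOOP_HOME is an assumption based on the conventional layout.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical diagnostic, not part of Hadoop.
class WinutilsLocator {

    /** Returns every regular file named exeName in the directories of a PATH-style string. */
    static List<Path> findOnPath(String pathValue, String exeName) {
        List<Path> hits = new ArrayList<>();
        if (pathValue == null || pathValue.isEmpty()) {
            return hits;
        }
        for (String dir : pathValue.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            try {
                Path candidate = Paths.get(dir, exeName);
                if (Files.isRegularFile(candidate)) {
                    hits.add(candidate.toAbsolutePath());
                }
            } catch (InvalidPathException e) {
                // Skip malformed PATH entries (for example, entries with stray quotes).
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String exe = "winutils.exe";
        // Candidates are listed in PATH order.
        for (Path p : findOnPath(System.getenv("PATH"), exe)) {
            System.out.println("PATH:        " + p);
        }
        String home = System.getenv("HADOOP_HOME");
        if (home == null || home.isEmpty()) {
            System.out.println("HADOOP_HOME is not set");
        } else {
            // Assumed layout: %HADOOP_HOME%\bin\winutils.exe
            Path p = Paths.get(home, "bin", exe);
            System.out.println("HADOOP_HOME: " + p
                    + (Files.isRegularFile(p) ? "" : " (missing)"));
        }
    }
}
```

Running this on an affected machine shows at a glance whether more than one copy is visible to the client.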
> Problem #5 is that most of the time, such as when accessing straight up HDFS and YARN,
one does not *need* winutils.exe.  But if it is missing, the log messages complain about its
absence.  When we are trying to diagnose an obscure issue in Hadoop (of which there are many),
the presence of this red herring leads to all sorts of time wasted until someone on the team
points out that winutils.exe is not the problem, at least not this time.
> Problem #6 is that errors and stack traces from issues involving winutils.exe are not
helpful.  The Java stack trace ends at the ProcessBuilder call.  Only through bitter experience
is one able to connect the dots from "ProcessBuilder is the last thing on the stack" to "something
is wrong with winutils.exe".
> Note that none of these involve running Hadoop on Windows.  They are only encountered
when using Hadoop client libraries to access a cluster from Windows.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org