hadoop-common-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9082) Select and document a platform-independent scripting language for use in Hadoop environment
Date Fri, 30 Nov 2012 07:26:02 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507163#comment-13507163 ]

Allen Wittenauer commented on HADOOP-9082:

(I know this is mostly going to get ignored because a) it's from me, b) it's more than 3 lines,
and c) we've already proven that we only care about Linux despite people wanting support for
other platforms, but here we go anyway.)

While I can understand the build-time issues, I'm not sure I understand the run-time issues.
 If you are running on a system that doesn't have libhadoop or want to launch a task, you're
going to hit a fork() and that's going to call bash (or potentially sh).  Or are we planning
on replacing taskjvm.sh as well? So the bash requirement doesn't go away.

At run-time, the whole purpose of these scripts is to launch Java.  That's it.  The problem
that we have is that our current scripts are extremely convoluted, wrap into themselves, and
fundamentally aren't written very well.  Arguing that we can make our launcher scripts object
oriented or use an IDE to debug them sounds like we're expecting to raise the complexity
to even more ludicrous levels.

One thing I'm very curious about is whether we'll lose the ${BASH_SOURCE} functionality, something
I consider absolutely critical, by moving to Python.  (It allows one to run without setting
*any* environment variables. I think I submitted that as a patch years ago, but well...)
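
For comparison, a rough Python counterpart to the ${BASH_SOURCE} approach might look like the sketch below. It is only an illustration: the function name install_root and the assumption that the launcher lives in a bin/ directory directly under the install root are hypothetical, not taken from any actual patch.

```python
import os


def install_root(script_path):
    """Return the directory one level above the directory holding script_path.

    Symlinks are resolved first, so the result is the same whether the
    launcher is invoked directly or through a link elsewhere on the PATH.
    """
    real_script = os.path.realpath(script_path)
    bin_dir = os.path.dirname(real_script)
    return os.path.dirname(bin_dir)


if __name__ == "__main__":
    # For a script that is run directly, __file__ plays roughly the role
    # that ${BASH_SOURCE[0]} plays in bash.
    print(install_root(__file__))
```

As with the bash version, nothing here depends on an environment variable being set beforehand.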

Let's say we pick Python.  Which version are we going to target? From a support perspective,
we could very easily end up asking about not only the Java version but the Python version.
 Do we really want that?
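
To make the support concern concrete, every launcher would presumably need a guard along these lines. This is a sketch only; the minimum version shown is a placeholder, since nothing in this discussion settles on one, and the name version_ok is made up for illustration.

```python
import sys

# Hypothetical minimum version; the discussion does not fix one.
MIN_VERSION = (2, 6)


def version_ok(version_info=None, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not version_ok():
        sys.stderr.write("unsupported Python %d.%d\n" % sys.version_info[:2])
        sys.exit(1)
```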

bq. The alternative would be to maintain two complete suites of scripts, one for Linux and
one for Windows (and perhaps others in the future).

This is what most projects that support both Windows and UNIX do, from what I've seen.
 This is because things are in different locations, use different delimiters, etc., and if you merge
them, you end up with so much "if this then that, or if this2, then that2" that
you essentially have two different suites of scripts, just stored in one place.

bq. We want to avoid the need to update dual modules in two different languages when functionality
changes, especially given that many Linux developers are not familiar with powershell or bat,
and many Windows developers are not familiar with shell or bash.

I think this is the real message: the "Linux developers" (which should be read as "Java developers
who work on Hadoop") don't know bash and fundamentally ignore most attempts from outside to
improve the scripts.  Switching to something else isn't going to change this problem. Instead, it'll
just allow them to continue ignoring the community in favor of their own changes.

Perhaps the fundamental problem is this:  Why are so many launcher changes even necessary?
 Why isn't Hadoop smart enough to figure out some of these things after Java is launched?
 Have we even seriously attempted a simplification of the scripts?  (I suspect just using
functions instead of the craziness around exported variables would make a world of difference.)
 Has there been any thought about actually creating real configuration files built by installers
so we don't have to recompute a half-dozen things at every run time?
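
As a sketch of the installer-built configuration idea: the installer computes the values once and stores them, and the launcher only reads them back and assembles the java command line. The JSON format, the key names, and both function names below are assumptions made for illustration; no such file exists in Hadoop today.

```python
import json
import os


def write_launch_config(path, java_home, classpath_entries, heap_mb):
    """Installer side: compute the values once and persist them."""
    config = {
        "java_home": java_home,
        "classpath": list(classpath_entries),
        "heap_mb": int(heap_mb),
    }
    with open(path, "w") as handle:
        json.dump(config, handle, indent=2)


def build_java_command(path, main_class):
    """Launcher side: read the stored values and assemble the java argv."""
    with open(path) as handle:
        config = json.load(handle)
    return [
        os.path.join(config["java_home"], "bin", "java"),
        "-Xmx%dm" % config["heap_mb"],
        "-cp",
        # os.pathsep is ':' on UNIX and ';' on Windows.
        os.pathsep.join(config["classpath"]),
        main_class,
    ]
```

The launcher then has nothing left to discover at run time beyond the location of that one file.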

Side-note: it would be interesting to see the memory footprint requirement differences on
something like one of Yahoo!'s gateways.  Sure, individually it isn't much.  But at scale...

Anyway, I've given my $0.02.  Do what you want, I won't stop you. But I do question the thinking
behind it.

> Select and document a platform-independent scripting language for use in Hadoop environment
> -------------------------------------------------------------------------------------------
>                 Key: HADOOP-9082
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9082
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Matt Foley
> This issue is going to be discussed at length in the common-dev@ mailing list, under
> topic "[PROPOSAL] introduce Python as build-time and run-time dependency for Hadoop and throughout
> Hadoop stack".

