hadoop-common-issues mailing list archives

From "Alex Loddengaard (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6248) Circus: Proposal and Preliminary Code for a Hadoop System Testing Framework
Date Fri, 11 Sep 2009 22:39:57 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754391#action_12754391 ]

Alex Loddengaard commented on HADOOP-6248:

Thanks for the feedback, Chris.

> The idea is interesting and could beget a useful tool, but the current version is principally
> a wrapper for default scripts and settings.
As the proposal states, this is a framework, with enough context examples and tests to show
how the framework is used.  I agree with you that it is currently a wrapper, but it will cease
to be a wrapper as soon as more interesting contexts and tests are written.  Since you are a major
contributor to Hadoop yourself, I would love to hear how you think this tool could make your
job easier, if at all.  Some of us here at Cloudera, along with at least a few of our customers
and users, would value a framework like this.  Circus will let an organization write a context
that uses a development cluster of some sort, along with tests that emulate their production
jobs, to ensure that their jobs are running as expected on their development cluster.  Then,
by simply switching contexts, the organization can run all of their jobs on a different version
of Hadoop.  Perhaps I should write a new, more interesting context to prove my point.

More responses:
> Don't cut and paste code such as examples.
Agreed, it's silly to copy and paste the word count example.  This test is a demonstration that
users can compile Java MapReduce programs in their tests.  I find it useful in that regard,
but I can write a new MapReduce job that isn't an example to demonstrate the compilation use
case if you'd like.  I chose the word count example specifically so users interested in writing
tests would have access to a very simple MapReduce program that is compiled on the fly.

> Don't wrap the shell scripts with another level of indirection; they do enough of that on
> their own
I assume you're referring to the bin/hadoop-daemon.sh and bin/hadoop scripts, right?  I argue
that not using these scripts would greatly complicate creating new contexts and tests.  I
want users of Circus to write contexts and tests in a way that they're familiar with; namely,
command line tools.  Additionally, Circus is meant to test Hadoop end-to-end.  Using the shell
scripts helps to achieve this goal, especially because Hadoop's unit tests do not test the
shell scripts.  What are your specific objections to calling bin/hadoop-daemon.sh and bin/hadoop,
except that doing so is one more level of indirection?

> We try not to include references to specific companies. Certainly Hadoop should not be fetched
> from anywhere but Apache in this distribution.
Good catch here.  While scanning the Apache mirror page, I didn't notice a link to an apache.org
site.  My mistake.

> Circus: Proposal and Preliminary Code for a Hadoop System Testing Framework
> ---------------------------------------------------------------------------
>                 Key: HADOOP-6248
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6248
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: test
>         Environment: Python, bash
>            Reporter: Alex Loddengaard
>         Attachments: HADOOP-6248.diff, HADOOP-6248_v2.diff
> This issue contains a proposal and preliminary source code for Circus, a Hadoop system
> testing framework.  At a high level, Circus will help Hadoop users and QA engineers to run
> system tests on a configurable Hadoop cluster, or distribution of Hadoop.  See the comment
> below for the proposal itself.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.