hadoop-yarn-dev mailing list archives

From "Arun Suresh (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-8849) DynoYARN: A simulation and testing infrastructure for YARN clusters
Date Fri, 05 Oct 2018 04:01:03 GMT
Arun Suresh created YARN-8849:

             Summary: DynoYARN: A simulation and testing infrastructure for YARN clusters
                 Key: YARN-8849
                 URL: https://issues.apache.org/jira/browse/YARN-8849
             Project: Hadoop YARN
          Issue Type: New Feature
            Reporter: Arun Suresh

Traditionally, YARN workload simulation is performed using SLS (Scheduler Load Simulator),
which is packaged with YARN. It essentially starts a full-fledged *ResourceManager*, but
runs simulators for the *NodeManager* and the *ApplicationMaster* containers. These simulators
are lightweight and run in a threadpool. The NM simulators do not open any external ports
and send (in-process) heartbeats to the ResourceManager.
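
To make the SLS model concrete, here is a minimal, dependency-free Java sketch. All class and method names are hypothetical; they are not SLS or YARN APIs. Each simulated NM is a periodic task in one shared {{ScheduledExecutorService}}. Its "heartbeat" is a plain method call on an in-process stand-in for the RM, so no ports are opened.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, dependency-free model of the SLS approach (not SLS code):
// simulated NMs are periodic tasks in one shared thread pool, and each
// "heartbeat" is a direct method call on an in-process stand-in for the RM.
public class SlsStyleSimulation {

    /** Stand-in for the ResourceManager: only counts heartbeats per node. */
    static final class InProcessResourceManager {
        private final Map<String, AtomicLong> heartbeats = new ConcurrentHashMap<>();

        /** In-process "RPC": no socket and no port, just a method call. */
        void nodeHeartbeat(String nodeId) {
            heartbeats.computeIfAbsent(nodeId, id -> new AtomicLong()).incrementAndGet();
        }

        long heartbeatsFrom(String nodeId) {
            AtomicLong count = heartbeats.get(nodeId);
            return count == null ? 0 : count.get();
        }

        int registeredNodes() {
            return heartbeats.size();
        }
    }

    /** Runs numNodes simulated NMs on poolSize threads for roughly durationMs. */
    static InProcessResourceManager simulate(int numNodes, int poolSize,
                                             long intervalMs, long durationMs) {
        InProcessResourceManager rm = new InProcessResourceManager();
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(poolSize);
        for (int i = 0; i < numNodes; i++) {
            String nodeId = "node-" + i;
            // Each simulated NM is nothing more than a periodic heartbeat task.
            pool.scheduleAtFixedRate(() -> rm.nodeHeartbeat(nodeId),
                    0, intervalMs, TimeUnit.MILLISECONDS);
        }
        try {
            Thread.sleep(durationMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow(); // stop all simulated NMs
        }
        return rm;
    }

    public static void main(String[] args) {
        InProcessResourceManager rm = simulate(100, 4, 50, 500);
        System.out.println("nodes=" + rm.registeredNodes()
                + ", heartbeats from node-0=" + rm.heartbeatsFrom("node-0"));
    }
}
```

Every simulated NM shares the same few pool threads. The cost of a simulation therefore grows with node count inside a single JVM.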

There are a couple of drawbacks to using SLS:
* It might be difficult to simulate really large clusters without access to a very beefy
box, since the NMs are launched as tasks in a threadpool and each NM has to send periodic
heartbeats to the RM.
* Certain features (like YARN-1011) require changes to the NodeManager: aspects such as
queuing and selectively killing containers would have to be incorporated into the existing NM
Simulator. This might make the simulator rather heavyweight, since it would need locking and
synchronization.
* Since the NM and AM are simulated, only the Scheduler is faithfully tested; SLS does not
really perform an end-to-end test of a cluster.
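
A rough way to see why the first drawback bites is to count heartbeats. The Java sketch below is back-of-the-envelope arithmetic under assumed inputs (node count, a 1 s heartbeat interval, 16 pool threads), not a measurement of SLS. With N nodes and interval I, the RM must absorb N / I heartbeats per second. A pool of T threads can then afford, on average, T / (N / I) of wall-clock time per heartbeat before the simulation falls behind.

```java
// Back-of-the-envelope load model for simulating NMs in one process.
// All numbers are illustrative inputs, not measurements of SLS.
public class HeartbeatLoad {

    /** Heartbeats the RM must absorb per second. */
    static double heartbeatsPerSecond(int numNodes, long intervalMs) {
        return numNodes * 1000.0 / intervalMs;
    }

    /**
     * Average time (microseconds) a pool thread may spend per heartbeat,
     * simulation work included, before the pool falls behind schedule.
     */
    static double budgetMicrosPerHeartbeat(int numNodes, long intervalMs, int poolThreads) {
        return poolThreads * 1_000_000.0 / heartbeatsPerSecond(numNodes, intervalMs);
    }

    public static void main(String[] args) {
        long intervalMs = 1000; // a 1 s heartbeat interval, assumed for illustration
        for (int nodes : new int[] {1_000, 10_000, 50_000}) {
            System.out.printf("%,d NMs: %,.0f heartbeats/s, %.0f us budget on 16 threads%n",
                    nodes, heartbeatsPerSecond(nodes, intervalMs),
                    budgetMicrosPerHeartbeat(nodes, intervalMs, 16));
        }
    }
}
```

For example, 10,000 NMs at a 1 s interval means 10,000 heartbeats/s, which leaves about 1.6 ms per heartbeat on 16 threads. At 50,000 NMs the budget drops to about 320 microseconds.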

Therefore, drawing inspiration from [Dynamometer|https://github.com/linkedin/dynamometer],
we propose *DynoYARN*, a framework for deploying YARN clusters for testing, with the
following features:
* The NM already has hooks to plug in a custom *ContainerExecutor* and *NodeResourceMonitor*.
If we can also plug in a custom *ContainersMonitorImpl* monitoring thread (and other modules
like the LocalizationService), we can probably inject two things: an Executor that does not
actually launch containers, and a Node and Container resource monitor that reports synthetic,
pre-specified utilization metrics back to the RM.
* Since we are launching fake containers, we cannot run normal AM containers. We can therefore
use *Unmanaged AMs* to launch synthetic jobs.
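
The following Java sketch illustrates the first feature under explicit assumptions. The {{LaunchExecutor}} and {{UtilizationMonitor}} interfaces and the {{NoOpNodePlugins}} class are hypothetical. They only mimic the roles of the NM's *ContainerExecutor* and resource-monitor plug points and are not the real Hadoop signatures. "Launching" a container just records its pre-specified utilization. The monitor then reports those synthetic numbers, summed per node, instead of sampling real processes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: the interfaces below only mimic the *roles* of the NM's
// ContainerExecutor and node/container resource-monitor plug points.
// They are NOT the Hadoop signatures.
public class NoOpNodePlugins {

    /** Synthetic, pre-specified utilization of one container (hypothetical type). */
    static final class Utilization {
        final long memoryMb;
        final double vcores;

        Utilization(long memoryMb, double vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }

        @Override
        public String toString() {
            return memoryMb + " MB, " + vcores + " vcores";
        }
    }

    /** Role of a ContainerExecutor: start a container, return an exit code. */
    interface LaunchExecutor {
        int launch(String containerId, Utilization synthetic);
    }

    /** Role of the node and container resource monitors. */
    interface UtilizationMonitor {
        Utilization containerUtilization(String containerId);
        Utilization nodeUtilization();
    }

    /** Launches nothing; reports the utilization it was told to report. */
    static final class FakeNodePlugin implements LaunchExecutor, UtilizationMonitor {
        private final Map<String, Utilization> running = new ConcurrentHashMap<>();

        @Override
        public int launch(String containerId, Utilization synthetic) {
            running.put(containerId, synthetic); // no process is forked
            return 0;                             // report success
        }

        /** Kill/preemption path: the fake container simply disappears. */
        void kill(String containerId) {
            running.remove(containerId);
        }

        @Override
        public Utilization containerUtilization(String containerId) {
            return running.getOrDefault(containerId, new Utilization(0, 0.0));
        }

        @Override
        public Utilization nodeUtilization() {
            long memoryMb = 0;
            double vcores = 0.0;
            for (Utilization u : running.values()) {
                memoryMb += u.memoryMb;
                vcores += u.vcores;
            }
            return new Utilization(memoryMb, vcores);
        }
    }

    public static void main(String[] args) {
        FakeNodePlugin nm = new FakeNodePlugin();
        nm.launch("container_1", new Utilization(2048, 1.5));
        nm.launch("container_2", new Utilization(1024, 0.5));
        System.out.println("node: " + nm.nodeUtilization()); // node: 3072 MB, 2.0 vcores
        nm.kill("container_1");
        System.out.println("node: " + nm.nodeUtilization()); // node: 1024 MB, 0.5 vcores
    }
}
```

Keeping the executor and the monitor in one class is a simplification that lets them share the table of running fake containers. A real plugin would have to fit into the NM's existing container lifecycle.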

Essentially, a test workflow would look like this:
* Launch a DynoYARN cluster.
* Use the Unmanaged AM feature to negotiate directly with the DynoYARN Resource Manager for
container tokens.
* Use the container tokens from the RM to directly ask the DynoYARN Node Managers to start
fake containers.
* The DynoYARN NodeManagers will start the fake containers and report synthetically generated
resource utilization for them to the DynoYARN Resource Manager (the utilization will be injected
via the *ContainerLaunchContext* and parsed by the plugged-in Container Executor).
* The Scheduler will use the utilization reports to schedule containers, so we will be able to
test allocation of {{Opportunistic}} containers based on resource utilization.
* Since the DynoYARN Node Managers run the actual code paths, all preemption and queuing logic
will be faithfully executed.
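
The workflow can be modeled end to end in a few lines. This Java sketch rests on several assumptions. A plain {{Map}} stands in for the environment of the *ContainerLaunchContext*, and the {{DYNOYARN_SYNTHETIC_MEMORY_MB}} key is an invented convention. The admission rule is also a deliberately simplified stand-in, not YARN's actual opportunistic allocation policy: an {{Opportunistic}} request is admitted if reported utilization leaves room, even when allocations say the node is full.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the DynoYARN test loop. The env key and the admission
// rules are illustrative assumptions, not YARN APIs or YARN's real policy.
public class DynoYarnWorkflowSketch {

    /** Invented env key an Unmanaged AM could set in the launch context. */
    static final String SYNTHETIC_MEMORY_KEY = "DYNOYARN_SYNTHETIC_MEMORY_MB";

    /** What the plugged-in executor would parse out of the launch environment. */
    static long parseSyntheticMemoryMb(Map<String, String> launchEnv) {
        return Long.parseLong(launchEnv.getOrDefault(SYNTHETIC_MEMORY_KEY, "0"));
    }

    /** A fake NM as seen by the scheduler: capacity, allocations, reported usage. */
    static final class FakeNode {
        final long capacityMb;
        long allocatedMb; // handed out by the scheduler
        long utilizedMb;  // synthetic usage reported back for the fake containers

        FakeNode(long capacityMb) {
            this.capacityMb = capacityMb;
        }

        /** "Start" a fake container: nothing runs, only bookkeeping changes. */
        void startFakeContainer(long allocationMb, Map<String, String> launchEnv) {
            allocatedMb += allocationMb;
            utilizedMb += parseSyntheticMemoryMb(launchEnv);
        }
    }

    /** Guaranteed containers are admitted against allocations. */
    static boolean admitGuaranteed(FakeNode node, long requestMb) {
        return node.capacityMb - node.allocatedMb >= requestMb;
    }

    /** Simplified rule: opportunistic containers are admitted against reported utilization. */
    static boolean admitOpportunistic(FakeNode node, long requestMb) {
        return node.capacityMb - node.utilizedMb >= requestMb;
    }

    public static void main(String[] args) {
        FakeNode node = new FakeNode(8192);
        Map<String, String> env = new HashMap<>();
        env.put(SYNTHETIC_MEMORY_KEY, "1536"); // each fake container "uses" 1.5 GB
        node.startFakeContainer(4096, env);
        node.startFakeContainer(4096, env);

        // Fully allocated, but only 3072 MB reported as used.
        System.out.println("guaranteed 2048 MB? " + admitGuaranteed(node, 2048));       // false
        System.out.println("opportunistic 2048 MB? " + admitOpportunistic(node, 2048)); // true
    }
}
```

The gap between allocated and utilized resources is exactly what a utilization-aware scheduler exploits. With synthetic utilization, a test can set that gap precisely.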

This message was sent by Atlassian JIRA

