sqoop-dev mailing list archives

From "Scott Kuehn (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SQOOP-2861) Sqoop2: Scheduler Pool Support
Date Wed, 02 Mar 2016 02:12:18 GMT
Scott Kuehn created SQOOP-2861:

             Summary: Sqoop2: Scheduler Pool Support
                 Key: SQOOP-2861
                 URL: https://issues.apache.org/jira/browse/SQOOP-2861
             Project: Sqoop
          Issue Type: New Feature
          Components: sqoop2-framework
    Affects Versions: 2.0.0
            Reporter: Scott Kuehn

Provide a mechanism to limit cluster-wide Sqoop access to a particular FROM resource. The
use case is to configure a YARN scheduler pool that limits the vcores and memory available
to jobs accessing a sensitive resource. A subset of Sqoop2 jobs could be configured to run
in this pool, whereas other Sqoop2 jobs would fall back to the default pool configured for
the Sqoop2 server.

The throttling extractor mechanics are useful for preventing a single job from saturating
the resource, but they cannot limit aggregate resource access across jobs. This
ticket aims to enable the use of scheduler pools for scenarios where multiple Sqoop2 jobs
access the same resource.

Possible implementation strategies:
# Enable clients to pass through job-specific MapReduce configuration, such as key=value pairs
in the CLI. A Sqoop2 client would specify the scheduler pool by passing a {{mapreduce.job.queuename}}
value from the CLI.
# Expose scheduler semantics to the client. An execution engine can subsequently decide whether to
honor the scheduler request. For example, a pool property can be interpreted and then set
as the {{mapreduce.job.queuename}} value of the Hadoop configuration by the MapReduce execution
engine.
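
The two strategies above can be sketched in a language-neutral way. The Python below is only an illustration under stated assumptions, not Sqoop2 code: plain dicts stand in for the Hadoop configuration, and the names parse_overrides, apply_pool and build_job_conf are hypothetical. The only real identifier is the {{mapreduce.job.queuename}} property. Letting an explicit pool take precedence over a pass-through value is also an assumption, not something the ticket specifies.

```python
# Real Hadoop MapReduce property that selects the scheduler queue/pool.
QUEUE_KEY = "mapreduce.job.queuename"


def parse_overrides(pairs):
    """Strategy 1 (hypothetical): turn ["key=value", ...] from a CLI into a dict."""
    overrides = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {pair!r}")
        overrides[key.strip()] = value.strip()
    return overrides


def apply_pool(job_conf, pool=None):
    """Strategy 2 (hypothetical): the execution engine maps an abstract pool
    request onto the queue property; with no pool, the conf is left unchanged."""
    conf = dict(job_conf)  # copy so the caller's dict is not mutated
    if pool:
        conf[QUEUE_KEY] = pool
    return conf


def build_job_conf(server_defaults, overrides=None, pool=None):
    """Layer server defaults, then pass-through overrides, then the pool request."""
    conf = dict(server_defaults)
    conf.update(overrides or {})
    return apply_pool(conf, pool)


if __name__ == "__main__":
    defaults = {QUEUE_KEY: "default"}  # stand-in for the server-wide default pool
    # Strategy 1: the client passes the raw MapReduce property.
    print(build_job_conf(defaults, parse_overrides([QUEUE_KEY + "=restricted"])))
    # Strategy 2: the client names a pool; the engine translates it.
    print(build_job_conf(defaults, pool="restricted"))
    # Neither given: the job falls back to the server default.
    print(build_job_conf(defaults))
```

In this sketch a job that specifies neither an override nor a pool keeps the server default, which matches the fallback behavior described in the ticket.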

