sqoop-dev mailing list archives

From "Cheolsoo Park (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SQOOP-513) Provide a way to override the default splitter
Date Tue, 03 Jul 2012 22:32:34 GMT
Cheolsoo Park created SQOOP-513:

             Summary: Provide a way to override the default splitter
                 Key: SQOOP-513
                 URL: https://issues.apache.org/jira/browse/SQOOP-513
             Project: Sqoop
          Issue Type: Improvement
    Affects Versions: 1.4.1-incubating
            Reporter: Cheolsoo Park

When the number of mappers is greater than 1, Sqoop divides the rows among them using simple range queries such as:

select x, y from foo where x > 10 and x <= 20

The width of each range is computed simply as (max - min) / number of mappers. This works well if
the values of the split-by column are evenly distributed; however, it works poorly with a skewed
distribution, where a few ranges contain most of the rows and their mappers do most of the work.
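
To make the skew problem concrete, here is a minimal sketch of the even-range computation described above. It is not Sqoop's actual splitter code: the class and method names are hypothetical, and the boundary handling (lower bound exclusive, upper bound inclusive, first range including min) is an assumption modelled on the sample query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only; these names do not come from Sqoop.
final class EvenRangeSplit {

    /** Returns numMappers + 1 boundaries that cut [min, max] into equal-width ranges. */
    static List<Long> boundaries(long min, long max, int numMappers) {
        List<Long> bounds = new ArrayList<>();
        double width = (double) (max - min) / numMappers;
        for (int i = 0; i < numMappers; i++) {
            bounds.add(min + Math.round(i * width));
        }
        bounds.add(max);
        return bounds;
    }

    /** Counts the values in each range (lower, upper]; the first range also includes min. */
    static int[] rowsPerMapper(long[] values, List<Long> bounds) {
        int[] counts = new int[bounds.size() - 1];
        for (long v : values) {
            for (int i = 0; i < counts.length; i++) {
                boolean aboveLower = (i == 0) ? v >= bounds.get(i) : v > bounds.get(i);
                if (aboveLower && v <= bounds.get(i + 1)) {
                    counts[i]++;
                    break;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Skewed split-by column: nine small values and one outlier.
        long[] skewed = {0, 1, 2, 3, 4, 5, 6, 7, 8, 100};
        List<Long> bounds = boundaries(0, 100, 4);
        System.out.println(bounds);                                        // [0, 25, 50, 75, 100]
        System.out.println(Arrays.toString(rowsPerMapper(skewed, bounds))); // [9, 0, 0, 1]
    }
}
```

With this data, one mapper receives nine of the ten rows and two mappers receive none, which is the imbalance the issue describes.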

The proposal is to provide a way for the user to override the default splitter. For example,
the user should be able to write their own splitter class, pass the class name via a command-line
option, and have Sqoop use that splitter at runtime instead of the default one.
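
One way such an override could work is sketched below, assuming the splitter is instantiated by reflection from a class name. Everything here is hypothetical: the RangeSplitter interface, both implementations, and the load method are invented for illustration and are not existing Sqoop APIs, and no actual option name is implied.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only; none of these types exist in Sqoop.
final class PluggableSplitterDemo {

    /** Hypothetical plug-in point a user-written splitter would implement. */
    interface RangeSplitter {
        /** Returns numSplits + 1 boundaries covering [min, max]. */
        List<Long> boundaries(long min, long max, int numSplits);
    }

    /** Default behaviour: equal-width ranges. */
    static final class EvenWidthSplitter implements RangeSplitter {
        @Override
        public List<Long> boundaries(long min, long max, int numSplits) {
            List<Long> bounds = new ArrayList<>();
            for (int i = 0; i < numSplits; i++) {
                bounds.add(min + Math.round((double) (max - min) * i / numSplits));
            }
            bounds.add(max);
            return bounds;
        }
    }

    /** Example user splitter: narrower ranges near min, where skewed data is assumed to cluster. */
    static final class QuadraticSplitter implements RangeSplitter {
        @Override
        public List<Long> boundaries(long min, long max, int numSplits) {
            List<Long> bounds = new ArrayList<>();
            for (int i = 0; i < numSplits; i++) {
                double fraction = (double) i / numSplits;
                bounds.add(min + Math.round((max - min) * fraction * fraction));
            }
            bounds.add(max);
            return bounds;
        }
    }

    /** Loads the named splitter, or falls back to the default when no name is given. */
    static RangeSplitter load(String className) {
        if (className == null || className.isEmpty()) {
            return new EvenWidthSplitter();
        }
        try {
            return Class.forName(className)
                    .asSubclass(RangeSplitter.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load splitter: " + className, e);
        }
    }

    public static void main(String[] args) {
        // The class name would come from the proposed command-line option.
        String className = args.length > 0 ? args[0] : QuadraticSplitter.class.getName();
        RangeSplitter splitter = load(className);
        System.out.println(splitter.boundaries(0, 100, 4)); // [0, 6, 25, 56, 100] for the default argument
    }
}
```

Falling back to the default when no class name is supplied keeps existing jobs unchanged, so only users who pass a class name would see different split boundaries.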

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

