sqoop-dev mailing list archives

From "David Robson (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SQOOP-2465) Initializer and Destroyer should know how many executors will run
Date Fri, 07 Aug 2015 03:58:46 GMT
David Robson created SQOOP-2465:

             Summary: Initializer and Destroyer should know how many executors will run
                 Key: SQOOP-2465
                 URL: https://issues.apache.org/jira/browse/SQOOP-2465
             Project: Sqoop
          Issue Type: Bug
    Affects Versions: 1.99.6
            Reporter: David Robson

Take a job that loads data into Oracle as an example. Depending on how the user wants
to load data, we may be loading into temporary tables. For maximum performance we need
to create a separate temporary table for each loader, so when the initializer runs
we need to know how many loaders will run in order to create these tables. Likewise, when
the destroyer runs it has to drop these temporary tables, so it needs to know the count
as well.
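
As a rough illustration of why the count matters, the sketch below derives one temporary table name per loader (what the initializer would create) and the matching drop statements (what the destroyer would issue). Everything in it is an assumption made for illustration: the class TempTablePlan, the SQOOP_TMP prefix and the loaderCount parameter are hypothetical and not part of the Sqoop connector API, which currently gives the Initializer and Destroyer no such value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of the Sqoop connector API.
// It assumes the framework could tell us how many loaders will run.
final class TempTablePlan {
    private TempTablePlan() {}

    // One temporary table name per loader: <prefix>_0 .. <prefix>_(loaderCount - 1).
    static List<String> tableNames(String prefix, int loaderCount) {
        if (loaderCount < 1) {
            throw new IllegalArgumentException("loaderCount must be at least 1");
        }
        List<String> names = new ArrayList<>();
        for (int i = 0; i < loaderCount; i++) {
            names.add(prefix + "_" + i);
        }
        return names;
    }

    // Statements the destroyer would need; it must see the same count
    // as the initializer, otherwise tables are left behind.
    static List<String> dropStatements(String prefix, int loaderCount) {
        List<String> statements = new ArrayList<>();
        for (String name : tableNames(prefix, loaderCount)) {
            statements.add("DROP TABLE " + name);
        }
        return statements;
    }
}
```

With three loaders, TempTablePlan.tableNames("SQOOP_TMP", 3) yields SQOOP_TMP_0, SQOOP_TMP_1 and SQOOP_TMP_2. Without the count, neither phase can compute this list.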

Another example where the initializer needs this information: an Oracle database may be a
Real Application Clusters (RAC) deployment, where there are multiple instances across
multiple machines. For both FROM and TO jobs we spread the load across these instances
during the initialization phase, so we need to know how many loaders / extractors will run.
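
A minimal sketch of that spreading step, assuming a simple round-robin policy (the issue does not say which policy the Oracle connector actually uses). The class InstanceAssignment, the instance names and the executorCount parameter are hypothetical; the point is only that the assignment cannot be computed without knowing the executor count.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of the Sqoop connector API.
final class InstanceAssignment {
    private InstanceAssignment() {}

    // Returns a list where element i is the RAC instance that
    // executor i (a loader or an extractor) should connect to.
    // Round-robin is an assumed policy, chosen for illustration.
    static List<String> assign(List<String> instances, int executorCount) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("at least one instance is required");
        }
        if (executorCount < 0) {
            throw new IllegalArgumentException("executorCount must not be negative");
        }
        List<String> assignment = new ArrayList<>();
        for (int i = 0; i < executorCount; i++) {
            assignment.add(instances.get(i % instances.size()));
        }
        return assignment;
    }
}
```

For two instances named orcl1 and orcl2 and five executors, this assigns orcl1, orcl2, orcl1, orcl2, orcl1.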

In the case of a FROM job we could do this in the partition phase, but there is no way to
achieve this for a TO job. It seems we could either pass the information into the initialize
phase, or add a new partition phase on the TO side that is called after the partition phase
on the FROM side. That new phase could take the details of the partitioned output and match
them up with the TO side.
