sqoop-dev mailing list archives

From "Jerry Chen (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SQOOP-1968) Optimize schema operation in getMatchingData of NameMatcher
Date Mon, 05 Jan 2015 08:24:34 GMT
Jerry Chen created SQOOP-1968:

             Summary: Optimize schema operation in getMatchingData of NameMatcher
                 Key: SQOOP-1968
                 URL: https://issues.apache.org/jira/browse/SQOOP-1968
             Project: Sqoop
          Issue Type: Improvement
          Components: connectors/generic
    Affects Versions: 2.0.0
            Reporter: Jerry Chen

Two performance issues were found in the Matcher implementations.

1. In getMatchingData of NameMatcher, the following code block builds a HashMap whose
contents do not change across getMatchingData calls. The HashMap can be built just once,
in the constructor.

    HashMap<String, Column> colNames = new HashMap<String, Column>();

    for (Column fromCol : getFromSchema().getColumnsArray()) {
      colNames.put(fromCol.getName(), fromCol);
    }

2. In getMatchingData of NameMatcher, calling indexOf on a List is inefficient: it usually
loops over the list to find the object and return its index. Instead, we can simply store
the index in the HashMap above and retrieve it with a direct HashMap lookup. The call in
question:

int fromIndex = getFromSchema().getColumnsList().indexOf(fromCol);

These performance problems are critical because getMatchingData is called repeatedly,
once for each record.
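
The two changes described above could be sketched roughly as follows. This is a minimal
illustration, not the actual Sqoop code: the class name CachedNameMatcherSketch is
hypothetical, schemas are reduced to plain lists of column names, and unmatched target
columns are simply left null. It builds the name-to-index map once in the constructor and
replaces the per-record indexOf scan with a HashMap lookup.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a name-based matcher (not the Sqoop class).
public class CachedNameMatcherSketch {
    // Assumption: a schema is represented only by its ordered column names.
    private final List<String> toColumns;

    // Column name -> position in the "from" schema, built once per matcher.
    private final Map<String, Integer> fromIndexByName = new HashMap<>();

    public CachedNameMatcherSketch(List<String> fromColumns, List<String> toColumns) {
        this.toColumns = toColumns;
        // Issue 1: build the map in the constructor instead of on every call.
        for (int i = 0; i < fromColumns.size(); i++) {
            fromIndexByName.put(fromColumns.get(i), i);
        }
    }

    // Called once per record, so it should do no per-call setup work.
    public Object[] getMatchingData(Object[] fromData) {
        Object[] toData = new Object[toColumns.size()];
        for (int toIndex = 0; toIndex < toColumns.size(); toIndex++) {
            // Issue 2: a map lookup replaces the linear List.indexOf scan.
            Integer fromIndex = fromIndexByName.get(toColumns.get(toIndex));
            if (fromIndex != null) {
                toData[toIndex] = fromData[fromIndex];
            }
            // Columns with no match stay null in this sketch.
        }
        return toData;
    }

    public static void main(String[] args) {
        CachedNameMatcherSketch matcher = new CachedNameMatcherSketch(
                Arrays.asList("id", "name", "age"),
                Arrays.asList("age", "id", "city"));
        Object[] record = {1, "alice", 30};
        // prints [30, 1, null]
        System.out.println(Arrays.toString(matcher.getMatchingData(record)));
    }
}
```

With this shape, the per-record cost depends only on the number of target columns, since
each lookup is an expected constant-time hash access rather than a scan of the source
column list.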

This message was sent by Atlassian JIRA
