sqoop-dev mailing list archives

From "Jerry Chen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SQOOP-1968) Optimize schema operation in getMatchingData of NameMatcher
Date Mon, 05 Jan 2015 08:25:35 GMT

     [ https://issues.apache.org/jira/browse/SQOOP-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry Chen updated SQOOP-1968:
------------------------------
    Description: 
Two performance issues were found in the Matcher implementations.

1. In getMatchingData of NameMatcher, the following code block builds a HashMap whose contents
do not change across getMatchingData calls. The HashMap can be built once, in the constructor.
{color:red}
    HashMap<String,Column> colNames = new HashMap<String, Column>();

    for (Column fromCol: getFromSchema().getColumnsArray()) {
      colNames.put(fromCol.getName(), fromCol);
    }
{color}
2. In getMatchingData of NameMatcher, calling indexOf on a List is inefficient: it usually
loops over the elements to find the object and return its index. To improve this, we can
simply store the index in the above HashMap and retrieve it with a direct HashMap lookup.

{color:red}
int fromIndex = getFromSchema().getColumnsList().indexOf(fromCol);
{color}
These performance problems are critical because getMatchingData is called repeatedly, once for
each record.

  was:
Two performance issues were found in the Matcher implementations.

1. In getMatchingData of NameMatcher, the following code block builds a HashMap whose contents
do not change across getMatchingData calls. The HashMap can be built once, in the constructor.

    HashMap<String,Column> colNames = new HashMap<String, Column>();

    for (Column fromCol: getFromSchema().getColumnsArray()) {
      colNames.put(fromCol.getName(), fromCol);
    }

2. In getMatchingData of NameMatcher, calling indexOf on a List is inefficient: it usually
loops over the elements to find the object and return its index. To improve this, we can
simply store the index in the above HashMap and retrieve it with a direct HashMap lookup.

int fromIndex = getFromSchema().getColumnsList().indexOf(fromCol);

These performance problems are critical because getMatchingData is called repeatedly, once for
each record.
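
The two changes proposed in the description could look roughly like the sketch below. It is
not the actual Sqoop code: the class NameMatcherSketch, its nested Column type, and the field
fromIndexByName are hypothetical stand-ins, and leaving unmatched target columns as null is an
assumption made only to keep the example small. The name-to-index map is built once in the
constructor, so the per-record call does neither a map rebuild nor a List.indexOf scan.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the real NameMatcher from Sqoop.
final class NameMatcherSketch {

  // Hypothetical stand-in for a schema column; only the name matters here.
  static final class Column {
    private final String name;
    Column(String name) { this.name = name; }
    String getName() { return name; }
  }

  private final List<Column> toColumns;

  // Built once: maps each "from" column name to its position in the "from" schema.
  private final Map<String, Integer> fromIndexByName = new HashMap<String, Integer>();

  NameMatcherSketch(List<Column> fromColumns, List<Column> toColumns) {
    this.toColumns = new ArrayList<Column>(toColumns);
    for (int i = 0; i < fromColumns.size(); i++) {
      String name = fromColumns.get(i).getName();
      // Keep the first occurrence, mirroring what List.indexOf would return.
      if (!fromIndexByName.containsKey(name)) {
        fromIndexByName.put(name, i);
      }
    }
  }

  // Called once per record: one hash lookup per target column.
  Object[] getMatchingData(Object[] fromRecord) {
    Object[] out = new Object[toColumns.size()];
    for (int i = 0; i < toColumns.size(); i++) {
      Integer fromIndex = fromIndexByName.get(toColumns.get(i).getName());
      // Assumption: a target column with no match by name is left as null.
      out[i] = (fromIndex == null) ? null : fromRecord[fromIndex];
    }
    return out;
  }
}
```

With n source columns and m target columns, each record then costs about m hash lookups,
instead of rebuilding an n-entry map and running up to m linear indexOf scans.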


> Optimize schema operation in getMatchingData of NameMatcher
> -----------------------------------------------------------
>
>                 Key: SQOOP-1968
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1968
>             Project: Sqoop
>          Issue Type: Improvement
>          Components: connectors/generic
>    Affects Versions: 2.0.0
>            Reporter: Jerry Chen



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
