sqoop-dev mailing list archives

From "Jarek Jarcec Cecho (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-2464) Initializer object is not reused when calling getSchema
Date Fri, 07 Aug 2015 22:02:45 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662525#comment-14662525 ]

Jarek Jarcec Cecho commented on SQOOP-2464:

The original reason we decided to re-create all the workflow objects from scratch for
each of the callbacks is that we don't want connector developers to start depending
on state shared between callbacks, as we might want to move the callbacks to a different process in the future (perhaps
even running on a different machine). We already take advantage of that in the [{{Destroyer}}|https://github.com/apache/sqoop/blob/sqoop2/connector/connector-sdk/src/main/java/org/apache/sqoop/job/etl/Destroyer.java]
class, where each of the callbacks is actually called from different machines when using the
default MapReduce execution engine.

I do, however, feel that initializing the connector and getting the schema will always belong
to the "initialization" phase, which has to be done from a single process. Hence I'm supportive of
changing the semantics as suggested. We should, however, add tests that ensure object reuse
for those two methods is done correctly, and document this behavior in our developer guide.
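
To make the difference in semantics concrete, here is a minimal Java sketch that contrasts creating a fresh initializer for each step with sharing one instance across both steps. It is only an illustration: {{SketchInitializer}}, {{CountingInitializer}} and {{SketchJobManager}} are hypothetical stand-ins, not the real Sqoop SDK or JobManager classes, and the real method signatures take context and configuration arguments that are omitted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical, simplified stand-in for a connector initializer (not the Sqoop SDK class).
interface SketchInitializer {
    void initialize();
    String getSchema();
}

// Logs every lifecycle event so a test can see which instance served which call.
class CountingInitializer implements SketchInitializer {
    private final List<String> log;
    private boolean initialized = false;

    CountingInitializer(List<String> log) {
        this.log = log;
        log.add("created");
    }

    @Override
    public void initialize() {
        initialized = true;
        log.add("initialize");
    }

    @Override
    public String getSchema() {
        // Reports whether this particular instance saw initialize() earlier.
        log.add(initialized ? "getSchema(initialized)" : "getSchema(fresh)");
        return "schema";
    }
}

// Hypothetical stand-in for the two JobManager steps discussed in the issue.
class SketchJobManager {
    // Behavior described in the issue: each step builds its own instance.
    static String separateInstances(Supplier<SketchInitializer> factory) {
        factory.get().initialize();
        return factory.get().getSchema();
    }

    // Suggested behavior: one instance serves both steps.
    static String sharedInstance(Supplier<SketchInitializer> factory) {
        SketchInitializer initializer = factory.get();
        initializer.initialize();
        return initializer.getSchema();
    }
}

class ReuseDemo {
    public static void main(String[] args) {
        List<String> separate = new ArrayList<>();
        SketchJobManager.separateInstances(() -> new CountingInitializer(separate));
        System.out.println("separate: " + separate);

        List<String> shared = new ArrayList<>();
        SketchJobManager.sharedInstance(() -> new CountingInitializer(shared));
        System.out.println("shared:   " + shared);
    }
}
```

A reuse test along these lines would assert that only one "created" event is recorded and that getSchema runs on the instance that was already initialized.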

> Initializer object is not reused when calling getSchema
> -------------------------------------------------------
>                 Key: SQOOP-2464
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2464
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.99.6
>            Reporter: David Robson
> In JobManager there are two methods that are called one after the other: "initializeConnector"
> and "getSchemaForConnector". Both methods do the same thing as their first step: create
> a new instance of the initializer class.
> If the same instance of the initializer were shared, the class could keep resources
> open (such as a connection to the database) and would not have to re-establish the connection. This
> might mean a close method needs to be added to the initializers, as otherwise getSchema
> would need to close any resources opened in the initialize call, which might seem a bit confusing.
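
The resource-sharing idea in the description could look roughly like the following Java sketch, assuming a close method is added as the reporter suggests. All names here ({{ConnectionCounter}}, {{FakeConnection}}, {{ResourceHoldingInitializer}}) are hypothetical and only illustrate the lifecycle; they are not part of the Sqoop connector SDK, and the fake connection stands in for a real database connection.

```java
// Hypothetical counter used to observe how many connections are opened and closed.
class ConnectionCounter {
    int opened;
    int closed;
}

// Hypothetical stand-in for a database connection.
class FakeConnection implements AutoCloseable {
    private final ConnectionCounter counter;

    FakeConnection(ConnectionCounter counter) {
        this.counter = counter;
        counter.opened++;
    }

    String describeTable() {
        return "id:INTEGER,name:TEXT";
    }

    @Override
    public void close() {
        counter.closed++;
    }
}

// Hypothetical initializer that opens a connection in initialize(),
// reuses it in getSchema(), and releases it in an explicit close().
class ResourceHoldingInitializer implements AutoCloseable {
    private final ConnectionCounter counter;
    private FakeConnection connection;

    ResourceHoldingInitializer(ConnectionCounter counter) {
        this.counter = counter;
    }

    void initialize() {
        connection = new FakeConnection(counter);
    }

    String getSchema() {
        if (connection == null) {
            throw new IllegalStateException("initialize() must be called first");
        }
        // Reuses the connection opened in initialize() instead of reconnecting.
        return connection.describeTable();
    }

    @Override
    public void close() {
        if (connection != null) {
            connection.close();
            connection = null;
        }
    }
}

class CloseDemo {
    public static void main(String[] args) {
        ConnectionCounter counter = new ConnectionCounter();
        // The caller owns the lifecycle: one instance, one connection, one close.
        try (ResourceHoldingInitializer initializer = new ResourceHoldingInitializer(counter)) {
            initializer.initialize();
            System.out.println(initializer.getSchema());
        }
        System.out.println("opened=" + counter.opened + ", closed=" + counter.closed);
    }
}
```

With an explicit close, getSchema does not have to decide when to release what initialize opened, which avoids the confusion mentioned above.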

