sqoop-dev mailing list archives

From "Jarek Jarcec Cecho (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-1072) Sqoop2: Abstract Input/Output interfaces
Date Wed, 04 Sep 2013 15:35:51 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757877#comment-13757877 ]

Jarek Jarcec Cecho commented on SQOOP-1072:
-------------------------------------------

I've started investigating this one and I would like to share my thoughts with other developers
to get additional feedback.

I'm thinking about introducing a new second-level citizen object called HIO (Hadoop Input/Output).
Such objects would be similar to small connectors: they would have independent configurations,
validations and upgraders. Each HIO would cover one specific input (export) or output (import)
on the Hadoop side. For example, I would imagine HDFS, HCatalog, HBase or Hive HIO implementations.
I think of HIO implementations as second-level citizens because I would not expect users or
developers to create a new HIO often. Yet I believe that a clear separation of each HIO
implementation into its own Maven module encapsulating its functionality will help us achieve
more readable and maintainable code (unlike Sqoop 1.x). Unlike connectors, I would expect HIOs
to be more tightly integrated with Sqoop internals, making them more of an internal abstraction
than something entirely exposed to the end user.
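A minimal Java sketch of what an HIO contract might look like, assuming each HIO carries its own forms, validation and upgrade logic as described above. `Hio`, `HdfsHio`, `Direction` and the `output.outputDirectory` key are hypothetical names for illustration only; they are not existing Sqoop APIs.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of an HIO ("Hadoop Input/Output") plug-in contract.
 * None of these types exist in Sqoop; all names are illustrative.
 */
public class HioSketch {

    /** Direction of data movement as seen from the Hadoop side. */
    enum Direction { IMPORT /* HIO acts as the output */, EXPORT /* HIO acts as the input */ }

    /** One HIO implementation (e.g. HDFS, HCatalog, HBase, Hive), packaged in its own Maven module. */
    interface Hio {
        /** Unique name identifying this HIO. */
        String name();

        /** Directions this HIO supports. */
        List<Direction> supportedDirections();

        /** Names of the configuration forms this HIO contributes. */
        List<String> formNames();

        /** Validates user-supplied configuration; returns error messages (empty when valid). */
        List<String> validate(Map<String, String> config);

        /** Upgrades configuration stored by an older version of this HIO. */
        Map<String, String> upgrade(int fromVersion, Map<String, String> oldConfig);
    }

    /** Toy HDFS HIO used only to illustrate the contract. */
    static class HdfsHio implements Hio {
        public String name() { return "hdfs"; }

        public List<Direction> supportedDirections() {
            return List.of(Direction.IMPORT, Direction.EXPORT);
        }

        public List<String> formNames() { return List.of("output", "input"); }

        public List<String> validate(Map<String, String> config) {
            // Hypothetical required key for the import (output) side.
            String dir = config.get("output.outputDirectory");
            return (dir == null || dir.isEmpty())
                ? List.of("output.outputDirectory is required")
                : List.of();
        }

        public Map<String, String> upgrade(int fromVersion, Map<String, String> oldConfig) {
            // Nothing to migrate in this toy version; return an immutable copy.
            return Map.copyOf(oldConfig);
        }
    }

    public static void main(String[] args) {
        Hio hdfs = new HdfsHio();
        System.out.println(hdfs.name() + " errors: " + hdfs.validate(Map.of()));
    }
}
```

Running `main` prints `hdfs errors: [output.outputDirectory is required]`. Keeping validation and upgrade inside each HIO is what would let every module evolve independently, much like connectors do today.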

Having said all the nice words, I do not have a simple path in mind for achieving this.
Sqoop currently has only one framework entity encapsulating all configuration, validations
and upgrades. We could potentially load all HIO modules on server start-up and merge them
into one structure that would then be used everywhere else. However, I would assume that such
a merge could be quite tricky: we would have to ensure that form names are unique, and validations
with upgrades could easily become a nightmare. On the bright side, such a merge would require
fairly isolated changes, so the initial implementation would most likely be quite simple. Another
approach would be to make HIO a true second-level citizen and promote the structure everywhere,
e.g. represent HIOs separately in the repository, let the user explicitly choose which HIO should
be used in a job (protocol and client change), etc. This second approach would be very intrusive,
as almost every aspect of Sqoop would have to be altered. On the other hand, I would expect that
we would end up with a much cleaner design, as all top-level entities would be clearly separated.
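A rough Java sketch of the first ("merge") approach, assuming each HIO module found at server start-up is reduced to a name plus the form names it contributes. `HioModule` and `mergeForms` are hypothetical names; the sketch only shows the form-name uniqueness check mentioned above, not the harder validation and upgrade merging.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the "merge" approach: all HIO modules loaded at
 * server start-up contribute their forms to one framework-level structure.
 */
public class HioMergeSketch {

    /** Minimal stand-in for an HIO module: its name and the form names it contributes. */
    record HioModule(String name, List<String> formNames) {}

    /**
     * Merges the forms of all modules into a single map of formName -> owning HIO.
     * Throws if a form name is contributed more than once, because the merged
     * structure could no longer tell the owners apart.
     */
    static Map<String, String> mergeForms(List<HioModule> modules) {
        Map<String, String> owners = new LinkedHashMap<>();
        for (HioModule module : modules) {
            for (String form : module.formNames()) {
                String previous = owners.putIfAbsent(form, module.name());
                if (previous != null) {
                    throw new IllegalStateException("Form '" + form + "' is defined by both "
                        + previous + " and " + module.name());
                }
            }
        }
        return owners;
    }

    public static void main(String[] args) {
        List<HioModule> modules = List.of(
            new HioModule("hdfs", List.of("hdfs.output", "hdfs.input")),
            new HioModule("hbase", List.of("hbase.table")));
        // Insertion order is preserved by LinkedHashMap.
        System.out.println(mergeForms(modules));
    }
}
```

Running `main` prints `{hdfs.output=hdfs, hdfs.input=hdfs, hbase.table=hbase}`. Prefixing every form name with its HIO name, as in this toy data, is one possible convention for avoiding collisions; it does not address how merged validators and upgraders would interact.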

I would be interested to hear the thoughts of other contributors on which path would be preferable.
I'll be more than happy to put together a more formal proposal for the more aggressive (second) path if necessary.
                
> Sqoop2: Abstract Input/Output interfaces
> ----------------------------------------
>
>                 Key: SQOOP-1072
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1072
>             Project: Sqoop
>          Issue Type: Improvement
>    Affects Versions: 1.99.2
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Jarek Jarcec Cecho
>             Fix For: 2.0.0
>
>
> The input/output interfaces like {{Text}} or {{SequenceFile}} are currently hardcoded
> and are present throughout the entire code base. It would be great to abstract the I/O module
> similarly to what we are doing with connectors and push the appropriate code to separate modules.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
