sqoop-dev mailing list archives

From "Venkat Ranganathan" <n....@live.com>
Subject Re: Review Request 22516: Support importing mainframe sequential datasets
Date Wed, 23 Jul 2014 15:18:18 GMT


> On July 10, 2014, 8:22 a.m., Venkat Ranganathan wrote:
> > src/java/org/apache/sqoop/manager/MainframeManager.java, line 75
> > <https://reviews.apache.org/r/22516/diff/1/?file=608148#file608148line75>
> >
> >     Is import into HBase and Accumulo supported by this tool?  From the command
> >     help, it looks like the only supported target is HDFS text files.
> 
> Mariappan Asokan wrote:
>     Each record in a mainframe dataset is treated as a single field (or column).
>     So, theoretically, HBase, Accumulo, and Hive are supported, but with limited
>     usability; that is why I did not add them to the documentation.  If you feel
>     strongly that they should be documented, I can work on that in the next
>     version of the patch.
> 
> Venkat Ranganathan wrote:
>     I feel it would be good to say that we import only as text files and leave
>     further processing, such as loading into Hive/HBase, up to the user, since
>     the composition of the records and the processing needed differ, and the
>     schema cannot be inferred.
> 
> Mariappan Asokan wrote:
>     I agree with you.  To avoid confusion, I plan to remove support for parsing
>     the input format, output format, Hive, HBase, HCatalog, and codegen options.
>     This will synchronize the documentation with the code.  What do you think?
>
> 
> Venkat Ranganathan wrote:
>     Sorry for the delay.  I was wondering whether the mainframe connector could
>     just define connector-specific extra arguments instead of creating another
>     tool.  Please see NetezzaManager or DirectNetezzaManager as examples.  Maybe
>     you could invent a new synthetic URI format, say
>     jdbc:mfftp:<host address>:<port>/dataset, and choose your ConnectionManager
>     when the --connect option is given with that URI format.  That would
>     simplify a whole lot, in my opinion.  What do you think?
> 
> Mariappan Asokan wrote:
>     Thanks for your suggestions.  Sorry I did not get back sooner.  In Sqoop
>     1.x, there is a strong assumption that the input source is always a database
>     table.  Because of this, the sqoop import tool has many options that are
>     relevant only to a source database table.  A mainframe source is totally
>     different from a database table.  I think it is better to create a separate
>     tool for mainframe import rather than just a new connection manager.  The
>     mainframe import tool will not support many options that the database import
>     tool supports, and it will have its own options that the database import
>     tool does not.  At present, these are the host name and the partitioned
>     dataset name.  In the future, the mainframe import tool may be enhanced with
>     metadata-specific or connection-specific arguments unique to the mainframe.
>     Creating a synthetic URI for a connection seems somewhat artificial to me.
>     
>     Contrary to what I stated before, and considering possible future
>     enhancements, I think it is better to retain the support for parsing the
>     input format, output format, Hive, HBase, HCatalog, and codegen options.
>     The documentation will be enhanced in the future to reflect this support.
>

Thanks for your thoughts on the suggestion.  As you correctly pointed out, Sqoop 1.x has a
JDBC model (that is why you had to implement a ConnectionManager and provide pseudo values
for column types, always returning VARCHAR).  I understand there will be options that
mainframe import will not support (much as there are MySQL-specific, Netezza-specific, or
SQL Server-specific options).  I understand you want to have specific metadata for mainframe
import.  That may be tricky.  Connection-specific arguments can be implemented the same way
JDBC connection-specific arguments are done.
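The URI-scheme dispatch and pseudo-metadata ideas discussed in this thread can be sketched roughly as follows. This is an illustrative stand-alone sketch, not actual Sqoop code: the class and method names (MfftpManagerStub, schemeOf, columnType) are invented for the example, and only the jdbc:mfftp URI shape follows the suggestion above.

```java
// Illustrative sketch only: dispatch a connection manager by the scheme of a
// synthetic connect string, and report the same pseudo type for every column,
// since a mainframe record carries no relational schema.
public class MfftpManagerStub {

    // Extract the scheme from a connect string such as
    // "jdbc:mfftp://zos.example.com:21/MY.PDS".
    static String schemeOf(String connectString) {
        // Strip the leading "jdbc:" wrapper if present.
        String s = connectString.startsWith("jdbc:")
                ? connectString.substring("jdbc:".length())
                : connectString;
        int colon = s.indexOf(':');
        return colon < 0 ? s : s.substring(0, colon);
    }

    // Every mainframe "column" gets the same pseudo type, mirroring the
    // always-VARCHAR behavior mentioned in the review.
    static String columnType(String columnName) {
        return "VARCHAR";
    }

    public static void main(String[] args) {
        String connect = "jdbc:mfftp://zos.example.com:21/MY.PDS";
        if ("mfftp".equals(schemeOf(connect))) {
            System.out.println("dispatching to the mainframe manager");
        }
        System.out.println(columnType("RECORD"));
    }
}
```

The point of the scheme check is that an existing --connect option could route to a new manager without a new tool, which is the simplification Venkat suggests.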

The reason for my suggestion was primarily to piggyback on the implementation for imports
into Hive/HBase in the future, when you have the ability to provide specific metadata on the
data.  You can definitely parse the various options, but you have to explicitly check and
exit if any unsupported options are used.
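A minimal sketch of the fail-fast check described above, assuming a plain scan of the argument vector; UnsupportedOptionCheck and its hard-coded option list are illustrative placeholders, not the actual Sqoop option-parsing machinery.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: detect options that are parsed but not yet
// supported by the mainframe import path, so the tool can exit early
// instead of silently ignoring them.
public class UnsupportedOptionCheck {

    // Placeholder list for the example; the real set would come from
    // whatever the mainframe path does not implement yet.
    static final List<String> UNSUPPORTED =
            Arrays.asList("--hive-import", "--hbase-table", "--hcatalog-table");

    // Returns the first unsupported option found, or null if none.
    static String firstUnsupported(String[] argv) {
        for (String arg : argv) {
            if (UNSUPPORTED.contains(arg)) {
                return arg;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] argv = {"--connect", "jdbc:mfftp://host/DS", "--hive-import"};
        String bad = firstUnsupported(argv);
        if (bad != null) {
            System.err.println("Option " + bad
                    + " is not yet supported for mainframe import");
            // A real tool would exit non-zero here, e.g. System.exit(1).
        }
    }
}
```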

My only worry with this tool is that it may be a one-off for mainframe imports alone.  We
will be starting with HDFS import only until the rest of the parts are done, and by the time
we finally see the whole picture, it may be duplicating some of the code and may be
difficult to maintain.


- Venkat


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22516/#review47555
-----------------------------------------------------------


On June 14, 2014, 10:46 p.m., Mariappan Asokan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22516/
> -----------------------------------------------------------
> 
> (Updated June 14, 2014, 10:46 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> This is to move mainframe datasets to Hadoop.
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/manager/MainframeManager.java PRE-CREATION
>   src/java/org/apache/sqoop/mapreduce/MainframeDatasetFTPRecordReader.java PRE-CREATION
>   src/java/org/apache/sqoop/mapreduce/MainframeDatasetImportMapper.java PRE-CREATION
>   src/java/org/apache/sqoop/mapreduce/MainframeDatasetInputFormat.java PRE-CREATION
>   src/java/org/apache/sqoop/mapreduce/MainframeDatasetInputSplit.java PRE-CREATION
>   src/java/org/apache/sqoop/mapreduce/MainframeDatasetRecordReader.java PRE-CREATION
>   src/java/org/apache/sqoop/mapreduce/MainframeImportJob.java PRE-CREATION
>   src/java/org/apache/sqoop/tool/MainframeImportTool.java PRE-CREATION
>   src/java/org/apache/sqoop/tool/SqoopTool.java dbe429a
>   src/java/org/apache/sqoop/util/MainframeFTPClientUtils.java PRE-CREATION
>   src/test/org/apache/sqoop/manager/TestMainframeManager.java PRE-CREATION
>   src/test/org/apache/sqoop/mapreduce/TestMainframeDatasetFTPRecordReader.java PRE-CREATION
>   src/test/org/apache/sqoop/mapreduce/TestMainframeDatasetInputFormat.java PRE-CREATION
>   src/test/org/apache/sqoop/mapreduce/TestMainframeDatasetInputSplit.java PRE-CREATION
>   src/test/org/apache/sqoop/mapreduce/TestMainframeImportJob.java PRE-CREATION
>   src/test/org/apache/sqoop/tool/TestMainframeImportTool.java PRE-CREATION
>   src/test/org/apache/sqoop/util/TestMainframeFTPClientUtils.java PRE-CREATION
> 
> Diff: https://reviews.apache.org/r/22516/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Mariappan Asokan
> 
>

