sqoop-user mailing list archives

From Joe Achett <...@voltage.com>
Subject Sqoop2 - how to integrate custom pre/post-processing code (such as encryption) when doing import
Date Tue, 01 Oct 2013 22:02:18 GMT
Hi all,

I've used Sqoop 1 to integrate custom pre-processing code into a Sqoop import from a relational
database into HDFS.  I used the "codegen" command to generate the object-relational mapping (ORM)
class for the table, and then modified that class's source code to embed my custom pre-processing
code.  Specifically, I modified the readFields() method to process the field values (in this case,
by encrypting sensitive fields) after reading them from the JDBC result set and before setting them
in the object instance.  I then supplied this modified ORM class when running the Sqoop import.  The
end result was that certain fields in my data were encrypted by my custom code before being written
into HDFS.

For example, here is an excerpt of the modified ORM class:

  public class Customer extends SqoopRecord implements DBWritable, Writable {

    // ... other generated fields and methods omitted ...

    public void readFields(ResultSet __dbResults) throws SQLException {
      this.__cur_result_set = __dbResults;
      this.id = JdbcWritableBridge.readInteger(1, __dbResults);
      this.last_name = JdbcWritableBridge.readString(2, __dbResults);
      this.first_name = JdbcWritableBridge.readString(3, __dbResults);

      // Encrypt the cc (credit card) field before setting its value in the object.
      // encrypt() stands for the custom encryption code added to the class (not shown).
      this.cc = encrypt(JdbcWritableBridge.readString(4, __dbResults));
    }

    // ... remaining generated methods omitted ...
  }
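The encrypt() helper itself is not shown above. As a hedged sketch only, assuming AES-GCM from the JDK's javax.crypto package and a caller-supplied key, a field-level helper might look like the following. The class name FieldCrypto, the Base64(iv || ciphertext) output format, and the key handling are hypothetical; none of them are part of Sqoop or of the original code.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/**
 * Hypothetical field-level encryption helper (not part of Sqoop).
 * Encrypts a string with AES-GCM and returns Base64(iv || ciphertext),
 * so the result can be stored as an ordinary text field in HDFS.
 */
public final class FieldCrypto {
    private static final int IV_BYTES = 12;   // 96-bit nonce, the usual size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length in bits
    private static final SecureRandom RANDOM = new SecureRandom();

    private FieldCrypto() {}

    /** Encrypts one field value; a null input returns null so NULL columns stay NULL. */
    public static String encrypt(String plaintext, byte[] key) throws GeneralSecurityException {
        if (plaintext == null) {
            return null;
        }
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv); // fresh random nonce for every value
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        // Prepend the nonce so decrypt() can recover it.
        byte[] out = new byte[iv.length + ct.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ct, 0, out, iv.length, ct.length);
        return Base64.getEncoder().encodeToString(out);
    }

    /** Reverses encrypt(); a null input returns null. */
    public static String decrypt(String encoded, byte[] key) throws GeneralSecurityException {
        if (encoded == null) {
            return null;
        }
        byte[] in = Base64.getDecoder().decode(encoded);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        // The first IV_BYTES bytes are the nonce; the rest is ciphertext plus tag.
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
        byte[] pt = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
        return new String(pt, StandardCharsets.UTF_8);
    }
}
```

Because the generated readFields() declares only SQLException, a call such as FieldCrypto.encrypt(value, key) would need a try/catch that rethrows the GeneralSecurityException, for example as new SQLException("encryption failed", e). Where the 16-, 24-, or 32-byte AES key comes from (a key store, a job configuration property, etc.) is deliberately left open in this sketch.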


This approach works fine in Sqoop 1.

However, I don't see any way to integrate such custom pre-processing code in Sqoop 2: there is
no "codegen" command or equivalent.  Is there a UDF, custom connector, or other mechanism in
Sqoop 2 that can process fields during an import job?  If so, could you point me to examples
or documentation showing how that works in Sqoop 2?


--Joe Achett
