sqoop-user mailing list archives

From Jarek Jarcec Cecho <jar...@apache.org>
Subject Re: Sqoop2 - how to integrate custom pre/post-processing code (such as encryption) when doing import
Date Thu, 03 Oct 2013 16:50:52 GMT
Hi Joe,
Sqoop 2 does not do any code generation, so there is no "codegen" tool or any other way to
plug in your own code. We are thinking about introducing a simple transform phase where
similar functionality would be possible, but such a facility is not yet available in Sqoop 2.

I noticed that you also posted this question on Stack Overflow [1], so I have answered it
there as well. Please do not cross-post the same question to multiple forums: it creates
diverging discussions and makes it hard for a user searching for an answer to follow them.


1: http://stackoverflow.com/questions/19148039/sqoop2-how-to-integrate-custom-pre-post-processing-code-such-as-encryption-w

On Tue, Oct 01, 2013 at 03:02:18PM -0700, Joe Achett wrote:
> Hi all,
> I've used Sqoop 1 to integrate custom pre-processing code when performing a Sqoop import
> from a relational database into HDFS.  Basically, I used the "codegen" command to create the
> object-relational mapping class, and then modified that class source code to embed my custom
> pre-processing code.  Through this approach, I was able to modify the readFields() method
> to process the field values (in this case, by encrypting sensitive fields) when reading the
> fields from the JDBC result set and before setting them in the object instance.  I then used
> this modified ORM class file when performing the Sqoop import operation.  The end result was
> that certain fields in my data were encrypted by my custom code before being written into
> HDFS.
> For example, modified ORM class:
>   public class Customer extends SqoopRecord implements DBWritable, Writable {
>   ...
>     public void readFields(ResultSet __dbResults) throws SQLException {
>       this.__cur_result_set = __dbResults;
>       this.id = JdbcWritableBridge.readInteger(1, __dbResults);
>       this.last_name = JdbcWritableBridge.readString(2, __dbResults);
>       this.first_name = JdbcWritableBridge.readString(3, __dbResults);
>       // encrypt cc (credit card) field, before setting value in object
>       this.cc = encrypt(JdbcWritableBridge.readString(4, __dbResults));
>     }
>   ...
>   }
> This approach works fine in Sqoop 1.
> But I don't see any way to integrate such custom pre-processing code in Sqoop 2.  There
> is no "codegen" or equivalent option in Sqoop 2. Is there a UDF or other custom connector approach
> that can be used in Sqoop 2 to achieve this, to process fields during the Sqoop 2 import job?
> If so, can you point me to some examples or docs showing how that works in Sqoop 2?
> Thanks!
> --Joe Achett
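
The quoted readFields() calls an encrypt() helper that is not shown anywhere in the thread. As an illustration only, here is a minimal sketch of what such a helper might look like, assuming AES-GCM from the standard javax.crypto API with Base64 output. The class name FieldEncryptor, the decrypt() method, and the per-JVM throwaway key are all assumptions made for this sketch and are not taken from the original message.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper; the thread does not show the real encrypt() implementation.
final class FieldEncryptor {
    private static final int IV_BYTES = 12;   // IV size recommended for GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    // Assumption: a throwaway key generated once per JVM. A real import job
    // would have to load one shared key from a key store, otherwise every
    // mapper would encrypt with a different key.
    private static final SecretKey KEY = newKey();

    private FieldEncryptor() {}

    private static SecretKey newKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES is not available", e);
        }
    }

    // Returns Base64(IV || ciphertext || tag); passes SQL NULL (null) through.
    static String encrypt(String plain) {
        if (plain == null) {
            return null;
        }
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv); // fresh IV for every value
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, KEY, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(plain.getBytes(StandardCharsets.UTF_8));
            byte[] out = ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            // Unchecked, so the call fits inside readFields(), which only declares SQLException.
            throw new IllegalStateException("Field encryption failed", e);
        }
    }

    // Inverse of encrypt(), included only so the sketch can be checked end to end.
    static String decrypt(String encoded) {
        if (encoded == null) {
            return null;
        }
        try {
            byte[] in = Base64.getDecoder().decode(encoded);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, KEY, new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Field decryption failed", e);
        }
    }
}
```

With a helper like this on the job's classpath, the modified Sqoop 1 class could call FieldEncryptor.encrypt(...) where the quoted code calls encrypt(...). Because a random IV is used for each value, the same credit card number encrypts to a different string every time, so the encrypted column cannot be used for equality joins.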
