sqoop-user mailing list archives

From Denis <sunseaandpa...@gmail.com>
Subject Re: Use custom models in scoop
Date Tue, 23 Sep 2014 08:31:37 GMT
     My models are simple Java POJOs with business fields/functions and 
several custom mappings embedded: fixed-length lines, CSV, binary, and 
SQL (for export). I use them in MapReduce jobs that read data from 
HDFS. Since these models already hold the field and column names, as 
well as the field order for SQL serialization/deserialization, I would 
really like to reuse that and be able to cover it with tests. Another 
solution would be to implement a Sqoop-like MR job, but I don't want to 
reimplement that, and I still want to be able to test the --direct 
option. So is there any way to plug my models into Sqoop? I can 
implement the DBWritable interface and any other interface required.
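
To make the idea concrete, here is a minimal sketch of such a model. The 
Customer class, its three columns, and the helper names columnsArg and 
insertSql are hypothetical, chosen only for illustration. The readFields 
and write methods use the same signatures as Hadoop's 
org.apache.hadoop.mapreduce.lib.db.DBWritable, so with Hadoop on the 
classpath the class could declare that interface. Whether Sqoop would 
accept such a class in place of its generated one is not shown here and 
is not assumed.

```java
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model: the column list below is the single place where
// column names and their order are defined.
class Customer {
    static final List<String> COLUMNS =
            Collections.unmodifiableList(Arrays.asList("id", "name", "balance"));

    private long id;
    private String name;
    private double balance;

    Customer() { }

    Customer(long id, String name, double balance) {
        this.id = id;
        this.name = name;
        this.balance = balance;
    }

    // Same signature as DBWritable.readFields(ResultSet).
    // JDBC column indexes are 1-based and follow the order of COLUMNS.
    public void readFields(ResultSet rs) throws SQLException {
        id = rs.getLong(1);
        name = rs.getString(2);
        balance = rs.getDouble(3);
    }

    // Same signature as DBWritable.write(PreparedStatement).
    public void write(PreparedStatement ps) throws SQLException {
        ps.setLong(1, id);
        ps.setString(2, name);
        ps.setDouble(3, balance);
    }

    // Comma-separated column list, usable as the value of the
    // --columns argument of 'sqoop import'.
    static String columnsArg() {
        return String.join(",", COLUMNS);
    }

    // Parameterized INSERT whose placeholder order matches write().
    static String insertSql(String table) {
        String placeholders =
                String.join(", ", Collections.nCopies(COLUMNS.size(), "?"));
        return "INSERT INTO " + table
                + " (" + String.join(", ", COLUMNS) + ")"
                + " VALUES (" + placeholders + ")";
    }
}
```

With this layout the field order lives in one place: columnsArg() can 
feed the import command line, and insertSql() plus write() can be 
exercised from a JUnit test against any JDBC database.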

On 2014-09-23 11:13, Abraham Elmahrek wrote:
> Hey Denis,
> Could you describe your models a bit? Do they have a special structure 
> and require the output format to be different? Would they exist in 
> HDFS? HBase? etc.
> Whatever it may be, you could potentially hack it into Sqoop1 or you 
> could wait for Sqoop2 and write a connector. The code generated by 
> Sqoop is just a Writable that describes how to read fields and write 
> fields from/to your database. I don't think it's a good idea to modify 
> the generated code as it would work only for that single instance and 
> is kind of a mess to keep track of. Until I understand your models a 
> bit more... I think that's the best advice I can give though.
> -Abe
> On Mon, Sep 22, 2014 at 9:04 AM, Denis <sunseaandpalms@gmail.com> wrote:
>     Hi,
>         I am looking for a good way to integrate my model classes
>     with Sqoop. The only solution I see right now is to import with
>     the 'sqoop import ...' command and then run a map job to convert
>     the result into my model. I don't like this approach because: 1 - I
>     need to duplicate the field sequence information when executing
>     'sqoop import ...', 2 - I don't see any easy way to write a JUnit
>     test that checks the imported data can be uploaded back to the DB
>     without errors (there is a custom upload procedure, not Sqoop). So
>     ideally I would like to extend some interface, do some tricks, and
>     plug my model into Sqoop (I still want to be able to
>     leverage --direct mode). Any help is highly appreciated. If my
>     ideal case would cause me a lot of pain, please share some
>     resources that describe how I can use the 'sqoop codegen' results
>     later (again, ideally as a map-reduce job config).
