sqoop-user mailing list archives

From <sumanchaitanya.das...@bt.com>
Subject Unzipping an SQL Server DB column data while using SQOOP
Date Fri, 28 Aug 2015 10:40:51 GMT
Hello Everyone,

I have the following requirement:

1) We have a database in SQL Server.

2) Event-related transactional data is stored in the EVENTS table of that database. Part
of the information for each event is generated as an XML document (one per event) and is
stored in one of the columns (of type VARBINARY) of the EVENTS table.

Note: The interesting part is that every XML document is compressed with the gzip codec
before it is stored in that column of the EVENTS table.

3) We have now decided to import the EVENTS table into HDFS with Sqoop in order to generate
some reports on it, but we have no idea how to proceed with this requirement.

I have tried importing the data as-is into HDFS in the default text format. All the columns
are human-readable except the VARBINARY column (where the gzipped XML is stored).

So, can someone help me with this?

- How can I decompress the gzipped XML data? Can I do it in a single Sqoop command?

- I have not tried it yet, but would it work if I imported to a sequence file instead?

Please help. I have been struggling to get started on this requirement for the past couple of days.
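
To make the question concrete: the per-row step I am after is simply gunzipping the bytes
held in that VARBINARY column. Below is a minimal Java sketch of just that step, assuming
the column holds a standard gzip stream and the XML was UTF-8 encoded before compression.
The class and method names (GunzipColumn, gunzipToString, gzip) are made up for
illustration; how to hook something like this into a Sqoop import is exactly the open
question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: not part of Sqoop, only illustrates the decompression step.
public class GunzipColumn {

    // Decompress one gzip-compressed column value into a string.
    // Assumption: the XML was UTF-8 encoded before it was compressed.
    public static String gunzipToString(byte[] compressed) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            // Thrown, for example, when the bytes are not a valid gzip stream.
            throw new UncheckedIOException(e);
        }
    }

    // Used only to build sample input that mimics what the application stores.
    public static byte[] gzip(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Stand-in for one value read from the VARBINARY column.
        byte[] columnValue = gzip("<event id=\"1\"/>");
        System.out.println(gunzipToString(columnValue));
    }
}
```

The sketch reads the whole value into memory, which should be fine for one small XML
document per event but would need rethinking for very large values.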

