sqoop-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Abraham Elmahrek <...@cloudera.com>
Subject Re: Unzipping an SQL Server DB column data while using SQOOP
Date Fri, 28 Aug 2015 22:04:04 GMT
Hey man,

Currently Sqoop doesn't have any ability to unzip a single column. This is
a feature that seems fit for Sqoop2 actually. Until then, I'd try using
Hive or Impala and write a custom UDF to help with column extraction.

-Abe

On Fri, Aug 28, 2015 at 3:40 AM <sumanchaitanya.dasari@bt.com> wrote:

> *Hello Everyone,*
>
>
>
> I have the below requirement to do
>
>
>
> 1)      We have a DB in SQL Server.
>
> 2)      *EVENTS* related transactional data is getting stored in *EVENTS*
> table of DB. A part of event related information is being generated as an
> *xml* (one for each event) and is being stored in one of the columns (of
> type *VARBINARY*) of *EVENTS* table.
>
> *Note:* Interesting part here is that all xmls are being zipped with *gzip
> *compression codec, and only then being stored in one of the columns of *EVENTS
> *table.
>
> 3)      Now that we have decided to *SQOOP* the data to *HDFS* from
> *EVENTS* table, to generate some reports on it, we are totally clueless
> on how to proceed on this requirement.
>
>
>
> I have tried SQOOPing the data as it is into HDFS as a default text
> format. All the columns are in human readable format except that VARBINARY
> column (where the zipped xml is stored).
>
>
>
> So, can someone help me in this?
>
> Ø  How can I unzip that zipped xml data? Can I do it in a single SQOOP
> command?
>
> Ø  I have not tried but will it work if I SQOOP to a *sequence* file?
>
>
>
> *Please help me…I am struggling to start on this requirement  from past
> couple of days.*
>
>
>
>
>
> *Thanks*
>
> *Suman*
>
>
>

Mime
View raw message