sqoop-user mailing list archives

From Abraham Elmahrek <...@cloudera.com>
Subject Re: Convert new line chars from oracle to hive using sqoop
Date Mon, 22 Sep 2014 21:16:36 GMT
Hey there,

Could you please export a few of these lines to a file and run a 'hexdump'
on the file if possible? It would be interesting to see what exactly those
characters are.

-Abe

On Mon, Sep 22, 2014 at 11:27 AM, Vikash Talanki -X (vtalanki - INFOSYS
LIMITED at Cisco) <vtalanki@cisco.com> wrote:

>  Hi All,
>
>
>
> We are using the string *‘<EOL>’* (*--hive-delims-replacement ‘<EOL>’*) to
> replace newline characters in Oracle fields while importing data into Hive
> using Sqoop.
>
> According to the Sqoop documentation
> (http://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#_large_objects),
> the above parameter should replace only *\n, \r or \01 (^A)* characters
> with ‘<EOL>’.
>
> But we are seeing that some special characters are also being replaced
> with ‘<EOL>’.
>
>
>
> Our scenario:
>
> | Oracle Field      | Hive Field             | Notepad++               | Word               |
> |-------------------|------------------------|-------------------------|--------------------|
> | MEIKI COMPANY,LTD | MEIKI<EOL> COMPANY,LTD | [image: Screen capture] | MEIKI__COMPANY,LTD |
> | AVENTIS@PHARMA    | AVENTIS<EOL>@PHARMA    | [image: Screen capture] | AVENTIS_@PHARMA    |
>
>
>
> However, a character in the samples above that is *NOT visible* in Oracle
> shows up as ‘*SOH*’ in Notepad++ and as ‘*_*’ in Word, and Sqoop converts
> it into *<EOL>*.
>
> Please help us understand this behavior.
>
> What do these characters mean to Sqoop/Hive?
>
> Is Sqoop expected to replace characters that do not fall under *\n, \r or
> \01 (^A)*?
>
> *Vikash Talanki*
> Engineer - Software
> vtalanki@cisco.com
