sqoop-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vikash Talanki -X (vtalanki - INFOSYS LIMITED at Cisco)" <vtala...@cisco.com>
Subject RE: Convert new line chars from oracle to hive using sqoop
Date Mon, 22 Sep 2014 23:00:09 GMT
Thanks Abhraham for the reply.

Here is the hexadump output

Oracle

NotePad++

Hexadump

[Screen capture]

[Screen capture]

[Screen capture]



Thanks,
Vikash Talanki
+1 (408)838-4078

From: Abraham Elmahrek [mailto:abe@cloudera.com]
Sent: Monday, September 22, 2014 2:17 PM
To: user@sqoop.apache.org
Cc: Indresh Padmonkar (ipadmonk); Anushree Shipra -X (ashipra); Phani Yadavilli -X (pyadavil)
Subject: Re: Convert new line chars from oracle to hive using sqoop

Hey there,

Could you please export a few of these lines to a file and run a 'hexdump' on the file if
possible? It would be interesting to see what exactly those characters are.

-Abe

On Mon, Sep 22, 2014 at 11:27 AM, Vikash Talanki -X (vtalanki - INFOSYS LIMITED at Cisco)
<vtalanki@cisco.com<mailto:vtalanki@cisco.com>> wrote:
Hi All,

We are using ‘<EOL>’ string( --hive-delims-replacement ‘<EOL>’) to convert
new lines chars in oracle fields while importing data into hive using sqoop.
According to sqoop documentation - http://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#_large_objects
– above parameter should only replace either \n, \r or \01(^A) characters with ‘<EOL>’.
But we seeing that some special characters are also getting replaced to ‘<EOL>’

Our scenario:
Oracle Field

Hive Field

Notepad ++

Word

MEIKI COMPANY,LTD

MEIKI<EOL> COMPANY,LTD

[Screen capture]

MEIKI__COMPANY,LTD

AVENTIS@PHARMA

AVENTIS<EOL>@PHARMA

[Screen capture]

AVENTIS_@PHARMA


But, some character in above sample which is NOT visible in Oracle is being shown up as ‘SOH’
in notepad++ and as ‘_’ in word which is being converted into <EOL> by sqoop.
Please help us understand this behavior.
What does these chars mean to sqoop/hive?
Is sqoop expected to replace these chars which doesn’t fall under either \n, \r or \01(^A)
?
[http://www.cisco.com/web/europe/images/email/signature/logo05.jpg]

Vikash Talanki
Engineer - Software
vtalanki@cisco.com<mailto:vtalanki@cisco.com>
Phone: +1 (408)838 4078<tel:%2B1%20%28408%29838%204078>

Cisco Systems Limited
SJ-J 3
255 W Tasman Dr
San Jose
CA – 95134
United States
Cisco.com<http://www.cisco.com/>





[Think before you print.]Think before you print.

This email may contain confidential and privileged material for the sole use of the intended
recipient. Any review, use, distribution or disclosure by others is strictly prohibited. If
you are not the intended recipient (or authorized to receive for the recipient), please contact
the sender by reply email and delete all copies of this message.
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html




Mime
View raw message