sqoop-user mailing list archives

From Vineet Mishra <clearmido...@gmail.com>
Subject Re: Handling Special Character while Sqoop Import
Date Mon, 24 Nov 2014 10:50:45 GMT
Hi Abe,

Thanks for your mail. The MySQL table is defined with UTF-8, and the data
displays correctly there, as shown below:

*Data in mysql : *सुरेन्द्र कुमार पाण्डेय

But when I move the same data with a Sqoop import, it gets corrupted, as shown
in my earlier message quoted below.
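The HDFS sample in the earlier message ("M-`M-$M-8M-`M-%M-") looks like the "M-" notation that `cat -v` and similar viewers use for bytes above 0x7F. As a rough check, assuming the file was viewed that way, the Python sketch below re-implements that notation (the helper name `cat_v` is hypothetical) and applies it to the UTF-8 bytes of the first syllable of the MySQL value. If the two match, the bytes in HDFS may well be intact UTF-8, with only the viewer rendering them as escapes.

```python
def cat_v(data: bytes) -> str:
    """Render bytes roughly the way `cat -v` shows them (hypothetical helper)."""
    out = []
    for b in data:
        prefix = ""
        if b >= 0x80:  # high bit set: shown as "M-" followed by the low 7 bits
            prefix = "M-"
            b -= 0x80
        if b == 0x7F:  # DEL
            out.append(prefix + "^?")
        elif b < 0x20 and (prefix or b not in (0x09, 0x0A)):
            # Control characters shown as ^X; plain tab and newline pass through.
            out.append(prefix + "^" + chr(b + 0x40))
        else:
            out.append(prefix + chr(b))
    return "".join(out)


mysql_value = "सुरेन्द्र कुमार पाण्डेय"
first_syllable = mysql_value[:2]              # "सु" = U+0938 U+0941
print(cat_v(first_syllable.encode("utf-8")))  # M-`M-$M-8M-`M-%M-^A
```

Only the first few bytes are compared because the quoted HDFS sample is itself truncated.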

I have also tried adding *useUnicode=true&characterEncoding=utf8* to the
MySQL connection string and *--direct -- --default-character-set=utf8* to the
sqoop import command, but still no luck.
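For reference, here is a sketch of how those options fit into a single Sqoop 1 command line, assembled in Python so the quoting is explicit. The host, database, table, and target directory are hypothetical placeholders. One thing worth checking: `&` is a shell control operator, so if the `--connect` value is not quoted, the shell cuts the URL off before `characterEncoding=utf8`, and that setting never reaches the driver.

```python
import shlex


def build_sqoop_import(host: str, database: str, table: str, target_dir: str) -> list[str]:
    # Connector/J charset options from the message above; the URL must stay
    # one argument so the '&' is not interpreted by the shell.
    jdbc_url = (
        f"jdbc:mysql://{host}/{database}"
        "?useUnicode=true&characterEncoding=utf8"
    )
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--direct",
        # In direct mode, arguments after a bare "--" are passed to mysqldump.
        "--", "--default-character-set=utf8",
    ]


# Hypothetical values, for illustration only.
cmd = build_sqoop_import("db.example.com", "mydb", "people", "/tmp/people_import")
print(shlex.join(cmd))  # shlex.join single-quotes the URL because of '?' and '&'
```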

Additionally, the data contains some control characters such as Ctrl-A
(\001) and Ctrl-M (\r), and the Ctrl-A values collide with the field
delimiter, which I set to Ctrl-A for the Sqoop import. Is there a way to
choose a delimiter that still works when the data contains arbitrary special
or control characters?
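One general answer to the delimiter question is escaping rather than hunting for an unused delimiter: every delimiter or escape character inside a value is prefixed with an escape character, and the reader undoes that. Sqoop's output-formatting options (`--escaped-by`, `--enclosed-by`) work in this spirit, but the sketch below only illustrates the technique in plain Python. It assumes a backslash escape and the Ctrl-A field delimiter from this thread; `encode_record` and `decode_record` are hypothetical names, and whatever reads the files has to apply the same rule.

```python
FIELD_SEP = "\x01"  # Ctrl-A, the field delimiter used in this thread
ESCAPE = "\\"       # assumed escape character for this sketch


def encode_record(fields: list[str]) -> str:
    """Join fields with Ctrl-A, escaping any Ctrl-A or backslash inside a value."""
    escaped = [
        # Escape the escape character first so it cannot be mistaken for an escaped delimiter.
        f.replace(ESCAPE, ESCAPE * 2).replace(FIELD_SEP, ESCAPE + FIELD_SEP)
        for f in fields
    ]
    return FIELD_SEP.join(escaped)


def decode_record(line: str) -> list[str]:
    """Split on unescaped Ctrl-A and drop the escape characters."""
    fields, current = [], []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == ESCAPE and i + 1 < len(line):
            current.append(line[i + 1])  # keep the escaped character literally
            i += 2
            continue
        if ch == FIELD_SEP:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


record = ["1", "abc\x01def", "C:\\temp"]
line = encode_record(record)
print(len(line.split(FIELD_SEP)))     # 4: a naive split breaks the record
print(decode_record(line) == record)  # True
```

Line breaks such as Ctrl-M are harder, because a line-oriented reader splits records before any field-level escaping is seen; that is one reason pruning such characters is a common alternative.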

Looking forward to a quick response.


On Sun, Nov 23, 2014 at 12:40 AM, Abraham Elmahrek <abe@cloudera.com> wrote:

> This could be happening in two places: loading into HDFS, or extracting
> from MySQL. Sqoop should load everything as UTF-8 by default, which
> supports Hindi.
> What is your default character set in MySQL? Could you copy/paste your
> my.cnf? Also, which version of MySQL are you running?
> On Sat, Nov 22, 2014 at 12:28 AM, Vineet Mishra <clearmidoubt@gmail.com>
> wrote:
>> Hi Abe,
>> By that statement I mean that the data residing in MySQL is different
>> from what has been imported via Sqoop.
>> Here is an example:
>> *Data in mysql : *सुरेन्द्र कुमार पाण्डेय
>> *Data in HDFS(Sqoop import) : * M-`M-$M-8M-`M-%M-
>> These are the kinds of changes I am running into, and they make the data
>> lose its meaning completely.
>> Any help would be appreciated.
>> Thanks again!
>> On Sat, Nov 22, 2014 at 2:15 AM, Abraham Elmahrek <abe@cloudera.com>
>> wrote:
>>> Hey there,
>>> Could you explain what you mean by "losing its meaning"? You may need
>>> to set the character set:
>>> http://dev.mysql.com/doc/connector-j/en/connector-j-reference-charsets.html
>>> -Abe
>>> On Fri, Nov 21, 2014 at 5:57 AM, Vineet Mishra <clearmidoubt@gmail.com>
>>> wrote:
>>>> Hi,
>>>> I am doing a Sqoop import with MySQL as the source. I recently noticed
>>>> that the data imported through Sqoop contained some special characters,
>>>> and even control characters, that lost their meaning once moved into the
>>>> Sqoop data files.
>>>> I am looking for a way to handle these special characters or, if
>>>> possible, to prune the unwanted data out of my target dataset.
>>>> Looking for a resolution at the earliest!
>>>> Thanks!
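For the pruning option raised in the original question: Sqoop's Hive import options `--hive-drop-import-delims` and `--hive-delims-replacement` drop or replace \n, \r and \01 in string fields. The Python sketch below applies the same idea to a single value, and adds a broader, hypothetical policy that removes every C0 control character and DEL. `prune`, `HIVE_DELIMS` and `ALL_CONTROLS` are illustrative names, not Sqoop APIs.

```python
import re

# The three characters Sqoop's --hive-drop-import-delims targets: \n, \r, \01.
HIVE_DELIMS = re.compile(r"[\n\r\x01]")
# Hypothetical broader policy: all C0 control characters (including tab) and DEL.
ALL_CONTROLS = re.compile(r"[\x00-\x1f\x7f]")


def prune(value: str, pattern: re.Pattern = HIVE_DELIMS, replacement: str = "") -> str:
    """Drop (replacement="") or substitute the characters matched by pattern."""
    return pattern.sub(replacement, value)


print(prune("a\x01b\r\nc"))                          # abc
print(prune("सुरेन्द्र\x01कुमार", replacement=" "))  # सुरेन्द्र कुमार
print(prune("x\x02y\x7fz", ALL_CONTROLS))            # xyz
```

Non-control Unicode text such as Devanagari passes through untouched. Dropping characters is lossy, so this suits columns where the control characters are noise rather than data.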
