Hi Abe,

Thanks for your mail. The MySQL table is defined with utf-8, and the data displays correctly there, as shown below:

Data in mysql : सुरेन्द्र कुमार पाण्डेय

But when I move the same data through a Sqoop import, it gets corrupted, as shown in my earlier mail in this thread.

I also tried adding useUnicode=true&characterEncoding=utf8 to the MySQL connection string, and passing --direct -- --default-character-set=utf8 to sqoop import, but still no luck.
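
For reference, the two variants described above could be assembled roughly as in the sketch below. The host, database, and table names are hypothetical placeholders, and the helper only builds the argument list; it does not run Sqoop. Passing the arguments as a list keeps the `&` in the JDBC URL from being interpreted by a shell. As far as the Sqoop documentation describes it, the arguments after the bare `--` in direct mode are handed to the underlying MySQL dump tool, so the JDBC parameters and the `--default-character-set` option act on different code paths.

```python
def build_sqoop_import_args(host, database, table, direct=False):
    """Build an argv list for `sqoop import` (hypothetical helper; does not run Sqoop)."""
    # Connector/J parameters asking the JDBC driver to use UTF-8.
    jdbc_url = (
        f"jdbc:mysql://{host}/{database}"
        "?useUnicode=true&characterEncoding=utf8"
    )
    args = ["sqoop", "import", "--connect", jdbc_url, "--table", table]
    if direct:
        # Everything after the bare "--" is passed through to the MySQL dump tool.
        args += ["--direct", "--", "--default-character-set=utf8"]
    return args


if __name__ == "__main__":
    # "dbhost", "mydb" and "customers" are placeholders, not values from this thread.
    print(" ".join(build_sqoop_import_args("dbhost", "mydb", "customers", direct=True)))
```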

Additionally, the data contains some control characters such as Ctrl-A (0x01) and Ctrl-M, which conflict with the field delimiter of the sqoop import, since that is set to exactly Ctrl-A. Is there a way to choose a delimiter that can cope with any special or control character appearing in the data?
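
One common way around this is to drop or replace the delimiter characters inside each field before the record is written; Sqoop's --hive-drop-import-delims and --hive-delims-replacement options are documented as doing this for \n, \r and \01 in string fields. The sketch below only illustrates the idea in Python. It is not Sqoop's implementation, and the function names are made up for illustration.

```python
# Characters that would break a Ctrl-A delimited, newline-terminated record.
DELIMS = {"\x01", "\n", "\r"}


def scrub_field(value, replacement=""):
    """Drop (or replace) delimiter characters inside a single field value."""
    return "".join(replacement if ch in DELIMS else ch for ch in value)


def to_record(fields, delimiter="\x01"):
    """Join scrubbed fields into one delimited record."""
    return delimiter.join(scrub_field(f) for f in fields)


if __name__ == "__main__":
    # Two fields, each containing a character that would otherwise split the record.
    record = to_record(["a\x01b", "c\rd"])
    print(record.split("\x01"))  # ['ab', 'cd']
```

Dropping characters loses information, so passing a visible replacement (for example a space) may be preferable when the control characters carry meaning.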

Looking forward to a quick response.


On Sun, Nov 23, 2014 at 12:40 AM, Abraham Elmahrek <abe@cloudera.com> wrote:
This could be happening in two places: loading into HDFS, or extracting from MySQL. Sqoop should load everything as UTF-8 by default, which supports Hindi.

What is your default character set in MySQL? Could you copy/paste your my.cnf? Also, what version of MySQL are you running?

On Sat, Nov 22, 2014 at 12:28 AM, Vineet Mishra <clearmidoubt@gmail.com> wrote:
Hi Abe,

By the above statement I mean that the data residing in MySQL is different from what has been imported via Sqoop.

Here is an example:

Data in mysql : सुरेन्द्र कुमार पाण्डेय
Data in HDFS(Sqoop import) :  M-`M-$M-8M-`M-%M-

So this is the kind of change I am running into, which completely loses the meaning of the data.
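
It may be worth checking how the HDFS file was viewed. The M-x notation in the sample above is what tools such as `cat -v` (and some pagers) print for bytes with the high bit set, so the bytes themselves may still be valid UTF-8. A minimal sketch, assuming the sample was printed in that notation and contains only ASCII characters (the function name is illustrative):

```python
def decode_cat_v(text):
    """Turn `cat -v` style notation (M-x, ^X, M-^X) back into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        high = 0
        if text.startswith("M-", i) and i + 2 < len(text):
            high = 0x80  # "M-" marks a byte with the high bit set
            i += 2
        if text[i] == "^" and i + 1 < len(text):
            # "^X" is a control character: flip bit 6 of X ("^?" is DEL).
            out.append(high | (ord(text[i + 1]) ^ 0x40))
            i += 2
        else:
            out.append(high | ord(text[i]))
            i += 1
    return bytes(out)


if __name__ == "__main__":
    # First three complete tokens of the sample shown above.
    raw = decode_cat_v("M-`M-$M-8")
    print(raw.hex())            # e0a4b8
    print(raw.decode("utf-8"))  # स
```

Those three tokens decode to स, the first letter of the value stored in MySQL. The rest of the sample is truncated, so it cannot be checked the same way.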

Any help would be appreciated.

Thanks again!

On Sat, Nov 22, 2014 at 2:15 AM, Abraham Elmahrek <abe@cloudera.com> wrote:
Hey there,

Could you explain what you mean by "losing its meaning"? It's possible you may need to set the character set: http://dev.mysql.com/doc/connector-j/en/connector-j-reference-charsets.html.


On Fri, Nov 21, 2014 at 5:57 AM, Vineet Mishra <clearmidoubt@gmail.com> wrote:

I am doing a Sqoop import with MySQL as the source. I recently found that data imported through Sqoop from MySQL contained some special characters and even control characters, and the data was losing its meaning when moved into the Sqoop data files.

I am looking for a way to handle these special characters or, if possible, to prune the unwanted data out of my target dataset.

Hoping for a resolution at the earliest!