sqoop-user mailing list archives

From Vineet Mishra <clearmido...@gmail.com>
Subject Re: Handling Special Character while Sqoop Import
Date Wed, 26 Nov 2014 05:45:26 GMT
It seems the issue is with the MySQL client configuration on the
DataNodes where Sqoop runs the MapReduce job.

I ran a test on my local machine: I loaded the same data into MySQL, ran
a Sqoop import to HDFS, and the data landed in HDFS intact.

This indicates that the problem is in the MySQL client configuration,
which I need to fix by setting the character set to UTF-8 (I had assumed
the default character set would already be UTF-8).
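The HDFS sample quoted further down in this thread (M-`M-$M-8M-`M-%M-)
has the shape of `cat -v` output. Assuming it was displayed that way (the
thread does not say how the file was viewed), the Python sketch below
renders bytes using `cat -v`'s rules and shows that those characters are
exactly the UTF-8 bytes at the start of the MySQL value. A quick validity
check on the raw bytes can therefore help tell real corruption apart from
a display artifact. `cat_v` and `is_valid_utf8` are hypothetical helper
names chosen for this illustration.

```python
def cat_v(data: bytes) -> str:
    """Render bytes the way GNU `cat -v` does (TAB and LF are left as-is)."""
    out = []
    for b in data:
        high = b >= 0x80            # bytes with the high bit set get an "M-" prefix
        low = b & 0x7F
        prefix = "M-" if high else ""
        if low == 0x7F:             # DEL (and 0xFF) is shown as ^?
            out.append(prefix + "^?")
        elif low < 0x20 and (high or low not in (0x09, 0x0A)):
            out.append(prefix + "^" + chr(low + 0x40))  # control chars as ^X
        else:
            out.append(prefix + chr(low))
    return "".join(out)


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


name = "सुरेन्द्र कुमार पाण्डेय"
encoded = name.encode("utf-8")
print(cat_v(encoded).startswith("M-`M-$M-8M-`M-%M-"))  # → True
print(is_valid_utf8(encoded))       # → True
print(is_valid_utf8(encoded[:4]))   # → False (sequence cut mid-character)
```

To apply the check to an imported file, one could read its contents as
bytes (for example, after copying it out of HDFS) and pass them to
`is_valid_utf8`.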


However, the second part of my question still stands: how do I handle the
control characters present in the data? I cannot know in advance what the
data will contain (I have already encountered control characters), so
choosing a control character as the delimiter does not help if the data
itself contains that character.

I am looking for the standard solution.
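For reference, Sqoop's import tool has output-formatting options aimed at
this problem: `--fields-terminated-by` with `--escaped-by` (and
`--enclosed-by` / `--optionally-enclosed-by`) to escape or quote delimiter
characters that occur inside a field, and `--hive-drop-import-delims` or
`--hive-delims-replacement`, which drop or replace \n, \r and \01 in string
fields when importing to Hive. The Python sketch below only illustrates the
two ideas; it is not Sqoop's code. The backslash escape character and the
helper names (`escape_field`, `join_record`, `split_record`, `drop_delims`)
are assumptions made for the illustration.

```python
from typing import List

FIELD_DELIM = "\x01"  # Ctrl-A, the field delimiter used in this thread
ESCAPE = "\\"         # assumption: backslash as the escape character


def escape_field(value: str) -> str:
    """Prefix every ESCAPE and FIELD_DELIM inside the value with ESCAPE."""
    return "".join(ESCAPE + ch if ch in (ESCAPE, FIELD_DELIM) else ch for ch in value)


def join_record(values: List[str]) -> str:
    """Write one record: escaped fields joined by FIELD_DELIM."""
    return FIELD_DELIM.join(escape_field(v) for v in values)


def split_record(line: str) -> List[str]:
    """Split on unescaped FIELD_DELIM and strip the escape prefixes."""
    fields, current, escaped = [], [], False
    for ch in line:
        if escaped:               # previous char was ESCAPE: keep this one literally
            current.append(ch)
            escaped = False
        elif ch == ESCAPE:
            escaped = True
        elif ch == FIELD_DELIM:   # a real field boundary
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


def drop_delims(value: str, replacement: str = "") -> str:
    """Remove (or replace) \\n, \\r and \\x01, the characters the Hive options target."""
    return value.translate({ord(c): replacement for c in "\n\r\x01"})


record = ["सुरेन्द्र", "has\x01ctrl-A", "back\\slash"]
line = join_record(record)
print(len(line.split(FIELD_DELIM)))  # → 4: a naive split breaks the 3 fields
print(split_record(line) == record)  # → True
print(drop_delims("a\x01b\r\nc"))    # → abc
```

Escaping keeps the data intact, but only if every reader honors the same
escape character, and a raw newline inside a field still ends the line for
a line-oriented reader. That is why dropping or replacing those characters
is the other common choice when some loss is acceptable.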

Thanks!

On Mon, Nov 24, 2014 at 4:20 PM, Vineet Mishra <clearmidoubt@gmail.com>
wrote:

> Hi Abe,
>
> Thanks for your mail. The MySQL table is defined with UTF-8, and the data
> displays correctly there, as shown below:
>
> *Data in mysql : *सुरेन्द्र कुमार पाण्डेय
>
> but when I move it with a Sqoop import, the data gets corrupted, as shown
> in the earlier message in this thread.
>
> I also tried setting *useUnicode=true&characterEncoding=utf8* in the MySQL
> connection string and *--direct -- --default-character-set=utf8* on the
> Sqoop import, but still no luck.
>
> Additionally, the data contains control characters such as Ctrl-A (\001)
> and Ctrl-M, which collide with the field delimiter I set for the Sqoop
> import, which is precisely Ctrl-A. Is there a delimiter that keeps working
> regardless of which special or control characters appear in the data?
>
> Looking forward to a quick response.
>
> Thanks!
>
>
> On Sun, Nov 23, 2014 at 12:40 AM, Abraham Elmahrek <abe@cloudera.com>
> wrote:
>
>> This could happen in two places: loading into HDFS, or extracting from
>> MySQL. Sqoop should load everything as UTF-8 by default, which supports
>> Hindi.
>>
>> What is your default character set in MySQL? Could you copy/paste your
>> my.cnf? Also, what version of MySQL are you running?
>>
>> On Sat, Nov 22, 2014 at 12:28 AM, Vineet Mishra <clearmidoubt@gmail.com>
>> wrote:
>>
>>> Hi Abe,
>>>
>>> By that statement I mean that the data residing in MySQL is different
>>> from what is imported via Sqoop.
>>>
>>> Here is an example:
>>>
>>> *Data in mysql : *सुरेन्द्र कुमार पाण्डेय
>>> *Data in HDFS(Sqoop import) : * M-`M-$M-8M-`M-%M-
>>>
>>> This is the kind of change I am running into, and it completely loses
>>> the meaning of the data.
>>>
>>> Any help would be appreciated.
>>>
>>> Thanks again!
>>>
>>> On Sat, Nov 22, 2014 at 2:15 AM, Abraham Elmahrek <abe@cloudera.com>
>>> wrote:
>>>
>>>> Hey there,
>>>>
>>>> Could you explain what you mean by "losing its meaning"? You may need
>>>> to set the character set:
>>>> http://dev.mysql.com/doc/connector-j/en/connector-j-reference-charsets.html
>>>>
>>>> -Abe
>>>>
>>>> On Fri, Nov 21, 2014 at 5:57 AM, Vineet Mishra <clearmidoubt@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am doing a Sqoop import with MySQL as the source. I recently found
>>>>> that the data imported through Sqoop from MySQL contains some special
>>>>> characters, and even control characters, which lose their meaning when
>>>>> moved into the Sqoop data files.
>>>>>
>>>>> I am looking for a solution to handle these special characters or, if
>>>>> possible, to prune the unwanted data out of my target dataset.
>>>>>
>>>>> Looking for a resolution at the earliest!
>>>>>
>>>>> Thanks!
>>>>>
>>>>
>>>>
>>>
>>
>
