sqoop-user mailing list archives

From pratik khadloya <tispra...@gmail.com>
Subject Re: Comparing sqoop's output to hdfs to data in mysql
Date Wed, 17 Sep 2014 19:26:32 GMT
Thanks, Abraham. I was hoping not to have to do that, since I am dealing with
70+ queries.
In the meantime, I am writing a script that uses sed to massage the files
and then compares them.
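
A minimal sketch of the sed step, assuming each line of the dump holds a single DOUBLE value, as in the one-column accounts query quoted below. The function name normalize_whole_numbers is hypothetical, and the rule only covers bare whole numbers; very large or very small values, which may appear in exponent form on the sqoop side, are not handled.

```shell
#!/usr/bin/env bash
# Sketch only: assumes one DOUBLE value per line (single-column dump).
# normalize_whole_numbers is a hypothetical helper name, not part of
# mysql or sqoop.
normalize_whole_numbers() {
  # Append ".0" to lines that are a bare integer (0 -> 0.0, -3 -> -3.0).
  # Lines that already contain a fraction, or non-numeric text such as
  # NULL, pass through unchanged.
  sed -E 's/^(-?[0-9]+)$/\1.0/'
}

# Demo on inline sample data rather than the real dump file.
printf '0\n1.5\n12\nNULL\n' | normalize_whole_numbers
```

In the diff command, the mysql side could then be piped through the function before comparison, for example `tail -n +2 /home/pkhadloya/sqoop_out/mysql_accounts | normalize_whole_numbers`. Multi-column dumps would need a per-field rule instead of a whole-line match.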

Regards,
~Pratik

On Wed, Sep 17, 2014 at 12:18 PM, Abraham Elmahrek <abe@cloudera.com> wrote:

> Try making your query a free-form query and casting the float to a string.
> Something like the following worked well for me:
>
> SELECT id, text, fl, CAST(fl AS CHAR(64)) FROM fl WHERE $CONDITIONS
>
> On Tue, Sep 16, 2014 at 2:41 PM, pratik khadloya <tispratik@gmail.com>
> wrote:
>
>> Sure. The column in question is
>>
>> Field Type Null Default
>> usage_fee_percent double YES 0
>>
>> If we run a mysql select query, the value is 0.
>> If we cat the file on hdfs that sqoop wrote, the value is 0.0.
>>
>> ======= Mysql Command =======
>> mysql -u <user> -p<pwd> -h <host> <db> --raw -e "SELECT
>> a.usage_fee_percent FROM accounts a" >
>> /home/pkhadloya/sqoop_out/mysql_accounts
>>
>> ======= Sqoop Command =======
>> bin/sqoop import -jt <jobtracker> --connect jdbc:mysql://.../<db>
>> --username <user> --password <pwd> --target-dir
>> /user/pkhadloya/sqoop/accounts --delete-target-dir --query "SELECT
>> a.usage_fee_percent FROM accounts a WHERE \$CONDITIONS" --num-mappers 1
>> --mapreduce-job-name accounts_sqoop_import --fields-terminated-by "\t"
>>  --as-textfile
>>
>> ======= Diff Command =======
>> bash -c "diff -U 0 <(tail -n +2 /home/pkhadloya/sqoop_out/mysql_accounts)
>> <(hadoop fs -cat /user/pkhadloya/sqoop/accounts/part-m-00*) >
>> /home/pkhadloya/sqoop_out/diff_accounts"
>>
>>
>> Thanks for looking into this.
>>
>> Regards,
>> Pratik
>>
>>
>> On Tue, Sep 16, 2014 at 2:11 PM, Abraham Elmahrek <abe@cloudera.com>
>> wrote:
>>
>>> Hey there,
>>>
>>> Could you provide us with the table description (types) and the sqoop
>>> command you are running?
>>>
>>> -Abe
>>>
>>>
>>> On Tue, Sep 16, 2014 at 11:19 AM, pratik khadloya <tispratik@gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I am comparing the mysql data (by dumping it into a file) to the textfile
>>>> imported by sqoop onto HDFS.
>>>> I am using the diff tool for the comparison.
>>>>
>>>> I observed the following differences:
>>>> mysql      -->       sqoop_text_output
>>>> \\n                       \n
>>>> \\t                        \t
>>>> \$                        $
>>>> 0                         0.0
>>>>
>>>> So, it seems that mysql auto-escapes the output with a \. I got around
>>>> that by passing the --raw flag to mysql, which tells it not to escape, so
>>>> that I can compare properly. The only difference I currently see is
>>>> that 0 is converted to 0.0 by sqoop (as mentioned in the docs).
>>>>
>>>> How can I make mysql also convert the 0 to 0.0 when it dumps to a
>>>> csv? Maybe the answer lies in the guts of sqoop, which I could use myself. Or
>>>> is it possible to tell sqoop not to convert 0 to 0.0?
>>>>
>>>> All in all, I am trying to verify the work done by sqoop for my own
>>>> satisfaction. Once I verify that the text data is being imported correctly,
>>>> I will verify the same for the parquet format.
>>>>
>>>> Thanks,
>>>> ~Pratik
>>>>
>>>
>>>
>>
>
