sqoop-user mailing list archives

From pratik khadloya <tispra...@gmail.com>
Subject Re: Comparing sqoop's output to hdfs to data in mysql
Date Tue, 16 Sep 2014 21:41:43 GMT
Sure. The column in question is

Field              Type    Null  Default
usage_fee_percent  double  YES   0

If we run a mysql SELECT query, the value is 0.
If we cat the file on HDFS that was imported by sqoop, the value is 0.0.

======= Mysql Command =======
mysql -u <user> -p<pwd> -h <host> <db> --raw -e "SELECT a.usage_fee_percent
FROM accounts a" > /home/pkhadloya/sqoop_out/mysql_accounts

======= Sqoop Command =======
bin/sqoop import -jt <jobtracker> --connect jdbc:mysql://.../<db>
--username <user> --password <pwd> --target-dir
/user/pkhadloya/sqoop/accounts --delete-target-dir --query "SELECT
a.usage_fee_percent FROM accounts a WHERE \$CONDITIONS" --num-mappers 1
--mapreduce-job-name accounts_sqoop_import --fields-terminated-by "\t"
 --as-textfile

======= Diff Command =======
bash -c "diff -U 0 <(tail -n +2 /home/pkhadloya/sqoop_out/mysql_accounts)
<(hadoop fs -cat /user/pkhadloya/sqoop/accounts/part-m-00*) >
/home/pkhadloya/sqoop_out/diff_accounts"
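
======= Normalization Sketch =======
One possible workaround, sketched under assumptions rather than taken from sqoop itself: pass both sides through a small filter before diffing, so that a whole-number double such as 0.0 compares equal to 0. The function name normalize_doubles is hypothetical, and the filter assumes tab-separated fields, as in the sqoop command above. It only strips a trailing ".0" from fields that are plain integer-valued decimals; values printed in exponent notation (such as 1.0E7) are left untouched.

```shell
# Hypothetical helper (not part of sqoop or mysql): rewrite fields like
# "0.0" or "-3.0" to "0" or "-3" so both dumps use the same form.
normalize_doubles() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         # Only touch fields that are digits followed by exactly ".0".
         for (i = 1; i <= NF; i++)
           if ($i ~ /^-?[0-9]+\.0$/) sub(/\.0$/, "", $i)
         print
       }'
}
```

In the diff command above, each side could then be piped through the filter, for example <(hadoop fs -cat /user/pkhadloya/sqoop/accounts/part-m-00* | normalize_doubles). Note that this would also rewrite a text column whose value happens to look like "1.0", so it is best limited to queries that select numeric columns.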


Thanks for looking into this.

Regards,
Pratik


On Tue, Sep 16, 2014 at 2:11 PM, Abraham Elmahrek <abe@cloudera.com> wrote:

> Hey there,
>
> Could you provide us with the table description (types) and the sqoop
> command you are running?
>
> -Abe
>
>
> On Tue, Sep 16, 2014 at 11:19 AM, pratik khadloya <tispratik@gmail.com>
> wrote:
>
>> Hello,
>>
>> I am comparing the mysql data (dumped to a file) with the text file
>> imported by sqoop onto HDFS.
>> I am using the diff tool for the comparison.
>>
>> I observed the following differences:
>> mysql      -->       sqoop_text_output
>> \\n                       \n
>> \\t                        \t
>> \$                        $
>> 0                         0.0
>>
>> So, it seems like mysql auto escapes the output with a \. I got around
>> that by telling mysql not to do that so that i can compare properly. I had
>> to pass the --raw flag to mysql. Then the only difference i currently see
>> is that 0 being converted to 0.0 by sqoop (as mentioned in the docs).
>>
>> How can i make mysql also convert the 0 to a 0.0 when it dumps to a csv?
>> Maybe the answer lies in the guts of sqoop that i can use myself. Or is it
>> possible to tell sqoop not to convert 0 to 0.0?
>>
>> All in all, i am trying to verify the work done by sqoop for my
>> satisfaction. Once i verify the text data is being exported fine, i will
>> verify the same for the parquet format.
>>
>> Thanks,
>> ~Pratik
>>
>
>
