sqoop-user mailing list archives

From Bhargav Nallapu <bhargav.nall...@corp.247customer.com>
Subject Fwd: Sqoop export .lzo to mysql duplicates
Date Fri, 23 Nov 2012 05:07:01 GMT

I am running into a strange issue.


Hive writes its output to an external table with LZO compression enabled,
so my HDFS folder contains large_file.lzo.

When I export this file to a MySQL table using Sqoop, the number of rows
is doubled.

Then I decompress it:
lzop -d large_file.lzo

The problem does not occur if I export the same file after uncompressing
it ("large_file"). The row count is as expected.

In contrast, both small_file and small_file.lzo are loaded with the correct
number of rows.

Sqoop : v 1.30
Num of mappers : 1

Observation: any compressed file (gzipped or LZO) larger than about 60 MB
(the threshold might be 64 MB) ends up with double the row count when
exported to the DB, and the extra rows are probably exact duplicates.
Can anyone please help?
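
One way to confirm the "exact duplicates" guess is to dump the exported table to a text file, one row per line, and count how many distinct rows appear exactly twice. The following is only a minimal sketch for that check. The function names and the dump file are hypothetical and not part of Sqoop or Hive, and it assumes the source data itself contains no legitimately repeated rows.

```python
from collections import Counter


def duplicate_summary(rows):
    """Return (total rows, distinct rows, distinct rows seen exactly twice)."""
    counts = Counter(rows)
    total = sum(counts.values())
    distinct = len(counts)
    doubled = sum(1 for n in counts.values() if n == 2)
    return total, distinct, doubled


def summarize_file(path):
    """Apply duplicate_summary to a text dump with one row per line.

    `path` is a hypothetical dump of the exported table; it is an
    assumption of this sketch, not something produced by Sqoop itself.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return duplicate_summary(line.rstrip("\n") for line in handle)
```

If every row was loaded exactly twice, the total would be twice the distinct count and the third value would equal the distinct count. Any other result would suggest the doubling is not a clean duplication of the whole file.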
