sqoop-user mailing list archives

From Jarek Jarcec Cecho <jar...@apache.org>
Subject Re: Getting bogus rows from sqoop import...?
Date Thu, 21 Mar 2013 04:42:05 GMT
Hi Felix,
we've seen similar behaviour in the past when the data itself contains Hive special characters,
such as newline characters. Would you mind trying your import with --hive-drop-import-delims
to see if it helps?
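
To illustrate why an embedded newline can surface as an extra row, here is a small
shell sketch. It is not Sqoop itself: the sample values and temp files are made up,
and it assumes Hive's default text layout, where \001 separates fields and a newline
ends a row. The tr call only mimics the effect of stripping \n, \r and \001 from a
string field.

```shell
#!/bin/sh
# Illustration only: sample data and file names are hypothetical.
raw=$(mktemp)
clean=$(mktemp)

# A text column value that contains an embedded newline.
text=$(printf 'line one\nline two')

# The same value with \n, \r and \001 removed, mimicking a delimiter strip.
cleaned=$(printf '%s' "$text" | tr -d '\n\r\001')

# Two source rows written with \001 as field delimiter and \n as row delimiter.
{
  printf '1\001alice\001hello\n'
  printf '2\001bob\001%s\n' "$text"
} > "$raw"

{
  printf '1\001alice\001hello\n'
  printf '2\001bob\001%s\n' "$cleaned"
} > "$clean"

# A line-oriented reader treats every line as one row.
rows_raw=$(wc -l < "$raw" | tr -d ' ')
rows_clean=$(wc -l < "$clean" | tr -d ' ')

echo "rows as written:       $rows_raw"    # 3 lines for 2 source rows
echo "rows after stripping:  $rows_clean"  # 2 lines for 2 source rows

rm -f "$raw" "$clean"
```

In the import from your message, that would mean adding --hive-drop-import-delims next
to --hive-import. If dropping the characters loses too much, --hive-delims-replacement
substitutes a string of your choice instead.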

Jarcec

On Wed, Mar 20, 2013 at 11:27:58PM -0400, Felix GV wrote:
> Hello,
> 
> I'm trying to import a full table from MySQL to Hadoop/Hive. It works with
> certain parameters, but when I try to do an ETL that's somewhat more
> complex, I start getting bogus rows in my resulting table.
> 
> This works:
> 
> sqoop import \
>         --connect 'jdbc:mysql://backup.general.db/general?tinyInt1isBit=false&zeroDateTimeBehavior=convertToNull' \
>         --username xxxxx \
>         --password xxxxx \
>         --hive-import \
>         --hive-overwrite \
>         -m 23 \
>         --direct \
>         --hive-table profile_felix_test17 \
>         --split-by id \
>         --table Profile
> 
> But if I use a --query instead of a --table, then I start getting bogus
> records (and by that, I mean rows that have a nonsensically high primary
> key that doesn't exist in my source database and null for the rest of the
> cells).
> 
> The output I get with the above query is not exactly the way I want it.
> Using --query, I can get the data in the format I want (by transforming
> some stuff inside MySQL), but then I also get the bogus rows, which pretty
> much makes the Hive table unusable.
> 
> I tried various combinations of parameters and it's hard to pinpoint
> exactly what causes the problem, so it could be more intricate than my
> above simplistic description. That being said, removing --table and adding
> the following params definitely breaks it:
> 
>         --target-dir /tests/sqoop/general/profile_felix_test \
>         --query "select * from Profile WHERE \$CONDITIONS"
> 
> (Ultimately, I want to use a query that's more complex than this, but even
> a simple query like this breaks...)
> 
> Any ideas why this would happen and how to solve it?
> 
> Is this the kind of problem that Sqoop2's cleaner architecture intends to
> solve?
> 
> I use CDH 4.2, BTW.
> 
> Thanks :) !
> 
> --
> Felix
