sqoop-user mailing list archives

From alo alt <wget.n...@googlemail.com>
Subject Re: [sqoop-user] Error while sqoop-import
Date Wed, 01 Feb 2012 08:26:12 GMT
Hi,

sqoop import-all-tables (generic-args) (import-args)

- Alex



--
Alexander Lorenz
http://mapredit.blogspot.com

On Feb 1, 2012, at 7:34 AM, bejoy ks wrote:

> This user group for Sqoop is deprecated; moving the conversation to the Apache Sqoop user group.
> 
> Hi Bhavesh
> When you do import-all-tables, Sqoop does the default import with the PK column in the RDBMS as the --split-by column. In many cases the RDBMS has a non-numeric PK or a composite key as the PK; in those circumstances you have to import the tables individually from the RDBMS, using an appropriate --split-by column for each table.
> In your case I believe AddressUid in one table is a non-numeric field, and hence it is causing issues with the boundary SQL fired by Sqoop (you need to specify --split-by). So, as the bottom line, you need to issue individual import statements for all the tables to be imported. Club those into a shell script and fire the script if you want a single trigger point.
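The "club the individual imports into a shell script" advice above can be sketched roughly as below. The connection string is the one from this thread; the Customer/Orders tables and all of the split columns are hypothetical placeholders, and the script only echoes the commands so it is safe to dry-run (drop the leading echo to actually execute them):

```shell
#!/bin/sh
# Sketch: one sqoop-import per table, each with its own numeric
# --split-by column, clubbed into a single script as suggested above.
# Table names other than Address and all split columns are placeholders.
CONNECT='jdbc:sqlserver://192.168.1.1;username=abcd;password=12345;database=HadoopTest'

# Emit one import command per "table:split_column" pair.
generate_imports() {
  for pair in 'Address:AddressNumber' 'Customer:CustomerId' 'Orders:OrderId'; do
    table=${pair%%:*}     # text before the colon
    column=${pair##*:}    # text after the colon
    echo "./sqoop-import --connect '$CONNECT' --table $table --split-by $column --hive-import"
  done
}

generate_imports
```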
> 
> Hope it helps!...
> 
> Regards
> Bejoy.K.S
> 
> 
> On Wed, Feb 1, 2012 at 11:44 AM, Bhavesh Shah <bhavesh25shah@gmail.com> wrote:
> Hello Bejoy K S,
> When I tried the import for a single table, I executed it with the --split-by option, but when I tried sqoop-import-all-tables
> I got these errors:
> 
> 
> hadoop@ubuntu:~/sqoop-1.3.0-cdh3u1/bin$ ./sqoop-import-all-tables  --connect 'jdbc:sqlserver://192.168.1.1;username=abcd;password=12345;database=HadoopTest'
--hive-import
> 12/01/31 05:29:49 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output.
You can override
> 12/01/31 05:29:49 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
> 12/01/31 05:29:49 INFO SqlServer.MSSQLServerManagerFactory: Using Microsoft's SQL Server
- Hadoop Connector
> 12/01/31 05:29:49 INFO manager.SqlManager: Using default fetchSize of 1000
> 12/01/31 05:29:49 INFO tool.CodeGenTool: Beginning code generation
> 12/01/31 05:29:50 INFO manager.SqlManager: Executing SQL statement: SELECT TOP 1 * FROM
[Address] 
> 12/01/31 05:29:50 INFO manager.SqlManager: Executing SQL statement: SELECT TOP 1 * FROM
[Address] 
> 12/01/31 05:29:50 INFO orm.CompilationManager: HADOOP_HOME is /home/hadoop/hadoop-0.20.2-cdh3u2
> 12/01/31 05:29:51 ERROR orm.CompilationManager: Could not rename /tmp/sqoop-hadoop/compile/6c3bc3e91230270163a8c4d1c5491071/Address.java
to /home/hadoop/sqoop-1.3.0-cdh3u1/bin/./Address.java
> 
> java.io.IOException: Destination '/home/hadoop/sqoop-1.3.0-cdh3u1/bin/./Address.java'
already exists
>     at org.apache.commons.io.FileUtils.moveFile(FileUtils.java:1811)
>     at com.cloudera.sqoop.orm.CompilationManager.compile(CompilationManager.java:227)
>     at com.cloudera.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:83)
>     at com.cloudera.sqoop.tool.ImportTool.importTable(ImportTool.java:337)
>     at com.cloudera.sqoop.tool.ImportAllTablesTool.run(ImportAllTablesTool.java:64)
> 
>     at com.cloudera.sqoop.Sqoop.run(Sqoop.java:144)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:180)
>     at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:219)
>     at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:228)
>     at com.cloudera.sqoop.Sqoop.main(Sqoop.java:237)
> 12/01/31 05:29:51 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/6c3bc3e91230270163a8c4d1c5491071/Address.jar
> 12/01/31 05:29:51 INFO mapreduce.ImportJobBase: Beginning import of Address
> 12/01/31 05:29:51 INFO manager.SqlManager: Executing SQL statement: SELECT TOP 1 * FROM
[Address] 
> 12/01/31 05:29:52 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN([AddressUid]),
MAX([AddressUid]) FROM [Address]
> 12/01/31 05:29:52 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:54310/home/hadoop/hadoopTemp/mapred/staging/hadoop/.staging/job_201201301559_0287
> 12/01/31 05:29:52 ERROR tool.ImportAllTablesTool: Encountered IOException running import
job: java.io.IOException: com.microsoft.sqlserver.jdbc.SQLServerException: Operand data type
uniqueidentifier is invalid for min operator.
> 
> -Thanks
> Bhavesh Shah
> 
> 
> 
> On Wed, Feb 1, 2012 at 10:51 AM, Bhavesh Shah <bhavesh25shah@gmail.com> wrote:
> Hello bejoy ks,
> After trying the import with --split-by on a NUMERIC column other than AddressUid,
> it worked successfully, but I have to import all the tables from the database,
> so in that case how do I manage this?
> 
> -Thanks
> Bhavesh Shah
> 
> 
> On Tue, Jan 31, 2012 at 7:10 PM, bejoy ks <bejoyks@gmail.com> wrote:
> Hi Bhavesh
>           Did you try the import with --split-by any NUMERIC column other than AddressUid?

> Getting the min and max values is necessary to divide the task between multiple mappers. Say the --split-by column is age, the min value is 18, the max value is 58, and you have 4 mappers doing the import in parallel. The individual queries executed by each mapper would be:
> Select * from table_name where age between 18 and 27;
> Select * from table_name where age between 28 and 37;
> Select * from table_name where age between 38 and 47;
> Select * from table_name where age between 48 and 58;
> 
> So the combined result is Select * from table_name; which is the ultimate goal.
> 
> The min and max values are needed to get the total boundary; dividing that by the number of mappers gives the scope of each individual mapper.
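The boundary arithmetic described above can be sketched as a tiny shell loop. This is a simplified assumption of how splits are computed, mirroring only the age 18..58 / 4 mappers example, not Sqoop's actual split logic:

```shell
#!/bin/sh
# Simplified sketch: divide the [min, max] range of the --split-by
# column into one contiguous slice per mapper.
split_ranges() {
  min=$1; max=$2; mappers=$3
  span=$(( (max - min + 1) / mappers ))
  lo=$min
  i=1
  while [ "$i" -le "$mappers" ]; do
    # The last mapper absorbs any remainder so the slices cover min..max exactly.
    if [ "$i" -eq "$mappers" ]; then hi=$max; else hi=$(( lo + span - 1 )); fi
    echo "Mapper $i: SELECT * FROM table_name WHERE age BETWEEN $lo AND $hi;"
    lo=$(( hi + 1 ))
    i=$(( i + 1 ))
  done
}

split_ranges 18 58 4   # prints the ranges 18-27, 28-37, 38-47, 48-58
```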
> 
> 
> 
> On Tue, Jan 31, 2012 at 6:52 PM, Bhavesh Shah <bhavesh25shah@gmail.com> wrote:
> Hi Bejoy KS,
> Thanks for your reply,
> I have tried --split-by, but the error is the same as the previous one. There is one thing I don't understand, though:
> 
> 12/01/31 01:22:04 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(AddressUid),
MAX(AddressUid) FROM Address  
> Why is that query there during the import?
> Actually I want to try sqoop-import-all-tables, but I have uniqueidentifier columns in all my SQL Server tables.
> So what could be the solution for this?
> 
> Thanks
> 
> 
> 
> 
> On Tue, Jan 31, 2012 at 6:44 PM, bejoy ks <bejoyks@gmail.com> wrote:
> Hi Bhavesh
> Can you try the import with the --split-by <column name> argument as well? Try a few columns that have numeric datatypes. If it succeeds, choose the column where the data distribution would be uniform across mappers. The import can be something like this:
> 
> ./sqoop-import --driver 'com.microsoft.sqlserver.jdbc.SQLServerDriver' --connect 'jdbc:sqlserver://192.168.1.1;username=abcd;password=12345;database=HadoopTest'
--table Address  --split-by someColumn  --hive-import --hive-table address --target-dir /userdata/somedir/dir/table
> 
> 
> On Tue, Jan 31, 2012 at 5:50 PM, Bhavesh Shah <bhavesh25shah@gmail.com> wrote:
> Hello,
> I am trying a sqoop-import into Hive from SQL Server and I am getting this error:
> 
> ./sqoop-import --driver 'com.microsoft.sqlserver.jdbc.SQLServerDriver' --connect 'jdbc:sqlserver://192.168.1.1;username=abcd;password=12345;database=HadoopTest' --table Address --hive-table address --hive-import
> 12/01/31 01:22:01 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output.
You can override
> 12/01/31 01:22:01 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
> 12/01/31 01:22:01 INFO manager.SqlManager: Using default fetchSize of 1000
> 12/01/31 01:22:01 INFO tool.CodeGenTool: Beginning code generation
> 12/01/31 01:22:01 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM Address
AS t WHERE 1=0
> 12/01/31 01:22:02 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM Address
AS t WHERE 1=0
> 12/01/31 01:22:02 INFO orm.CompilationManager: HADOOP_HOME is /home/hadoop/hadoop-0.20.2-cdh3u2
> 12/01/31 01:22:03 ERROR orm.CompilationManager: Could not rename /tmp/sqoop-hadoop/compile/0b20951314826ba92fce53b3fe95c464/Address.java
to /home/hadoop/sqoop-1.3.0-cdh3u1/bin/./Address.java
> java.io.IOException: Destination '/home/hadoop/sqoop-1.3.0-cdh3u1/bin/./Address.java'
already exists
>     at org.apache.commons.io.FileUtils.moveFile(FileUtils.java:1811)
>     at com.cloudera.sqoop.orm.CompilationManager.compile(CompilationManager.java:227)
>     at com.cloudera.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:83)
>     at com.cloudera.sqoop.tool.ImportTool.importTable(ImportTool.java:337)
>     at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:423)
>     at com.cloudera.sqoop.Sqoop.run(Sqoop.java:144)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:180)
>     at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:219)
>     at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:228)
>     at com.cloudera.sqoop.Sqoop.main(Sqoop.java:237)
> 12/01/31 01:22:03 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/0b20951314826ba92fce53b3fe95c464/Address.jar
> 12/01/31 01:22:03 INFO mapreduce.ImportJobBase: Beginning import of Address
> 12/01/31 01:22:03 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM Address
AS t WHERE 1=0
> 12/01/31 01:22:04 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(AddressUid),
MAX(AddressUid) FROM Address  
> 12/01/31 01:22:04 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:54310/home/hadoop/hadoopTemp/mapred/staging/hadoop/.staging/job_201201301559_0274
> 12/01/31 01:22:04 ERROR tool.ImportTool: Encountered IOException running import job:
java.io.IOException: com.microsoft.sqlserver.jdbc.SQLServerException: Operand data type uniqueidentifier
is invalid for min operator.
>     at com.cloudera.sqoop.mapreduce.db.DataDrivenDBInputFormat.getSplits(DataDrivenDBInputFormat.java:204)
>     at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:944)
>     at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:961)
>     at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:880)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:476)
>     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:506)
>     at com.cloudera.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:121)
>     at com.cloudera.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:181)
>     at com.cloudera.sqoop.manager.SqlManager.importTable(SqlManager.java:405)
>     at com.cloudera.sqoop.tool.ImportTool.importTable(ImportTool.java:350)
>     at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:423)
>     at com.cloudera.sqoop.Sqoop.run(Sqoop.java:144)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:180)
>     at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:219)
>     at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:228)
>     at com.cloudera.sqoop.Sqoop.main(Sqoop.java:237)
> Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Operand data type uniqueidentifier
is invalid for min operator.
>     at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:197)
>     at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1493)
>     at com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:775)
>     at com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:676)
>     at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:4575)
>     at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1400)
>     at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:179)
>     at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:154)
>     at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeQuery(SQLServerStatement.java:611)
>     at com.cloudera.sqoop.mapreduce.db.DataDrivenDBInputFormat.getSplits(DataDrivenDBInputFormat.java:189)
>     ... 22 more
> 
> Please suggest a solution.
> 
> 
> -- 
> Thanks and Regards,
> Bhavesh Shah
> 
> 
> -- 
> NOTE: The mailing list sqoop-user@cloudera.org is deprecated in favor of Apache Sqoop
mailing list sqoop-user@incubator.apache.org. Please subscribe to it by sending an email to
incubator-sqoop-user-subscribe@apache.org.
> 
> 
> 
> -- 
> Regards
>        Bejoy
> 
> 
> 
> 
> -- 
> Regards,
> Bhavesh Shah
> 
> 
> 
> 
> 
> -- 
> Regards
>        Bejoy
> 
> 
> 
> 
> 
> -- 
> Regards,
> Bhavesh Shah
> 
> 
> 
> 
> 
> -- 
> Regards
>        Bejoy

