sqoop-user mailing list archives

From Syed Akram <akram.ba...@zohocorp.com>
Subject Need Suggestions to to sqoop import fastly
Date Thu, 12 Mar 2015 12:57:02 GMT


I am using Sqoop 1.4.5 to import data from MySQL into Hive.

I have a MySQL DB cluster holding 200 GB of data. It contains 200 databases, and each database has
at least 600 tables (a mixture of big and small or empty tables).

When I import big tables, the performance is quite good.

But when I run a Sqoop import on small tables (including empty tables with 0 records),
each table takes at least 20 seconds.

1. How can I reduce this time for small tables?

My Sqoop import command looks like this:

sqoop "import",
  "--connect", uri,
  "--query", sqlText,
  "--target-dir", targetDir,
  "--hive-table", hiveTable,
  "--username", userName,
  "--password", password,
  "--split-by", primaryKey,
  "--outdir", "tmp_sqoop/"+tableName

where "--query" is "select tableName.*, 0 as oprtype, 0 as modified_time from tableName where $CONDITIONS"

"--split-by" primarykey
"--boundary-query" select min(primarykey), max(primarykey) from table;

This runs fine for big tables, even those with billions of rows.
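
For reference, the options described above could be assembled roughly as in the following Python sketch. It is only an illustration: the function name build_sqoop_import_args and its parameter names are hypothetical, and the sketch only builds the argument list without running Sqoop.

```python
def build_sqoop_import_args(uri, table_name, primary_key, sql_text,
                            target_dir, hive_table, user_name, password):
    """Return the Sqoop import arguments described in this message.

    Hypothetical helper for illustration only; it does not invoke Sqoop.
    """
    # Boundary query as described: min and max of the split column.
    boundary_query = (
        f"select min({primary_key}), max({primary_key}) from {table_name}"
    )
    return [
        "import",
        "--connect", uri,
        "--query", sql_text,          # free-form query containing $CONDITIONS
        "--target-dir", target_dir,
        "--hive-table", hive_table,
        "--username", user_name,
        "--password", password,
        "--split-by", primary_key,
        "--boundary-query", boundary_query,
        "--outdir", "tmp_sqoop/" + table_name,
    ]
```

The same list is produced for every table, whether it is large or empty, which matches the setup described here.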

But for small tables, I am noticing that the Sqoop import takes a constant amount of time.

How do I optimize things for small tables or tables with 0 records? I want to reduce the
latency for small tables.

Please advise.

