sqoop-user mailing list archives

From Syed Akram <akram.ba...@zohocorp.com>
Subject Need Suggestions to to sqoop import fastly
Date Thu, 12 Mar 2015 12:57:02 GMT

Hi,

I am using Sqoop 1.4.5 to import data from MySQL into Hive.


I have a MySQL DB cluster holding about 200 GB of data, spread over 200 databases, each
with at least 600 tables (a mixture of big and small/empty tables).


When I import big tables, the performance is quite good.


But when I import small tables (including empty tables with 0 records), each import
takes at least 20 seconds per table.
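
To put that fixed cost in perspective, here is a rough back-of-the-envelope estimate. It assumes, pessimistically, that every one of the 200 x 600 tables pays the 20-second minimum and that imports run one at a time; the function and constant names are only for illustration.

```python
# Rough estimate of the fixed per-table cost across the whole cluster.
# Assumptions (not measured): every table pays the ~20 s minimum,
# and the imports run sequentially.
DATABASES = 200
TABLES_PER_DB = 600      # "at least 600" tables per database
OVERHEAD_SECONDS = 20    # observed minimum per small/empty table


def total_overhead_seconds(databases, tables_per_db, overhead_seconds):
    """Total fixed cost if each table import pays the same overhead."""
    return databases * tables_per_db * overhead_seconds


total = total_overhead_seconds(DATABASES, TABLES_PER_DB, OVERHEAD_SECONDS)
print(f"{total} s = {total / 3600:.1f} h = {total / 86400:.1f} days")
```

Under those assumptions the overhead alone comes to about 28 days, which is why the per-table latency matters more to me than the throughput on big tables.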


1. How can I reduce this time for small tables?


My sqoop import call looks like this:


sqoop "import",
  "--connect", uri,
  "--query", sqlText,
  "--map-column-java", "oprtype=Integer",
  "--target-dir", targetDir,
  "--hive-import",
  "--hive-table", hiveTable,
  "--username", userName,
  "--password", password,
  "--split-by", primaryKey,
  "--num-mappers", "2",
  "--boundary-query", boundaryQry,
  "--hive-overwrite",
  "--class-name", tableName,
  "--outdir", "tmp_sqoop/" + tableName


where "--query" is "select tableName.*, 0 as oprtype, 0 as modified_time from tableName where $CONDITIONS"


"--split-by" is the table's primary key column (primarykey)
"--boundary-query" is "select min(primarykey), max(primarykey) from tableName"
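
For reference, the way these pieces fit together can be sketched as a small helper that builds the argument list for one table. This is only an illustration: `build_sqoop_args` and its parameter names are hypothetical, the flags are the ones listed above, and the constant columns are written as `0 as oprtype`, which is what the query is meant to produce.

```python
def build_sqoop_args(uri, table_name, primary_key, hive_table,
                     target_dir, user_name, password):
    """Build the Sqoop import argument list for one table (illustrative)."""
    # Free-form query: all columns of the table plus two constant columns.
    # $CONDITIONS is left literal; Sqoop substitutes the split predicate.
    sql_text = (
        f"select {table_name}.*, 0 as oprtype, 0 as modified_time "
        f"from {table_name} where $CONDITIONS"
    )
    # Range of the split column, used to divide work between the mappers.
    boundary_qry = (
        f"select min({primary_key}), max({primary_key}) from {table_name}"
    )
    return [
        "import",
        "--connect", uri,
        "--query", sql_text,
        "--map-column-java", "oprtype=Integer",
        "--target-dir", target_dir,
        "--hive-import",
        "--hive-table", hive_table,
        "--username", user_name,
        "--password", password,
        "--split-by", primary_key,
        "--num-mappers", "2",
        "--boundary-query", boundary_qry,
        "--hive-overwrite",
        "--class-name", table_name,
        "--outdir", "tmp_sqoop/" + table_name,
    ]


# Example with made-up values; nothing is executed, the list is only printed.
args = build_sqoop_args(
    uri="jdbc:mysql://dbhost/db1",
    table_name="orders",
    primary_key="id",
    hive_table="db1_orders",
    target_dir="/tmp/sqoop/orders",
    user_name="user",
    password="secret",
)
print(args)
```

The same list is built for every table, big or empty, so each import goes through the same job setup regardless of row count.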


This runs fine for big tables, even ones with billions of rows.


But for small tables, I notice that each sqoop import takes a constant amount of time.


How do I optimize this for small tables or tables with 0 records? I want to reduce the
latency for small tables.




Please suggest what I can do here.




Cheers!!!!






