sqoop-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Syed Akram <akram.ba...@zohocorp.com>
Subject Re: Need Suggestions to to sqoop import fastly
Date Fri, 13 Mar 2015 14:40:32 GMT
Thanks For your valuable response Abe and Martin, This helped me and i gave a try on that.

Along with this, 

I need one more suggestion,

one db with 600 tables taking 3 hours of time to do hive import using below one.
Like this i have 200 db's and if i run sequentially 200 db's one by one it takes 25 days to
complete one db cluster.

But this is not the result i am expecting, i want to import the data (all the 200 db's) in
2 days(at most). How to make this happen,

i.e., one DB Cluster with more than or equal to 200 db's should be done as soon as possible.

Suggestions on this. I'm working on this for long time. suggestions are valuable. 


---- On Thu, 12 Mar 2015 23:29:00 +0530 Juan Martin Pampliega&lt;jpampliega@gmail.com&gt;
wrote ---- 

If you are using InnoDB you can use something like:

  WHERE table_schema = 'db_name' AND table_name LIKE 'name_of_table';

On Thu, Mar 12, 2015 at 2:53 PM, Abraham Elmahrek &lt;abe@cloudera.com&gt; wrote:
Hey Syed,

Sqoop has to boot a MR job in order to do the data transfer. This takes some time. As such,
would the following work?

 [[ $( mysql test -e "SELECT COUNT(*) FROM test" | tail -1 ) -gt 0 ]] &amp;&amp; sqoop
import ...

The COUNT statement should be lightning fast if you're using MyISAM as your storage engine.


On Thu, Mar 12, 2015 at 5:57 AM, Syed Akram &lt;akram.basha@zohocorp.com&gt; wrote:


I am using Sqoop 1.4.5 and i'm doing import from MySQL to Hive 

I'm having a MySQL DBCluster of 200GB data, in which it have 200 db's and in each db it has
at least 600 tables(mixture of big and small/empty tables).

When I'm importing big tables, The performance is quite good.

But When i'm trying to do sqoop import  small tables ( i say empty tables with 0 records)
is taking at least 20 secs of time for each table.

1.How can i reduce this time for small tables?

my sqoop import query looks like this:

                                             sqoop "import",
 "--connect", uri, 
 "--query", sqlText,
 "--target-dir", targetDir,
 "--hive-table", hiveTable,
 "--username", userName, 
 "--password", password, 
 "--split-by", primaryKey, 
 "--outdir", "tmp_sqoop/"+tableName

where "--query" is "select tableName.*, oprtype as 0, modified_time as 0 where $CONDITIONS"

"--split-by" primarykey
"--boundary-query" select min(primarykey), max(primarykey) from table;

This runs fine for big table having even billions of rows.

But for small table, iam noticing constant time taking to do sqoop import.

How do i optimize the things for small tables or tables with 0 records. I want to reduce the
latency for small tables.

Please suggest me in this area,




View raw message