sqoop-user mailing list archives

From Abraham Elmahrek <...@cloudera.com>
Subject Re: Need Suggestions to sqoop import faster
Date Fri, 13 Mar 2015 15:41:56 GMT
Syed,

This depends heavily on your resources. How long are the longest
Sqoop jobs taking? If the longest Sqoop jobs are taking 2 days, then you'll
have to increase the number of mappers sqooping data from your database.
You can tune the parallelism with the "-m" argument. Check out
http://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html#_controlling_parallelism
for more information.
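As a rough illustration only (the host, database, table, and split column
below are made-up placeholders, not from this thread), raising the mapper
count looks like this; the script builds the argument list and echoes the
full command so it can be inspected before actually launching:

```shell
# Hypothetical sketch: raise Sqoop's parallelism from the default 4 mappers
# to 8. Connection string, table, and split column are placeholders.
NUM_MAPPERS=8
ARGS="import \
  --connect jdbc:mysql://dbhost/mydb \
  --table mytable \
  --split-by id \
  --num-mappers $NUM_MAPPERS"

# Print the command for inspection; uncomment the last line to run it.
echo "sqoop $ARGS"
# sqoop $ARGS
```

Note that each mapper opens its own connection to the source database, so
the database has to be able to absorb the extra concurrent load.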

-Abe
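A further thought on the 200-database backlog described below: besides
adding mappers per job, independent databases can be imported concurrently.
A minimal sketch, where the echo and the database names are placeholders
standing in for the real per-database sqoop invocation:

```shell
# Sketch: run several database imports in parallel instead of sequentially.
# The echo stands in for the real sqoop command; db names are placeholders.
for db in db001 db002 db003 db004; do
  (
    echo "importing $db"
    # sqoop import --connect "jdbc:mysql://dbhost/$db" ...  # real work here
  ) &
done
wait   # block until all background imports finish
```

In practice the concurrency should be capped (e.g. with `xargs -P` or by
batching) so the MySQL server and the MR cluster are not overwhelmed.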

On Fri, Mar 13, 2015 at 7:40 AM, Syed Akram <akram.basha@zohocorp.com>
wrote:

> Thanks for your valuable response, Abe and Martin. This helped me, and I
> gave it a try.
>
> Along with this, I need one more suggestion.
>
> One db with 600 tables takes 3 hours to hive import using the command
> below. I have 200 db's like this, and if I run the 200 db's sequentially,
> one by one, it takes 25 days to complete one db cluster.
>
> But this is not the result I am expecting; I want to import the data (all
> 200 db's) in 2 days at most. How can I make this happen?
>
> i.e., one DB cluster with 200 or more db's should be done as soon as
> possible.
>
> Any suggestions on this? I have been working on this for a long time, so
> suggestions are valuable.
>
>
> Cheers!!!!
>
> ---- On Thu, 12 Mar 2015 23:29:00 +0530 *Juan Martin
> Pampliega<jpampliega@gmail.com <jpampliega@gmail.com>>* wrote ----
>
> If you are using InnoDB you can use something like:
>
> SELECT table_rows FROM INFORMATION_SCHEMA.TABLES
>   WHERE table_schema = 'db_name' AND table_name LIKE 'name_of_table';
>
> On Thu, Mar 12, 2015 at 2:53 PM, Abraham Elmahrek <abe@cloudera.com>
> wrote:
>
> Hey Syed,
>
> Sqoop has to boot an MR job in order to do the data transfer, which takes
> some time. As such, would the following work?
>
> #!/bin/bash
>
> [[ $( mysql test -e "SELECT COUNT(*) FROM test" | tail -1 ) -gt 0 ]] &&
> sqoop import ...
> The COUNT statement should be lightning fast if you're using MyISAM as
> your storage engine.
>
> -Abe
>
> On Thu, Mar 12, 2015 at 5:57 AM, Syed Akram <akram.basha@zohocorp.com>
> wrote:
>
>
> Hi,
>
> I am using Sqoop 1.4.5, and I'm importing from MySQL to Hive.
>
> I have a MySQL DB cluster with 200GB of data: it holds 200 db's, and each
> db has at least 600 tables (a mixture of big and small/empty tables).
>
> When I'm importing big tables, the performance is quite good.
>
> But when I try to sqoop import small tables (even empty tables with 0
> records), each table takes at least 20 secs.
>
> *1. How can I reduce this time for small tables?*
>
> *My sqoop import command looks like this:*
>
> sqoop "import", "--connect", uri, "--query", sqlText,
>       "--map-column-java", "oprtype=Integer",
>       "--target-dir", targetDir, "--hive-import", "--hive-table", hiveTable,
>       "--username", userName, "--password", password,
>       "--split-by", primaryKey, "--num-mappers", "2",
>       "--boundary-query", boundaryQry, "--hive-overwrite",
>       "--class-name", tableName, "--outdir", "tmp_sqoop/"+tableName
>
> where:
>   "--query" is "select tableName.*, oprtype as 0, modified_time as 0 where $CONDITIONS"
>   "--split-by" is primarykey
>   "--boundary-query" is "select min(primarykey), max(primarykey) from table"
>
> This runs fine for big tables, even those with billions of rows. But for
> small tables, I am noticing a constant overhead for each sqoop import. How
> do I optimize things for small tables, or tables with 0 records? I want to
> reduce the latency for small tables.
>
> Please suggest me in this area.
>
> Cheers!!!!
