sqoop-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Abraham Elmahrek <...@cloudera.com>
Subject Re: Need Suggestions to to sqoop import fastly
Date Thu, 12 Mar 2015 17:53:39 GMT
Hey Syed,

Sqoop has to boot a MR job in order to do the data transfer. This takes
some time. As such, would the following work?

#!/bin/bash

[[ $( mysql test -e "SELECT COUNT(*) FROM test" | tail -1 ) -gt 0 ]] &&
sqoop import ...
The COUNT statement should be lightning fast if you're using MyISAM as your
storage engine.

-Abe

On Thu, Mar 12, 2015 at 5:57 AM, Syed Akram <akram.basha@zohocorp.com>
wrote:

>
> Hi,
>
> I am using Sqoop 1.4.5 and i'm doing import from MySQL to Hive
>
> I'm having a MySQL DBCluster of 200GB data, in which it have 200 db's and
> in each db it has at least 600 tables(mixture of big and small/empty
> tables).
>
> When I'm importing big tables, The performance is quite good.
>
> But When i'm trying to do sqoop import  small tables ( i say empty tables
> with 0 records) is taking at least 20 secs of time for each table.
>
> *1.How can i reduce this time for small tables?*
>
> *my sqoop import query looks like this:*
>
>
>
>
>
>
>
>
>
>
> *                                             sqoop "import", "--connect",
> uri,  "--query", sqlText, "--map-column-java",  "oprtype=Integer",
> "--target-dir", targetDir, "--hive-import", "--hive-table", hiveTable,
> "--username", userName,  "--password", password,  "--split-by",
> primaryKey,  "--num-mappers","2", "--boundary-query",boundaryQry,
> "--hive-overwrite", "--class-name",tableName, "--outdir",
> "tmp_sqoop/"+tableNamewhere "--query" is "select tableName.*, oprtype as 0,
> modified_time as 0 where $CONDITIONS""--split-by"
> primarykey"--boundary-query" select min(primarykey), max(primarykey) from
> table;This runs fine for big table having even billions of rows.But for
> small table, iam noticing constant time taking to do sqoop import.How do i
> optimize the things for small tables or tables with 0 records. I want to
> reduce the latency for small tables.Please suggest me in this
> area,Cheers!!!!*
>
>
>

Mime
View raw message