sqoop-user mailing list archives

From Juan Martin Pampliega <jpampli...@gmail.com>
Subject Re: Need Suggestions to to sqoop import fastly
Date Thu, 12 Mar 2015 17:59:00 GMT
If you are using InnoDB you can use something like:

SELECT table_rows FROM INFORMATION_SCHEMA.TABLES
  WHERE table_schema = 'db_name' AND table_name LIKE 'name_of_table';
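
Note that for InnoDB, TABLE_ROWS in INFORMATION_SCHEMA is only an estimate derived from sampled statistics, but it is good enough for a "is this table empty?" check. A minimal sketch of a guard script built on that idea — the function names and the db/table arguments are placeholders of mine, not from this thread, and it assumes the mysql client can authenticate without prompting:

```shell
#!/usr/bin/env bash
# Sketch (assumptions noted above): only launch the Sqoop job when the
# source table appears non-empty according to INFORMATION_SCHEMA.

approx_rows() {
  # -N suppresses the column header, -B produces tab-separated batch output.
  # For InnoDB, TABLE_ROWS is an estimate, but zero-vs-nonzero is a
  # reasonable signal for skipping empty tables.
  mysql -N -B -e "SELECT table_rows FROM INFORMATION_SCHEMA.TABLES
                  WHERE table_schema='$1' AND table_name='$2'"
}

maybe_import() {
  local db="$1" table="$2" count
  count=$(approx_rows "$db" "$table")
  if [[ "${count:-0}" -gt 0 ]]; then
    echo "importing $db.$table (~$count rows)"
    # sqoop import ...   <-- the real import command goes here
  else
    echo "skipping empty table $db.$table"
  fi
}
```

This avoids paying the MapReduce startup cost for tables that have nothing to transfer.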

On Thu, Mar 12, 2015 at 2:53 PM, Abraham Elmahrek <abe@cloudera.com> wrote:

> Hey Syed,
>
> Sqoop has to boot a MR job in order to do the data transfer. This takes
> some time. As such, would the following work?
>
> #!/bin/bash
>
> [[ $( mysql test -e "SELECT COUNT(*) FROM test" | tail -1 ) -gt 0 ]] && \
>     sqoop import ...
>
> The COUNT statement should be lightning fast if you're using MyISAM as
> your storage engine.
>
> -Abe
>
> On Thu, Mar 12, 2015 at 5:57 AM, Syed Akram <akram.basha@zohocorp.com>
> wrote:
>
>>
>> Hi,
>>
>> I am using Sqoop 1.4.5 and I'm importing from MySQL to Hive.
>>
>> I have a MySQL DB cluster with 200 GB of data, containing 200 databases,
>> each with at least 600 tables (a mixture of big and small/empty tables).
>>
>> When I'm importing big tables, the performance is quite good.
>>
>> But when I sqoop import small tables (even empty tables with 0 records),
>> it takes at least 20 seconds per table.
>>
>> 1. How can I reduce this time for small tables?
>>
>> My sqoop import invocation looks like this:
>>
>>     sqoop "import",
>>     "--connect", uri, "--query", sqlText, "--map-column-java",
>>     "oprtype=Integer", "--target-dir", targetDir, "--hive-import",
>>     "--hive-table", hiveTable, "--username", userName, "--password",
>>     password, "--split-by", primaryKey, "--num-mappers", "2",
>>     "--boundary-query", boundaryQry, "--hive-overwrite",
>>     "--class-name", tableName, "--outdir", "tmp_sqoop/"+tableName
>>
>> where "--query" is "select tableName.*, oprtype as 0, modified_time as 0
>> where $CONDITIONS", "--split-by" is the primary key, and
>> "--boundary-query" is "select min(primarykey), max(primarykey) from table;".
>>
>> This runs fine for big tables, even those with billions of rows. But for
>> small tables I am noticing a constant overhead on every sqoop import.
>> How do I optimize things for small tables or tables with 0 records? I
>> want to reduce the latency for small tables.
>>
>> Please suggest me in this area. Cheers!
>>
>>
>>
>
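
The argument list quoted in Syed's message is in the form his wrapper code passes to Sqoop. As a plain command line it would look roughly like the sketch below; the shell variables stand in for the values the wrapper supplies, and the query string must be quoted so the shell does not expand $CONDITIONS:

```shell
# Rough CLI equivalent of the quoted argument list (sketch only; the
# variables uri, sqlText, targetDir, etc. are placeholders).
sqoop import \
  --connect "$uri" \
  --query "$sqlText" \
  --map-column-java oprtype=Integer \
  --target-dir "$targetDir" \
  --hive-import \
  --hive-table "$hiveTable" \
  --username "$userName" \
  --password "$password" \
  --split-by "$primaryKey" \
  --num-mappers 2 \
  --boundary-query "$boundaryQry" \
  --hive-overwrite \
  --class-name "$tableName" \
  --outdir "tmp_sqoop/$tableName"
```

Every one of these options is a standard Sqoop 1 import argument; the per-table fixed cost discussed in the thread comes from the MapReduce job startup, not from any particular option here.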
