sqoop-user mailing list archives

From Joshua Baxter <joshuagbax...@gmail.com>
Subject Re: Using more than a single mapper per partition with OraOop
Date Mon, 03 Nov 2014 21:53:12 GMT
We will mostly want to bring in a single partition at a time, but there
will also be occasions where we would need to pull down the whole table.

sqoop import \
  -Doraoop.import.hint="no_parallel" \
  -Doraoop.chunk.method=PARTITION \
  -Doraoop.timestamp.string=false \
  -Doraoop.import.partitions=partition_name \
  --connect connect_string \
  --table "WAREHOUSE.BIG_TABLE" \
  --fetch-size 100000 \
  -m 20 \
  --target-dir /user/hive/warehouse/database/partition \
  --as-parquetfile \
  --username user --password password
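
Since the ETL process would run this once per partition, the command above could be built programmatically. The following is only a minimal sketch: `build_import_args` is a hypothetical helper, the connection values are the same placeholders as in the command, and the flags are exactly the ones used above, nothing more.

```python
import shlex


def build_import_args(partition, target_dir, mappers=20,
                      connect="connect_string",
                      username="user", password="password"):
    """Return the sqoop argv for importing one Oracle partition.

    Hypothetical helper; it only re-assembles the flags from the
    command in this message and does not run anything.
    """
    return [
        "sqoop", "import",
        # Generic -D properties go right after the tool name.
        "-Doraoop.import.hint=no_parallel",
        "-Doraoop.chunk.method=PARTITION",
        "-Doraoop.timestamp.string=false",
        f"-Doraoop.import.partitions={partition}",
        "--connect", connect,
        "--table", "WAREHOUSE.BIG_TABLE",
        "--fetch-size", "100000",
        "-m", str(mappers),
        "--target-dir", target_dir,
        "--as-parquetfile",
        "--username", username,
        "--password", password,
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line for one placeholder partition.
    args = build_import_args("partition_name",
                             "/user/hive/warehouse/database/partition")
    print(shlex.join(args))
```

The list form can be passed to `subprocess.run` directly, which avoids shell quoting problems with the table name.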

On Mon, Nov 3, 2014 at 9:40 PM, Gwen Shapira <gshapira@cloudera.com> wrote:

> Do you need to get just one partition, or is the ultimate goal to use all
> partitions?
> Also, can you share the exact Oraoop command you used?
> On Mon, Nov 3, 2014 at 1:32 PM, Joshua Baxter <joshuagbaxter@gmail.com>
> wrote:
>> Apologies if this question has been asked before.
>> I have a very large table in Oracle with hundreds of partitions, and we
>> want to be able to import it to Parquet in HDFS a partition at a time as
>> part of an ETL process. The table has evolved over time and there is no
>> column without significant skew, meaning that mappers get very uneven
>> numbers of rows when using the standard Sqoop connector and split-by.
>> Impala is the target platform for the data, so we also want to keep the
>> file sizes under the cluster block size to prevent remote streaming when we
>> use the data. I've just discovered OraOop and it sounds like this would be
>> exactly the tool we would need to import the data in an efficient and
>> predictable way.
>> Unfortunately, the problem I'm now having is that if I use the partition
>> option to choose just a single partition, this always equates to exactly
>> one mapper. The sort of speed and output file sizes we are looking at
>> would call for something like 40 mappers.
>> Are there any options I can set to increase the number of mappers when
>> pulling data from a single table partition?
