sqoop-user mailing list archives

From Joshua Baxter <joshuagbax...@gmail.com>
Subject Using more than a single mapper per partition with OraOop
Date Mon, 03 Nov 2014 21:32:31 GMT
Apologies if this question has been asked before.

I have a very large table in Oracle with hundreds of partitions, and we want
to import it into Parquet on HDFS one partition at a time as part of an ETL
process. The table has evolved over time, and every column has significant
skew, so mappers receive very uneven row counts when we use the standard
Sqoop connector with split-by. Impala is the target platform for the data,
so we also want to keep file sizes under the cluster block size to prevent
remote streaming when we use the data. I've just discovered OraOop, and it
sounds like exactly the tool we need to import the data in an efficient and
predictable way.

Unfortunately, the problem I'm now having is that when I use the partition
option to select a single partition, the import always runs with exactly one
mapper. Given the speed and output file sizes we are aiming for, we would
need something like 40 mappers.
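
As a rough illustration of where a figure like 40 can come from, here is a
minimal sketch of the sizing arithmetic. The partition size and block size
below are hypothetical placeholders, not measurements from our cluster, and
the sketch assumes the rows are spread evenly across mappers and that each
mapper writes a single output file of about its share of the data.

```python
def mappers_needed(partition_bytes: int, block_bytes: int) -> int:
    """Smallest mapper count that keeps each output file within one block.

    Assumes an even split across mappers and one output file per mapper.
    """
    if partition_bytes <= 0:
        return 1
    # Integer ceiling division avoids floating-point rounding.
    return (partition_bytes + block_bytes - 1) // block_bytes


# Hypothetical figures, for illustration only.
BLOCK_BYTES = 128 * 1024**2      # assumed 128 MiB HDFS block size
PARTITION_BYTES = 5 * 1024**3    # assumed 5 GiB of output per partition

print(mappers_needed(PARTITION_BYTES, BLOCK_BYTES))
```

Parquet compression would change the on-disk size, so in practice the
partition size used here should be the expected output size, not the size of
the data in Oracle.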

Are there any options I can set to increase the number of mappers when
pulling data from a single table partition?
