sqoop-user mailing list archives

From sreejesh s <sreejesh...@yahoo.com>
Subject Avoiding skew and determining optimal number of mappers in SQOOP import.
Date Sun, 21 Jun 2015 08:04:57 GMT
Hi,

If the source table has a primary key, a Sqoop import should not produce skewed data.
But what if no primary key is defined on the table and we have to use the --split-by
parameter to split records among multiple mappers? There is a high chance of skewed data,
depending on the column we choose for --split-by. Could you please help me understand how
to avoid skew in such scenarios, and how to determine the optimal number of mappers for a
given Sqoop import?

It would help if you could describe how many mappers you used in your own use case, along
with the size and format of the data imported.
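
To make the skew concern concrete, here is a rough simulation in Python. It assumes that,
for an integer --split-by column, the minimum and maximum values of the column are taken
and that range is cut into equal-width intervals, one per mapper. The function names
(split_ranges, rows_per_mapper) and the sample data are hypothetical and exist only for
this illustration; this is a sketch of the idea, not Sqoop's actual code.

```python
def split_ranges(lo, hi, num_mappers):
    """Cut the inclusive range [lo, hi] into num_mappers contiguous,
    roughly equal-width integer ranges (assumed splitting model)."""
    span = hi - lo + 1
    # Integer arithmetic keeps the boundaries exact and deterministic.
    bounds = [lo + (i * span) // num_mappers for i in range(num_mappers)]
    bounds.append(hi + 1)
    return [(bounds[i], bounds[i + 1] - 1) for i in range(num_mappers)]


def rows_per_mapper(values, num_mappers):
    """Count how many rows each mapper would read if rows were assigned
    by the range their split-column value falls into."""
    ranges = split_ranges(min(values), max(values), num_mappers)
    counts = [0] * num_mappers
    for v in values:
        for i, (start, end) in enumerate(ranges):
            if start <= v <= end:
                counts[i] += 1
                break
    return counts


# Evenly distributed keys, e.g. a dense auto-increment column.
dense = list(range(1, 1001))
print(rows_per_mapper(dense, 4))   # [250, 250, 250, 250]

# Clustered keys with a large gap: 900 low values, 100 very high values.
gappy = list(range(1, 901)) + [100000 + i for i in range(100)]
print(rows_per_mapper(gappy, 4))   # [900, 0, 0, 100]
```

Under this model, a split column whose values are spread evenly over their range gives
balanced mappers, while a column with gaps or clusters leaves some mappers idle and
overloads others, which is the situation described above.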
