sqoop-user mailing list archives

From "lizhanqiang@inspur.com" <lizhanqi...@inspur.com>
Subject Confusion about the --split-by parameter
Date Tue, 09 Sep 2014 03:33:16 GMT
Hi all,
   In Sqoop we can specify the --split-by parameter, which determines the column used to split
the records among the map tasks.
But if the data in the split column is skewed, the workload will be imbalanced across the maps.
I want to know why Sqoop does not use select count(*) from table / num-maps to determine each
map's workload. As far as I know, a base class of DataDrivenDBInputFormat already implements
splitting as select count(*) from table / num-maps. So why does Sqoop override this?
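To make the imbalance concrete, here is a simplified sketch in Python (not Sqoop code). It assumes an integer split column and models the range splitter as dividing [min, max] of that column into num-maps equal-width ranges; the function names and sample data are hypothetical. It compares rows per map under that model against splitting by count(*) / num-maps.

```python
from typing import List

def rows_per_map_by_range(values: List[int], num_maps: int) -> List[int]:
    """Hypothetical model of a min/max range splitter: [min, max] is cut
    into num_maps equal-width ranges and each row goes to its range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_maps
    counts = [0] * num_maps
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        # The maximum value lands exactly on the upper bound; keep it in the last range.
        counts[min(idx, num_maps - 1)] += 1
    return counts

def rows_per_map_by_count(total_rows: int, num_maps: int) -> List[int]:
    """Hypothetical count-based split: total rows / num_maps per map,
    with any remainder spread over the first maps."""
    base, extra = divmod(total_rows, num_maps)
    return [base + (1 if i < extra else 0) for i in range(num_maps)]

# Skewed sample data: 900 dense ids (1..900) plus 100 sparse ids (1000..100000).
values = list(range(1, 901)) + [1000 * k for k in range(1, 101)]

print(rows_per_map_by_range(values, 4))       # → [925, 25, 25, 25]
print(rows_per_map_by_count(len(values), 4))  # → [250, 250, 250, 250]
```

The sketch only counts rows per map; it does not model how each map would fetch its share of rows from the database, which is a separate cost for a count-based split.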

