sqoop-user mailing list archives

From Abraham Elmahrek <...@cloudera.com>
Subject Re: the confusion of --split-by parameter
Date Tue, 09 Sep 2014 16:38:01 GMT
Hey there,

For databases, there needs to be a way to infer value boundaries for a
particular column. Simply performing a "select *" would not be enough,
because we would not know how to divide the query among the mappers.
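To illustrate why boundaries matter, here is a simplified Python sketch of range-based splitting on an integer column, roughly in the spirit of DataDrivenDBInputFormat: take the MIN and MAX of the split column, cut that range into equal-width intervals, and give each mapper one interval as a WHERE clause. The function names are hypothetical and this is not Sqoop's actual splitter code. The usage at the end also shows how skewed values leave one mapper with most of the rows.

```python
from typing import List, Tuple


def integer_splits(lo: int, hi: int, num_splits: int) -> List[Tuple[int, int]]:
    """Cut the integer range [lo, hi] (e.g. MIN and MAX of the split column)
    into at most num_splits equal-width, half-open intervals [a, b)."""
    if num_splits < 1:
        raise ValueError("num_splits must be >= 1")
    span = hi - lo + 1                 # number of distinct integers in [lo, hi]
    n = min(num_splits, span)          # never create empty-width intervals
    # Boundaries b_0 = lo, ..., b_n = hi + 1; interval i is [b_i, b_{i+1}).
    return [(lo + span * i // n, lo + span * (i + 1) // n) for i in range(n)]


def where_clauses(column: str, splits: List[Tuple[int, int]]) -> List[str]:
    """One WHERE condition per mapper (hypothetical column name)."""
    return [f"{column} >= {a} AND {column} < {b}" for a, b in splits]


def rows_per_split(values: List[int], splits: List[Tuple[int, int]]) -> List[int]:
    """How many rows each mapper would actually receive."""
    return [sum(1 for v in values if a <= v < b) for a, b in splits]


if __name__ == "__main__":
    # Skewed sample: 96 small ids plus a few large outliers.
    ids = list(range(1, 97)) + [250, 500, 750, 1000]
    splits = integer_splits(min(ids), max(ids), 4)
    print(where_clauses("id", splits))
    print(rows_per_split(ids, splits))  # equal-width ranges, very unequal row counts
```

The ranges are equal in width, not in row count, so with this sample the first mapper gets 97 of the 100 rows. That is the imbalance the question describes.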


On Mon, Sep 8, 2014 at 8:33 PM, lizhanqiang@inspur.com <lizhanqiang@inspur.com> wrote:

> Hi all,
>    In Sqoop we can specify the --split-by parameter, which determines
> which field is used to split records among the map tasks.
> But if the split field's data is skewed, the workload between maps will be
> imbalanced. I want to know why Sqoop does not use
> select count(*) from table / num-maps to determine each map's workload. As far
> as I know, some base classes of DataDrivenDBInputFormat
> have an implementation of select count(*) from table / num-maps. So why does
> Sqoop override this?
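For comparison, the count-based scheme the question describes can be sketched as below. It assumes a SQL dialect with LIMIT/OFFSET and a hypothetical table `orders` ordered by `id`. Hadoop's DBInputFormat works roughly along these lines, but this is a simplified illustration and not its code. Each mapper receives an equal share of `count(*)` rows, and the last mapper also takes the remainder.

```python
from typing import List, Tuple


def count_splits(total_rows: int, num_splits: int) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs that divide total_rows (from COUNT(*))
    into num_splits chunks; the last chunk absorbs any remainder."""
    if num_splits < 1:
        raise ValueError("num_splits must be >= 1")
    chunk = total_rows // num_splits
    splits = []
    for i in range(num_splits):
        start = i * chunk
        length = total_rows - start if i == num_splits - 1 else chunk
        splits.append((start, length))
    return splits


def limit_offset_query(table: str, order_by: str, start: int, length: int) -> str:
    """Query text one mapper would run; a stable ORDER BY is required so
    that the chunks do not overlap or skip rows."""
    return f"SELECT * FROM {table} ORDER BY {order_by} LIMIT {length} OFFSET {start}"


if __name__ == "__main__":
    for start, length in count_splits(100, 4):
        print(limit_offset_query("orders", "id", start, length))
```

Row counts are balanced here regardless of skew. The trade-off is that every mapper depends on the same full ordering, and a database may still have to step over all preceding rows to honor a large OFFSET. Whether that is why Sqoop prefers boundary queries is an assumption, not something stated in this thread.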
