sqoop-user mailing list archives

From Abraham Elmahrek <...@cloudera.com>
Subject Re: Re: the confusion of --split-by parameter
Date Wed, 10 Sep 2014 01:59:30 GMT
Good point. The only things I can think of are that offsets might be slower
(since the DB has to scan and count rows internally to reach each offset) and
that there may be an expectation that certain ranges of data end up in certain
files (though I doubt this one). I'll defer to the broader community, as I'm
not sure myself.
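
To make the two approaches discussed in this thread concrete, here is a minimal Python sketch of how per-mapper query fragments could be generated, either as range predicates on a split column or as LIMIT/OFFSET chunks. The function names, the `id` column, and the generated clause text are assumptions for illustration only; this is not Sqoop's actual code. Keep in mind that LIMIT/OFFSET without a stable ORDER BY gives no guarantee that separate queries return disjoint chunks.

```python
def range_splits(lo, hi, num_splits, column="id"):
    """Divide the integer range [lo, hi] into WHERE-clause ranges.

    Hypothetical helper: lo and hi stand for the MIN and MAX of the
    split column, which would have to be queried from the database first.
    """
    span = hi - lo + 1
    clauses = []
    for i in range(num_splits):
        start = lo + (i * span) // num_splits
        end = lo + ((i + 1) * span) // num_splits
        if i == num_splits - 1:
            # The last range is closed so that the maximum value is included.
            clauses.append(f"{column} >= {start} AND {column} <= {hi}")
        else:
            clauses.append(f"{column} >= {start} AND {column} < {end}")
    return clauses


def offset_splits(row_count, num_splits):
    """Divide row_count rows into LIMIT/OFFSET chunks of near-equal size.

    Hypothetical helper: row_count stands for the result of
    SELECT COUNT(*) on the table.
    """
    chunk = row_count // num_splits
    clauses = []
    for i in range(num_splits):
        offset = i * chunk
        # The last chunk absorbs any remainder rows.
        limit = row_count - offset if i == num_splits - 1 else chunk
        clauses.append(f"LIMIT {limit} OFFSET {offset}")
    return clauses


if __name__ == "__main__":
    for clause in range_splits(0, 99, 4):
        print("SELECT * FROM t WHERE", clause)
    for clause in offset_splits(10, 3):
        print("SELECT * FROM t", clause)
```

The range version only needs the column's minimum and maximum, and each query can use an index on the split column. The offset version needs a row count up front, and the database typically has to step past `offset` rows before it can return any, which is the slowdown suspected above.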

On Tue, Sep 9, 2014 at 5:31 PM, lizhanqiang@inspur.com <
lizhanqiang@inspur.com> wrote:

> Hey, brother.
>   Glad to hear from you! I think we can use limit/offset (if the database
> supports this operation), or we can use a sub-select (if the database does
> not support limit/offset).
> For example:
> For MySQL: select * from table limit 0,5; select * from table limit 5,5; ...
> For Oracle we can use rownum.
> I just cannot understand why Sqoop overrides the operation above. This
> override can lead to data skew.
> *From:* Abraham Elmahrek <abe@cloudera.com>
> *Date:* 2014-09-10 00:38
> *To:* user@sqoop.apache.org
> *Subject:* Re: the confusion of --split-by parameter
> Hey there,
> For databases, there needs to be a way to actually infer boundaries for a
> particular column. Simply performing a "select *" would not be enough
> because we would not know how to query the database.
> -Abe
> On Mon, Sep 8, 2014 at 8:33 PM, lizhanqiang@inspur.com <
> lizhanqiang@inspur.com> wrote:
>> Hi all,
>>    In Sqoop we can specify the --split-by parameter, which determines
>> which field is used to split the records among the map tasks.
>> But if the split field's data is skewed, the workload across the maps will be
>> imbalanced. I want to know why Sqoop does not use
>> (select count(*) from table) / num-maps to determine each map's workload. As far
>> as I know, a base class of DataDrivenDBInputFormat
>> implements the (select count(*) from table) / num-maps approach. So why does
>> Sqoop override this?
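
The skew concern raised in the original question can be illustrated with a small Python sketch. It compares how many rows each mapper would receive when the value range of a split column is divided evenly versus when the row count is divided evenly. The function names and the sample data are hypothetical, and the bucketing logic is a simplification for illustration, not Sqoop's implementation.

```python
def rows_per_range_split(values, num_splits):
    """Count rows per split when [min, max] of the column is cut evenly."""
    lo, hi = min(values), max(values)
    span = hi - lo + 1
    counts = [0] * num_splits
    for v in values:
        # Map each value to the equal-width range that contains it.
        idx = (v - lo) * num_splits // span
        counts[idx] += 1
    return counts


def rows_per_count_split(row_count, num_splits):
    """Count rows per split when the row count itself is cut evenly."""
    chunk = row_count // num_splits
    counts = [chunk] * num_splits
    # The last split absorbs any remainder rows.
    counts[-1] += row_count - chunk * num_splits
    return counts


if __name__ == "__main__":
    # Hypothetical skewed column: 97 small ids plus three large outliers.
    values = list(range(1, 98)) + [1000, 2000, 4000]
    print(rows_per_range_split(values, 4))        # [98, 1, 0, 1]
    print(rows_per_count_split(len(values), 4))   # [25, 25, 25, 25]
```

With this sample, the range-based split sends almost every row to the first mapper, while the count-based split is balanced by construction.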
