sqoop-dev mailing list archives

From "Jarek Cecho" <jar...@apache.org>
Subject Re: Review Request: SQOOP-603: Support small intervals in IntegerSplitter implementation
Date Thu, 20 Sep 2012 16:54:22 GMT


> On Sept. 20, 2012, 4:35 p.m., Cheolsoo Park wrote:
> > Hi Jarcec,
> > 
> > What if min = 0, max = 1, and numSplits = 5?
> > 
> > Following the split() function,
> > 
> > splitSize = (1 - 0) / 5 = 0;
> > remainder = (1 - 0) % 5 = 1;
> > 
> > After the for loop,
> > 
> > splits = (0, 1)
> > 
> > Now (maxVal - minVal) <= numSplits is true as (1 - 0) <= 5,
> > 
> > so we add maxVal to splits.
> > 
> > splits = (0, 1, 1)
> > 
> > so we end up with splits as follows:
> > 
> > [0, 1)
> > [1, 1) => redundant split that includes no values
> > [1, 1]
> > 
> > This case can happen if the user sets -m to an unnecessarily large number, can't it?
> > 
> > Please correct me if I am wrong.
> > 
> > Thanks!

Hi sir,
you're right about the output of the split() function: it will be (0, 1, 1). However, IntegerSplitter
will convert this list into the following two splits:

* 0 <= x < 1
* 1 <= x <= 1

IntegerSplitter always creates n - 1 splits from the list of n points returned by the split()
method in question. I've actually tested this scenario on a real MySQL instance where the target
table had only 5 values and I requested 20 mappers, and I did not end up with duplicated data.
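
The behavior described above can be illustrated with a minimal sketch. It is a reconstruction
from the arithmetic in this thread, not the actual Sqoop source: the class name SplitSketch and
the helper toRanges() are hypothetical, and the loop body is an assumption that merely
reproduces the (0, 1) result quoted above before the extra maxVal is appended.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {

    // Assumed reconstruction of the split-point computation discussed in this thread.
    static List<Long> split(long numSplits, long minVal, long maxVal) {
        List<Long> splits = new ArrayList<>();
        long splitSize = (maxVal - minVal) / numSplits;
        long remainder = (maxVal - minVal) % numSplits;
        long curVal = minVal;

        for (long i = 0; i <= numSplits; i++) {
            splits.add(curVal);
            if (curVal >= maxVal) {
                break;
            }
            curVal += splitSize;
            // Spread the remainder over the first few intervals.
            if (i < remainder) {
                curVal += 1;
            }
        }

        // The change under review: append one more maxVal for small intervals.
        if (maxVal - minVal <= numSplits) {
            splits.add(maxVal);
        }
        return splits;
    }

    // Hypothetical helper: n points become n - 1 ranges; only the last one is closed.
    static List<String> toRanges(List<Long> points) {
        List<String> ranges = new ArrayList<>();
        for (int i = 1; i < points.size(); i++) {
            String upperOp = (i == points.size() - 1) ? " <= " : " < ";
            ranges.add(points.get(i - 1) + " <= x" + upperOp + points.get(i));
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<Long> points = split(5, 0, 1);
        System.out.println(points);           // [0, 1, 1]
        System.out.println(toRanges(points)); // [0 <= x < 1, 1 <= x <= 1]
    }
}
```

With min = 0, max = 1 and numSplits = 5, the three points collapse into two ranges, so the
empty [1, 1) range from the question is never emitted.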

Jarcec


- Jarek


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7193/#review11735
-----------------------------------------------------------


On Sept. 20, 2012, 3:37 p.m., Jarek Cecho wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7193/
> -----------------------------------------------------------
> 
> (Updated Sept. 20, 2012, 3:37 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Description
> -------
> 
> I've decided to alter the split() method to add one extra maxVal when there are fewer split points than, or as many as, the requested split count.
> 
> 
> This addresses bug SQOOP-603.
>     https://issues.apache.org/jira/browse/SQOOP-603
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/mapreduce/db/DataDrivenDBInputFormat.java 35b74eb 
>   src/java/org/apache/sqoop/mapreduce/db/IntegerSplitter.java 8e7a096 
>   src/test/org/apache/sqoop/mapreduce/db/TestIntegerSplitter.java 22d5140 
> 
> Diff: https://reviews.apache.org/r/7193/diff/
> 
> 
> Testing
> -------
> 
> * ant test
> * Real MySQL instance in a couple of scenarios
> 
> 
> Thanks,
> 
> Jarek Cecho
> 
>

