Hey David,
Here's the algorithm:
Split lengths are defined by (max - min)/(# mappers), and whatever is left
over is tacked on at the end. So in this case, (288272191 - 2110)/3 =
288270081/3 = 96090027. That divides evenly here (if it didn't, I'm
assuming the fractional part would be rounded down), so the split length is
96090027. Sqoop will then create split points at (min) + (split length)*(n)
for n = 0, 1, 2, ..., stopping once a point would exceed the max. We can see
that 2110 + 96090027*0 = 2110, 2110 + 96090027*1 = 96092137,
2110 + 96090027*2 = 192182164, and 2110 + 96090027*3 = 288272191 will be
generated by this algorithm. The last point to be added will be 288272192
(max + 1): each range excludes its upper endpoint, so without that extra
point the max value itself would not fall into any split. Sqoop then
distributes the rows across these ranges, as you've pointed out above.
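A rough Python sketch of the split-point logic described above, for illustration only. It is not taken from Sqoop's source: the function names `split_points` and `split_ranges` are hypothetical, and the `max(1, ...)` guard is an assumption added so the loop always advances on very small key ranges.

```python
from typing import List, Tuple


def split_points(min_val: int, max_val: int, num_mappers: int) -> List[int]:
    """Boundary points for the split ranges, following the description above."""
    # Integer division drops any remainder. The max(1, ...) guard is an
    # assumption (not from the thread) so the loop always advances.
    split_len = max(1, (max_val - min_val) // num_mappers)
    points = []
    cur = min_val
    # Emit min + split_len * n for n = 0, 1, 2, ... while it is <= max.
    while cur <= max_val:
        points.append(cur)
        cur += split_len
    # Ranges are half-open [lo, hi), so append max + 1 as the final
    # endpoint; otherwise the max value itself would not be covered.
    points.append(max_val + 1)
    return points


def split_ranges(min_val: int, max_val: int, num_mappers: int) -> List[Tuple[int, int]]:
    """Pair consecutive boundary points into half-open [lo, hi) ranges."""
    pts = split_points(min_val, max_val, num_mappers)
    return list(zip(pts, pts[1:]))


if __name__ == "__main__":
    # Key range from this thread, with 3 mappers.
    for lo, hi in split_ranges(2110, 288272191, 3):
        print(f"[{lo}, {hi})")
```

With 3 mappers this prints the same four ranges listed in the original question, including the one-value range [288272191, 288272192). Under this logic, whenever the last generated point lands exactly on the max, you get one extra range holding a single key.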
Just to be sure, did you configure Sqoop to use 3 mappers?
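A small Python check of why the mapper count matters (a sketch; the names `split_length` and `OBSERVED_WIDTH` are hypothetical). It applies the (max - min) / (number of mappers) formula with integer division for 3 and 4 mappers and compares each result with the width of the first observed range, 96092137 - 2110.

```python
# Key bounds from the thread.
MIN_KEY = 2110
MAX_KEY = 288272191
# Width of the first observed range, [2110, 96092137).
OBSERVED_WIDTH = 96092137 - 2110


def split_length(num_mappers: int) -> int:
    """(max - min) / mappers with integer division (remainder dropped)."""
    return (MAX_KEY - MIN_KEY) // num_mappers


if __name__ == "__main__":
    for n in (3, 4):
        width = split_length(n)
        print(f"{n} mappers -> split length {width}, matches observed: {width == OBSERVED_WIDTH}")
```

Only 3 mappers reproduces the observed width of 96090027; 4 mappers would give a split length of 72067520.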
Hope this helps,
Abe
On Wed, Jun 19, 2013 at 8:33 AM, David Kincaid <kincaid.dave@gmail.com> wrote:
> We're seeing a strange thing happen with a sqoop import job with the way
> the key range is getting distributed among the 4 mappers that are running.
> The minimum key value is 2110 and the maximum value is 288272191. We are
> getting one mapper that is only getting one record to import. Here is the
> distribution among the mappers:
>
> [2110, 96092137)
> [96092137, 192182164)
> [192182164, 288272191)
> [288272191, 288272192)
>
> You can see that the fourth mapper is given a range with only one value in
> it. Could someone help me understand what is going on?
>
> Thanks,
>
> Dave
>
