hbase-user mailing list archives

From john smith <js1987.sm...@gmail.com>
Subject getSplits() in TableInputFormatBase
Date Sun, 11 Apr 2010 07:54:50 GMT
Hi,

In the method

    public org.apache.hadoop.mapred.InputSplit[] getSplits(
        org.apache.hadoop.mapred.JobConf job, int numSplits)

how is "numSplits" decided? I have seen different values of
numSplits for different MR jobs. Is there a reason for this?

Also, what if I ignore numSplits and always split at region
boundaries? I would guess that splitting at region boundaries makes more
sense and improves data locality somewhat.
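
To make the idea concrete, here is a rough sketch of what "always split at
region boundaries" could look like. It is only an illustration: plain strings
stand in for row keys, and RegionSplitSketch, KeyRange and
splitAtRegionBoundaries are made-up names, not part of the HBase API. It
assumes the region start keys are already sorted, and it uses an empty end key
to mean "to the end of the table".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: plain strings stand in for HBase row keys,
// and none of these names come from the HBase API.
public class RegionSplitSketch {

    // One split covers one region's key range [startKey, endKey).
    public static final class KeyRange {
        public final String startKey;
        public final String endKey; // empty string means "to the end of the table"

        public KeyRange(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    // Returns exactly one range per region. The numSplits hint is accepted
    // to mirror the getSplits signature but is deliberately ignored.
    public static List<KeyRange> splitAtRegionBoundaries(
            List<String> sortedRegionStartKeys, int numSplits) {
        List<KeyRange> splits = new ArrayList<KeyRange>();
        for (int i = 0; i < sortedRegionStartKeys.size(); i++) {
            String start = sortedRegionStartKeys.get(i);
            // A region ends where the next one starts; the last region is open-ended.
            String end = (i + 1 < sortedRegionStartKeys.size())
                    ? sortedRegionStartKeys.get(i + 1)
                    : "";
            splits.add(new KeyRange(start, end));
        }
        return splits;
    }
}
```

With this approach the number of map tasks always equals the number of
regions, whatever value of numSplits the framework passes in.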

Any comments on the above statement?


