Are you having mapper failures? I noticed one of the mapper outputs you put up shows a different region location.

On Tue, Mar 7, 2017 at 9:00 PM, 陈 建平Chen Jianping wrote:

> Hi group,
>
> Recently I met with a problem that there is duplicated data scanned from
> different region in HBase and all this data shares the same row key and the
> same value.
>
> Here is my case, I am using Cloudera CDH 5.9.0 with Hadoop 2.6.0 and HBase
> 1.2.0, and the HBase client lib is also 1.2.0. There is a HBase table which
> is auto-split and my Mapper (in MapReduce task) is try to scan this table
> to get the data. However, some duplicated records are retrieved from
> Scanner from different region and region server as follows.
>
> Is there any suggestion on this problem? Thanks in advance.
>
> Here is my code of scanner
> Scan scan = new Scan();
> scan.setBatch(200);
> scan.setCacheBlocks(false);
> scan.setMaxVersions(1);
>
>
> -----------MapReduce task log---------
> mapper001
> 2017-03-07 10:19:30,997 INFO [main] org.apache.hadoop.mapred.YarnChild:
> mapreduce.cluster.local.dir for child: /data/2/yarn/nm/usercache/
> hdfs/appcache/application_1488785087512_0993,/data/3/
> yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993,/data/1/
> yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993
> 2017-03-07 10:19:31,333 INFO [main] org.apache.hadoop.conf.Configuration.deprecation:
> session.id is deprecated. Instead, use dfs.metrics.session-id
> 2017-03-07 10:19:32,910 INFO [main] org.apache.hadoop.mapreduce.
> lib.output.FileOutputCommitter: File Output Committer Algorithm version
> is 1
> 2017-03-07 10:19:32,922 INFO [main] org.apache.hadoop.mapred.Task: Using
> ResourceCalculatorProcessTree : [ ]
> 2017-03-07 10:19:34,160 INFO [main] org.apache.hadoop.mapred.MapTask:
> Processing split: HBase table split(table name: user_session, scan: , start
> row: 0X\xBDO@, end row: 0X\xBD)P, region location: ip-10-2-1-21.company.co,
> encoded region name: )
>
> mapper002
> 2017-03-07 10:19:24,001 INFO [main] org.apache.hadoop.mapred.YarnChild:
> mapreduce.cluster.local.dir for child: /data/2/yarn/nm/usercache/
> hdfs/appcache/application_1488785087512_0993,/data/3/
> yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993,/data/1/
> yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993
> 2017-03-07 10:19:24,618 INFO [main] org.apache.hadoop.conf.Configuration.deprecation:
> session.id is deprecated. Instead, use dfs.metrics.session-id
> 2017-03-07 10:19:25,661 INFO [main] org.apache.hadoop.mapreduce.
> lib.output.FileOutputCommitter: File Output Committer Algorithm version
> is 1
> 2017-03-07 10:19:25,726 INFO [main] org.apache.hadoop.mapred.Task: Using
> ResourceCalculatorProcessTree : [ ]
> 2017-03-07 10:19:26,100 INFO [main] org.apache.hadoop.mapred.MapTask:
> Processing split: HBase table split(table name: user_session, scan: , start
> row: 0X\xBDO@, end row: 0X\xBD)P, region location: ip-10-2-1-23.company.co,
> encoded region name: )
>
> mapper003
> 2017-03-07 10:19:24,278 INFO [main] org.apache.hadoop.mapred.YarnChild:
> mapreduce.cluster.local.dir for child: /data/2/yarn/nm/usercache/
> hdfs/appcache/application_1488785087512_0993,/data/3/
> yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993,/data/1/
> yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993
> 2017-03-07 10:19:24,621 INFO [main] org.apache.hadoop.conf.Configuration.deprecation:
> session.id is deprecated. Instead, use dfs.metrics.session-id
> 2017-03-07 10:19:25,553 INFO [main] org.apache.hadoop.mapreduce.
> lib.output.FileOutputCommitter: File Output Committer Algorithm version
> is 1
> 2017-03-07 10:19:25,566 INFO [main] org.apache.hadoop.mapred.Task: Using
> ResourceCalculatorProcessTree : [ ]
> 2017-03-07 10:19:25,910 INFO [main] org.apache.hadoop.mapred.MapTask:
> Processing split: HBase table split(table name: user_session, scan: , start
> row: 0X\xBDO@, end row: 0X\xBD)P, region location: ip-10-2-1-23.company.co,
> encoded region name: )
>
> mapper004
> 2017-03-07 10:19:23,108 INFO [main] org.apache.hadoop.mapred.YarnChild:
> mapreduce.cluster.local.dir for child: /data/2/yarn/nm/usercache/
> hdfs/appcache/application_1488785087512_0993,/data/1/
> yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993,/data/3/
> yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993
> 2017-03-07 10:19:23,413 INFO [main] org.apache.hadoop.conf.Configuration.deprecation:
> session.id is deprecated. Instead, use dfs.metrics.session-id
> 2017-03-07 10:19:23,952 INFO [main] org.apache.hadoop.mapreduce.
> lib.output.FileOutputCommitter: File Output Committer Algorithm version
> is 1
> 2017-03-07 10:19:23,963 INFO [main] org.apache.hadoop.mapred.Task: Using
> ResourceCalculatorProcessTree : [ ]
> 2017-03-07 10:19:24,320 INFO [main] org.apache.hadoop.mapred.MapTask:
> Processing split: HBase table split(table name: user_session, scan: , start
> row: 0X\xBDO@, end row: 0X\xBD)P, region location: ip-10-2-1-23.company.co,
> encoded region name: )
>
>
> Thanks,
> Eric
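
In the quoted logs, all four map tasks report the same start row and end row. One way to confirm that mechanically is to group the tasks by the key range on their "Processing split" line. Below is a minimal sketch using only the Java standard library. It assumes each "Processing split" line is available unwrapped, in the format quoted above. `SplitLogCheck` and `groupBySplit` are hypothetical names made up for this illustration; they are not part of Hadoop or HBase.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not a Hadoop or HBase class.
final class SplitLogCheck {
    // Assumed format, taken from the quoted "Processing split" log line:
    // "... start row: X, end row: Y, region location: HOST, encoded region name: )"
    private static final Pattern SPLIT = Pattern.compile(
            "start row: (.*?), end row: (.*?), region location: ([^,]+),");

    /**
     * Groups task names by the key range of their split.
     * Input: task name -> unwrapped "Processing split" log line.
     * Output: "startRow .. endRow" -> list of "task@regionLocation".
     * Lines that do not match the assumed format are skipped.
     */
    static Map<String, List<String>> groupBySplit(Map<String, String> logLineByTask) {
        Map<String, List<String>> tasksByRange = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : logLineByTask.entrySet()) {
            Matcher m = SPLIT.matcher(entry.getValue());
            if (!m.find()) {
                continue;
            }
            String range = m.group(1) + " .. " + m.group(2);
            tasksByRange.computeIfAbsent(range, k -> new ArrayList<>())
                    .add(entry.getKey() + "@" + m.group(3));
        }
        return tasksByRange;
    }

    public static void main(String[] args) {
        // Split lines from the quoted logs, with the mail wrapping undone.
        String prefix = "Processing split: HBase table split(table name: user_session, "
                + "scan: , start row: 0X\\xBDO@, end row: 0X\\xBD)P, region location: ";
        String suffix = ", encoded region name: )";

        Map<String, String> logs = new LinkedHashMap<>();
        logs.put("mapper001", prefix + "ip-10-2-1-21.company.co" + suffix);
        logs.put("mapper002", prefix + "ip-10-2-1-23.company.co" + suffix);
        logs.put("mapper003", prefix + "ip-10-2-1-23.company.co" + suffix);
        logs.put("mapper004", prefix + "ip-10-2-1-23.company.co" + suffix);

        // Any range listed with more than one task was scanned more than once.
        for (Map.Entry<String, List<String>> e : groupBySplit(logs).entrySet()) {
            if (e.getValue().size() > 1) {
                System.out.println("Range " + e.getKey() + " processed by " + e.getValue());
            }
        }
    }
}
```

A range that shows up with several tasks means that range was read more than once. The logs alone do not say whether those tasks are retried or speculative attempts of one map task or separate tasks that were handed the same split, so comparing the task attempt IDs in the job history is the natural next check.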