The mappers now run successfully. I finally found my bug: it was in getSplits() in my customized
TableInputFormat, which scans auto-split regions.
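
For anyone who hits the same symptom, here is a minimal sketch (not the actual fix) of a sanity check that could be run on the output of a custom getSplits(). It is prompted by the quoted logs, where all four mappers report the same start row and end row. The sketch is plain Java with no HBase dependency; RowRange and dropDuplicateRanges are hypothetical names made up for illustration, and a real TableInputFormat would work with its own split objects instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SplitRangeCheck {

    /** Hypothetical stand-in for a table split: row range plus region location. */
    public static final class RowRange {
        final byte[] startRow;
        final byte[] endRow;
        final String location;

        public RowRange(byte[] startRow, byte[] endRow, String location) {
            this.startRow = startRow;
            this.endRow = endRow;
            this.location = location;
        }

        /** Key built from the row range only; the location is deliberately ignored. */
        String rangeKey() {
            return Arrays.toString(startRow) + "->" + Arrays.toString(endRow);
        }
    }

    /** Keeps the first split seen for each (startRow, endRow) pair, in input order. */
    public static List<RowRange> dropDuplicateRanges(List<RowRange> splits) {
        Map<String, RowRange> firstSeen = new LinkedHashMap<>();
        for (RowRange split : splits) {
            firstSeen.putIfAbsent(split.rangeKey(), split);
        }
        return new ArrayList<>(firstSeen.values());
    }

    public static void main(String[] args) {
        // Row keys and hosts copied from the quoted mapper logs.
        byte[] start = {'0', 'X', (byte) 0xBD, 'O', '@'};
        byte[] end = {'0', 'X', (byte) 0xBD, ')', 'P'};

        List<RowRange> splits = new ArrayList<>();
        splits.add(new RowRange(start, end, "ip-10-2-1-21.company.co"));
        splits.add(new RowRange(start, end, "ip-10-2-1-23.company.co"));
        splits.add(new RowRange(start, end, "ip-10-2-1-23.company.co"));
        splits.add(new RowRange(start, end, "ip-10-2-1-23.company.co"));

        List<RowRange> unique = dropDuplicateRanges(splits);
        System.out.println("splits before: " + splits.size());
        System.out.println("splits after:  " + unique.size());
    }
}
```

Dropping duplicates only hides the symptom; if the two counts differ, the split-generation logic itself is what needs fixing.
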
Thank you all.
On 08/03/2017, 10:26, "Hbase Janitor" <hbasejanitor@gmail.com> wrote:
Are you having mapper failures?
I noticed one of the mapper outputs you put up shows a different region
location.
On Tue, Mar 7, 2017 at 9:00 PM, 陈 建平Chen Jianping <chenjianping@agora.io>
wrote:
> Hi group,
>
> Recently I ran into a problem: duplicate data is scanned from different
> regions in HBase, and all of these records share the same row key and the
> same value.
>
> Here is my case: I am using Cloudera CDH 5.9.0 with Hadoop 2.6.0 and HBase
> 1.2.0, and the HBase client lib is also 1.2.0. There is an HBase table which
> is auto-split, and my Mapper (in a MapReduce task) tries to scan this table
> to get the data. However, some duplicate records are retrieved from the
> Scanner from different regions and region servers, as follows.
>
> Are there any suggestions for this problem? Thanks in advance.
>
> Here is my scanner code:
> Scan scan = new Scan();
> scan.setBatch(200);
> scan.setCacheBlocks(false);
> scan.setMaxVersions(1);
>
>
> -----------MapReduce task log---------
> mapper001
> 2017-03-07 10:19:30,997 INFO [main] org.apache.hadoop.mapred.YarnChild:
> mapreduce.cluster.local.dir for child: /data/2/yarn/nm/usercache/
> hdfs/appcache/application_1488785087512_0993,/data/3/
> yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993,/data/1/
> yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993
> 2017-03-07 10:19:31,333 INFO [main] org.apache.hadoop.conf.Configuration.deprecation:
> session.id is deprecated. Instead, use dfs.metrics.session-id
> 2017-03-07 10:19:32,910 INFO [main] org.apache.hadoop.mapreduce.
> lib.output.FileOutputCommitter: File Output Committer Algorithm version
> is 1
> 2017-03-07 10:19:32,922 INFO [main] org.apache.hadoop.mapred.Task: Using
> ResourceCalculatorProcessTree : [ ]
> 2017-03-07 10:19:34,160 INFO [main] org.apache.hadoop.mapred.MapTask:
> Processing split: HBase table split(table name: user_session, scan: , start
> row: 0X\xBDO@, end row: 0X\xBD)P, region location: ip-10-2-1-21.company.co,
> encoded region name: )
>
> mapper002
> 2017-03-07 10:19:24,001 INFO [main] org.apache.hadoop.mapred.YarnChild:
> mapreduce.cluster.local.dir for child: /data/2/yarn/nm/usercache/
> hdfs/appcache/application_1488785087512_0993,/data/3/
> yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993,/data/1/
> yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993
> 2017-03-07 10:19:24,618 INFO [main] org.apache.hadoop.conf.Configuration.deprecation:
> session.id is deprecated. Instead, use dfs.metrics.session-id
> 2017-03-07 10:19:25,661 INFO [main] org.apache.hadoop.mapreduce.
> lib.output.FileOutputCommitter: File Output Committer Algorithm version
> is 1
> 2017-03-07 10:19:25,726 INFO [main] org.apache.hadoop.mapred.Task: Using
> ResourceCalculatorProcessTree : [ ]
> 2017-03-07 10:19:26,100 INFO [main] org.apache.hadoop.mapred.MapTask:
> Processing split: HBase table split(table name: user_session, scan: , start
> row: 0X\xBDO@, end row: 0X\xBD)P, region location: ip-10-2-1-23.company.co,
> encoded region name: )
>
> mapper003
> 2017-03-07 10:19:24,278 INFO [main] org.apache.hadoop.mapred.YarnChild:
> mapreduce.cluster.local.dir for child: /data/2/yarn/nm/usercache/
> hdfs/appcache/application_1488785087512_0993,/data/3/
> yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993,/data/1/
> yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993
> 2017-03-07 10:19:24,621 INFO [main] org.apache.hadoop.conf.Configuration.deprecation:
> session.id is deprecated. Instead, use dfs.metrics.session-id
> 2017-03-07 10:19:25,553 INFO [main] org.apache.hadoop.mapreduce.
> lib.output.FileOutputCommitter: File Output Committer Algorithm version
> is 1
> 2017-03-07 10:19:25,566 INFO [main] org.apache.hadoop.mapred.Task: Using
> ResourceCalculatorProcessTree : [ ]
> 2017-03-07 10:19:25,910 INFO [main] org.apache.hadoop.mapred.MapTask:
> Processing split: HBase table split(table name: user_session, scan: , start
> row: 0X\xBDO@, end row: 0X\xBD)P, region location: ip-10-2-1-23.company.co,
> encoded region name: )
>
> mapper004
> 2017-03-07 10:19:23,108 INFO [main] org.apache.hadoop.mapred.YarnChild:
> mapreduce.cluster.local.dir for child: /data/2/yarn/nm/usercache/
> hdfs/appcache/application_1488785087512_0993,/data/1/
> yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993,/data/3/
> yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993
> 2017-03-07 10:19:23,413 INFO [main] org.apache.hadoop.conf.Configuration.deprecation:
> session.id is deprecated. Instead, use dfs.metrics.session-id
> 2017-03-07 10:19:23,952 INFO [main] org.apache.hadoop.mapreduce.
> lib.output.FileOutputCommitter: File Output Committer Algorithm version
> is 1
> 2017-03-07 10:19:23,963 INFO [main] org.apache.hadoop.mapred.Task: Using
> ResourceCalculatorProcessTree : [ ]
> 2017-03-07 10:19:24,320 INFO [main] org.apache.hadoop.mapred.MapTask:
> Processing split: HBase table split(table name: user_session, scan: , start
> row: 0X\xBDO@, end row: 0X\xBD)P, region location: ip-10-2-1-23.company.co,
> encoded region name: )
>
>
> Thanks,
> Eric
>
>