hbase-user mailing list archives

From Hbase Janitor <hbasejani...@gmail.com>
Subject Re: Problem: Duplicate data is scanned from different region in HBase 1.2.0
Date Wed, 08 Mar 2017 02:26:34 GMT
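Are you having mapper failures?

I noticed that one of the mapper outputs you posted shows a different region
location: mapper001 reports ip-10-2-1-21.company.co, while the other three
report ip-10-2-1-23.company.co for the same start and end row.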
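
To check whether several task attempts were handed the same row range, the
"Processing split" lines can be grouped by start and end row. Below is a minimal
sketch of that check, not code from the original job. The class name
SplitOverlapCheck, the method groupByRange, and the regular expression are
assumptions based only on the log format pasted in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: groups "Processing split" log lines by row range.
class SplitOverlapCheck {
    // Assumed pattern, derived from the split description in the pasted logs.
    private static final Pattern SPLIT = Pattern.compile(
            "start row: (.*?), end row: (.*?), region location: (.*?),");

    /** Maps "start .. end" to the "mapper@location" entries that reported it. */
    static Map<String, List<String>> groupByRange(Map<String, String> splitLineByMapper) {
        Map<String, List<String>> byRange = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : splitLineByMapper.entrySet()) {
            Matcher m = SPLIT.matcher(e.getValue());
            if (!m.find()) {
                continue; // not a split description, skip it
            }
            String range = m.group(1) + " .. " + m.group(2);
            byRange.computeIfAbsent(range, k -> new ArrayList<>())
                    .add(e.getKey() + "@" + m.group(3));
        }
        return byRange;
    }

    public static void main(String[] args) {
        Map<String, String> lines = new LinkedHashMap<>();
        // Two of the split descriptions from the logs in this thread.
        lines.put("mapper001", "start row: 0X\\xBDO@, end row: 0X\\xBD)P, "
                + "region location: ip-10-2-1-21.company.co, encoded region name: )");
        lines.put("mapper002", "start row: 0X\\xBDO@, end row: 0X\\xBD)P, "
                + "region location: ip-10-2-1-23.company.co, encoded region name: )");

        // Print every row range that more than one mapper reported.
        for (Map.Entry<String, List<String>> e : groupByRange(lines).entrySet()) {
            if (e.getValue().size() > 1) {
                System.out.println(e.getKey() + " -> " + e.getValue());
            }
        }
    }
}
```

If one row range appears under more than one mapper, those mappers may be
retries or speculative attempts of a single split rather than scans of distinct
regions. Comparing the task attempt IDs in the job history should confirm that.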

On Tue, Mar 7, 2017 at 9:00 PM, 陈 建平Chen Jianping <chenjianping@agora.io>
wrote:

> Hi group,
>
> I recently ran into a problem where duplicate data is scanned from
> different regions in HBase; the duplicated records share the same row key
> and the same value.
>
> My setup is Cloudera CDH 5.9.0 with Hadoop 2.6.0 and HBase 1.2.0, and the
> HBase client library is also 1.2.0. The HBase table is auto-split, and my
> Mapper (in a MapReduce job) scans this table to read the data. However,
> the Scanner returns some duplicate records from different regions and
> region servers, as shown in the logs below.
>
> Are there any suggestions for this problem? Thanks in advance.
>
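> Here is my scanner code:
> Scan scan = new Scan();
> scan.setBatch(200);          // at most 200 cells per Result; a wide row can span several Results
> scan.setCacheBlocks(false);  // do not populate the block cache during this scan
> scan.setMaxVersions(1);      // return only the latest version of each cell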
>
>
> -----------MapReduce task log---------
> mapper001
> 2017-03-07 10:19:30,997 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /data/2/yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993,/data/3/yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993,/data/1/yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993
> 2017-03-07 10:19:31,333 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
> 2017-03-07 10:19:32,910 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
> 2017-03-07 10:19:32,922 INFO [main] org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
> 2017-03-07 10:19:34,160 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: HBase table split(table name: user_session, scan: , start row: 0X\xBDO@, end row: 0X\xBD)P, region location: ip-10-2-1-21.company.co, encoded region name: )
>
> mapper002
> 2017-03-07 10:19:24,001 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /data/2/yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993,/data/3/yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993,/data/1/yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993
> 2017-03-07 10:19:24,618 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
> 2017-03-07 10:19:25,661 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
> 2017-03-07 10:19:25,726 INFO [main] org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
> 2017-03-07 10:19:26,100 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: HBase table split(table name: user_session, scan: , start row: 0X\xBDO@, end row: 0X\xBD)P, region location: ip-10-2-1-23.company.co, encoded region name: )
>
> mapper003
> 2017-03-07 10:19:24,278 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /data/2/yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993,/data/3/yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993,/data/1/yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993
> 2017-03-07 10:19:24,621 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
> 2017-03-07 10:19:25,553 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
> 2017-03-07 10:19:25,566 INFO [main] org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
> 2017-03-07 10:19:25,910 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: HBase table split(table name: user_session, scan: , start row: 0X\xBDO@, end row: 0X\xBD)P, region location: ip-10-2-1-23.company.co, encoded region name: )
>
> mapper004
> 2017-03-07 10:19:23,108 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /data/2/yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993,/data/1/yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993,/data/3/yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993
> 2017-03-07 10:19:23,413 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
> 2017-03-07 10:19:23,952 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
> 2017-03-07 10:19:23,963 INFO [main] org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
> 2017-03-07 10:19:24,320 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: HBase table split(table name: user_session, scan: , start row: 0X\xBDO@, end row: 0X\xBD)P, region location: ip-10-2-1-23.company.co, encoded region name: )
>
>
> Thanks,
> Eric
>
>
