hbase-user mailing list archives

From 陈 建平Chen Jianping <chenjianp...@agora.io>
Subject Re: Problem: Duplicate data is scanned from different region in HBase 1.2.0
Date Thu, 09 Mar 2017 03:33:41 GMT
The mappers ran successfully. I finally found the bug: it was in getSplits() in my customized
TableInputFormat, which scans auto-split regions.
Thank you all.
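
The faulty getSplits() is not shown in this thread, but in the log quoted below all four mappers report the same start row and end row, which is what duplicated or overlapping splits look like. A minimal sketch of a sanity check for that situation, assuming each split can be reduced to a half-open row range [start, end) with an empty end row meaning "to the end of the table". SplitOverlapCheck, RowRange, and findOverlaps are hypothetical names for illustration, not HBase API; the code uses only the Java standard library.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of HBase): flags splits whose row ranges overlap. */
final class SplitOverlapCheck {

    /** Half-open row range [start, end); an empty end is assumed to mean "end of table". */
    static final class RowRange {
        final byte[] start;
        final byte[] end;

        RowRange(byte[] start, byte[] end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Lexicographic comparison that treats bytes as unsigned, as row keys are ordered. */
    static int compareRows(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /** Two ranges overlap when each one starts before the other one ends. */
    static boolean overlaps(RowRange x, RowRange y) {
        boolean xStartsBeforeYEnds = y.end.length == 0 || compareRows(x.start, y.end) < 0;
        boolean yStartsBeforeXEnds = x.end.length == 0 || compareRows(y.start, x.end) < 0;
        return xStartsBeforeYEnds && yStartsBeforeXEnds;
    }

    /** Returns index pairs {i, j} with i < j for every overlapping pair of ranges. */
    static List<int[]> findOverlaps(List<RowRange> ranges) {
        List<int[]> result = new ArrayList<>();
        // Quadratic scan: simple, and adequate for a one-off check of a split list.
        for (int i = 0; i < ranges.size(); i++) {
            for (int j = i + 1; j < ranges.size(); j++) {
                if (overlaps(ranges.get(i), ranges.get(j))) {
                    result.add(new int[] {i, j});
                }
            }
        }
        return result;
    }
}
```

Running a check like this over the start and end rows of the splits a custom getSplits() returns, before submitting the job, would surface any pair of splits that cover the same rows.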


On 08/03/2017, 10:26, "Hbase Janitor" <hbasejanitor@gmail.com> wrote:

    Are you having mapper failures?
    
    I noticed one of the mapper outputs you put up shows a different region
    location.
    
    On Tue, Mar 7, 2017 at 9:00 PM, 陈 建平Chen Jianping <chenjianping@agora.io>
    wrote:
    
    > Hi group,
    >
    > Recently I ran into a problem where duplicate data is scanned from
    > different regions in HBase; the duplicates share the same row key and the
    > same value.
    >
    > Here is my setup: I am using Cloudera CDH 5.9.0 with Hadoop 2.6.0 and HBase
    > 1.2.0, and the HBase client library is also 1.2.0. There is an HBase table that
    > is auto-split, and my Mapper (in a MapReduce job) scans this table
    > to get the data. However, some duplicate records are returned by the
    > scanner from different regions and region servers, as shown in the logs below.
    >
    > Are there any suggestions for this problem? Thanks in advance.
    >
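    > Here is my scanner code:
    > Scan scan = new Scan();
    > scan.setBatch(200);          // at most 200 cells per Result
    > scan.setCacheBlocks(false);  // do not fill the block cache during the scan
    > scan.setMaxVersions(1);      // only the latest version of each cell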
    >
    >
    > -----------MapReduce task log---------
    > mapper001
    > 2017-03-07 10:19:30,997 INFO [main] org.apache.hadoop.mapred.YarnChild:
    > mapreduce.cluster.local.dir for child: /data/2/yarn/nm/usercache/
    > hdfs/appcache/application_1488785087512_0993,/data/3/
    > yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993,/data/1/
    > yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993
    > 2017-03-07 10:19:31,333 INFO [main] org.apache.hadoop.conf.Configuration.deprecation:
    > session.id is deprecated. Instead, use dfs.metrics.session-id
    > 2017-03-07 10:19:32,910 INFO [main] org.apache.hadoop.mapreduce.
    > lib.output.FileOutputCommitter: File Output Committer Algorithm version
    > is 1
    > 2017-03-07 10:19:32,922 INFO [main] org.apache.hadoop.mapred.Task:  Using
    > ResourceCalculatorProcessTree : [ ]
    > 2017-03-07 10:19:34,160 INFO [main] org.apache.hadoop.mapred.MapTask:
    > Processing split: HBase table split(table name: user_session, scan: , start
    > row: 0X\xBDO@, end row: 0X\xBD)P, region location: ip-10-2-1-21.company.co,
    > encoded region name: )
    >
    > mapper002
    > 2017-03-07 10:19:24,001 INFO [main] org.apache.hadoop.mapred.YarnChild:
    > mapreduce.cluster.local.dir for child: /data/2/yarn/nm/usercache/
    > hdfs/appcache/application_1488785087512_0993,/data/3/
    > yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993,/data/1/
    > yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993
    > 2017-03-07 10:19:24,618 INFO [main] org.apache.hadoop.conf.Configuration.deprecation:
    > session.id is deprecated. Instead, use dfs.metrics.session-id
    > 2017-03-07 10:19:25,661 INFO [main] org.apache.hadoop.mapreduce.
    > lib.output.FileOutputCommitter: File Output Committer Algorithm version
    > is 1
    > 2017-03-07 10:19:25,726 INFO [main] org.apache.hadoop.mapred.Task:  Using
    > ResourceCalculatorProcessTree : [ ]
    > 2017-03-07 10:19:26,100 INFO [main] org.apache.hadoop.mapred.MapTask:
    > Processing split: HBase table split(table name: user_session, scan: , start
    > row: 0X\xBDO@, end row: 0X\xBD)P, region location: ip-10-2-1-23.company.co,
    > encoded region name: )
    >
    > mapper003
    > 2017-03-07 10:19:24,278 INFO [main] org.apache.hadoop.mapred.YarnChild:
    > mapreduce.cluster.local.dir for child: /data/2/yarn/nm/usercache/
    > hdfs/appcache/application_1488785087512_0993,/data/3/
    > yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993,/data/1/
    > yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993
    > 2017-03-07 10:19:24,621 INFO [main] org.apache.hadoop.conf.Configuration.deprecation:
    > session.id is deprecated. Instead, use dfs.metrics.session-id
    > 2017-03-07 10:19:25,553 INFO [main] org.apache.hadoop.mapreduce.
    > lib.output.FileOutputCommitter: File Output Committer Algorithm version
    > is 1
    > 2017-03-07 10:19:25,566 INFO [main] org.apache.hadoop.mapred.Task:  Using
    > ResourceCalculatorProcessTree : [ ]
    > 2017-03-07 10:19:25,910 INFO [main] org.apache.hadoop.mapred.MapTask:
    > Processing split: HBase table split(table name: user_session, scan: , start
    > row: 0X\xBDO@, end row: 0X\xBD)P, region location: ip-10-2-1-23.company.co,
    > encoded region name: )
    >
    > mapper004
    > 2017-03-07 10:19:23,108 INFO [main] org.apache.hadoop.mapred.YarnChild:
    > mapreduce.cluster.local.dir for child: /data/2/yarn/nm/usercache/
    > hdfs/appcache/application_1488785087512_0993,/data/1/
    > yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993,/data/3/
    > yarn/nm/usercache/hdfs/appcache/application_1488785087512_0993
    > 2017-03-07 10:19:23,413 INFO [main] org.apache.hadoop.conf.Configuration.deprecation:
    > session.id is deprecated. Instead, use dfs.metrics.session-id
    > 2017-03-07 10:19:23,952 INFO [main] org.apache.hadoop.mapreduce.
    > lib.output.FileOutputCommitter: File Output Committer Algorithm version
    > is 1
    > 2017-03-07 10:19:23,963 INFO [main] org.apache.hadoop.mapred.Task:  Using
    > ResourceCalculatorProcessTree : [ ]
    > 2017-03-07 10:19:24,320 INFO [main] org.apache.hadoop.mapred.MapTask:
    > Processing split: HBase table split(table name: user_session, scan: , start
    > row: 0X\xBDO@, end row: 0X\xBD)P, region location: ip-10-2-1-23.company.co,
    > encoded region name: )
    >
    >
    > Thanks,
    > Eric
    >
    >
    
