hbase-user mailing list archives

From kiran chitturi <chitturikira...@gmail.com>
Subject Inconsistent row count between mapreduce and shell count
Date Sun, 10 Feb 2013 00:14:54 GMT

I am using HBase 0.94.1 on a distributed cluster of 20 nodes.

When I run count on a table in the HBase shell, I get a count of
2152416 rows.

When I count the same table with the rowcounter MapReduce job, I get the
value below:

13/02/10 00:05:06 INFO mapred.JobClient:     ROWS=1389991

The same thing happened when I used Pig to count rows or run other
operations. The two results are inconsistent.
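
To put a number on the gap, here is a minimal sketch of the arithmetic using only the two counts quoted in this message. The variable names are illustrative and not part of any HBase API.

```python
# Counts quoted in this message.
shell_count = 2152416      # rows reported by `count` in the HBase shell
mapreduce_count = 1389991  # ROWS counter reported by the rowcounter job

# Rows seen by the shell count but not by the MapReduce job.
missing = shell_count - mapreduce_count
missing_pct = 100.0 * missing / shell_count

print(f"rows missing from the MapReduce count: {missing}")
print(f"share of the shell count: {missing_pct:.1f}%")
```

The MapReduce job reports 762425 fewer rows than the shell, roughly a third of the table.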

During the MapReduce job, I noticed that 5 tasks were killed. When I
traced them back to the TaskTracker logs on the node, I found entries
similar to the log below.

2013-02-09_23:58:58.40665 13/02/09 23:58:58 INFO mapred.TaskTracker: JVM
with ID: jvm_201302090035_0015_m_1905604998 given task:
2013-02-09_23:59:03.57016 13/02/09 23:59:03 INFO mapred.TaskTracker:
Received KillTaskAction for task: attempt_201302090035_0015_m_000012_1
2013-02-09_23:59:03.57034 13/02/09 23:59:03 INFO mapred.TaskTracker: About
to purge task: attempt_201302090035_0015_m_000012_1
2013-02-09_23:59:03.61003 13/02/09 23:59:03 INFO util.ProcessTree: Killing
process group9745 with signal TERM. Exit code 0
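
To list every killed attempt rather than reading the logs by hand, a small script along these lines might help. It is only a sketch: it assumes each log entry sits on a single line (the entries above are wrapped by the mail client) and that attempt IDs follow the usual attempt_<job>_<m|r>_<task>_<attempt> layout. The function name killed_attempts is hypothetical.

```python
import re

# Matches entries such as:
#   ... INFO mapred.TaskTracker: Received KillTaskAction for task: attempt_201302090035_0015_m_000012_1
KILL_RE = re.compile(
    r"Received KillTaskAction for task:\s*"
    r"(attempt_(\d+_\d+)_([mr])_(\d+)_(\d+))"
)

def killed_attempts(lines):
    """Return (attempt_id, job, kind, task_number, attempt_number) tuples."""
    found = []
    for line in lines:
        m = KILL_RE.search(line)
        if m:
            attempt_id, job, kind, task, attempt = m.groups()
            found.append((attempt_id, job, kind, int(task), int(attempt)))
    return found

# Two entries from the log above, joined back onto single lines.
sample = [
    "13/02/09 23:59:03 INFO mapred.TaskTracker: Received KillTaskAction "
    "for task: attempt_201302090035_0015_m_000012_1",
    "13/02/09 23:59:03 INFO mapred.TaskTracker: About to purge task: "
    "attempt_201302090035_0015_m_000012_1",
]
result = killed_attempts(sample)
print(result)
```

For the entry shown here, the fields are map task 12, attempt number 1 of job 201302090035_0015. Comparing the killed attempts against the attempts that completed for the same task numbers would show whether each killed task also has a successful attempt.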

I also ran the 'hbck' tool, but it reports no inconsistencies.

Can you please suggest why the counts are inconsistent and how I can
correct this?

Kiran Chitturi
