hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: get the impact hbase brings to HDFS, datanode log exploded after we started HBase.
Date Thu, 08 Apr 2010 16:23:42 GMT
It'll depend on your access patterns, but in general we'll be doing
lots of small accesses, and many more of them.  A recently added
clienttrace log (the client referred to here is the dfsclient) will
log messages like the following:

2010-04-07 22:15:52,078 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
/10.20.20.189:50010, dest: /10.20.20.189:56736, bytes: 2022080, op:
HDFS_READ, cliID: DFSClient_-994492608, srvID:
DS-1740361948-10.20.20.189-50010-1270703663528, blockid:
blk_2797215769808904384_1015

Lots of them, one per access.

You could turn them off explicitly in your log4j.  That should help.
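
As a rough sketch of what that could look like, assuming the datanode reads a standard log4j.properties file: the logger name below is copied from the sample message above, and WARN is just one choice of level that sits above the INFO these messages are logged at.

```properties
# Assumption: this goes in the log4j.properties used by the datanode.
# Logger name taken from the clienttrace message shown above.
# Raising the level above INFO suppresses the per-access lines.
log4j.logger.org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace=WARN
```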

Don't run DEBUG level in datanode logs.

Other answers inlined below.

On Thu, Apr 8, 2010 at 2:51 AM, steven zhuang
<steven.zhuang.1984@gmail.com> wrote:
>...
>        At present, my idea is calculating the data IO quantity of both HDFS
> and HBase for a given day, and with the result we can have a rough estimate
> of the situation.

Can you use the above noted clienttrace logs to do this?  Are the
clients on different hosts -- i.e. the hdfs clients and hbase clients?
If so, that'd make it easy enough.  Otherwise, it'd be a little difficult.
There is probably an easier way, but one (awkward) means of calculating
would be to write a mapreduce job that takes the clienttrace messages
and all blocks in the filesystem and sorts out the clienttrace
messages that belong to the ${HBASE_ROOTDIR} subdirectory.
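
For the simpler part of this (just totalling bytes out of the clienttrace messages), a minimal sketch in Python might look like the following. It is not the mapreduce job described above and does not map blocks back to ${HBASE_ROOTDIR}; it only sums the bytes field per op and cliID. The function name and regex are hypothetical, the field layout is taken from the sample message earlier in this mail, and it assumes each record sits on a single line in the real datanode log (the sample above is wrapped by the mail client).

```python
import re
from collections import defaultdict

# Field layout copied from the sample clienttrace message; assumes one record per line.
CLIENTTRACE = re.compile(
    r"clienttrace: src: (?P<src>[^,\s]+), dest: (?P<dest>[^,\s]+), "
    r"bytes: (?P<bytes>\d+), op: (?P<op>\w+), cliID: (?P<cli>[^,\s]+),"
)


def sum_clienttrace_bytes(lines):
    """Return {(op, cliID): total_bytes} over all clienttrace records in lines."""
    totals = defaultdict(int)
    for line in lines:
        match = CLIENTTRACE.search(line)
        if match is None:
            continue  # not a clienttrace record
        key = (match.group("op"), match.group("cli"))
        totals[key] += int(match.group("bytes"))
    return dict(totals)


if __name__ == "__main__":
    import sys

    # Usage: python sum_clienttrace.py < datanode.log
    for (op, cli), total in sorted(sum_clienttrace_bytes(sys.stdin).items()):
        print(op, cli, total)
```

Grouping by cliID only separates hbase traffic from other traffic if you can tell which DFSClient ids belong to the regionservers, which is the same caveat as the different-hosts question above.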

>        One problem I met now is to decide from the regionserver log the
> quantity of data been read/written by Hbase, should I count the lengths in
> following log records as lengths of data been read/written?:
>
> org.apache.hadoop.hbase.regionserver.Store: loaded
> /user/ccenterq/hbase/hbt2table2/165204266/queries/1091785486701083780,
> isReference=false,
> sequence id=1526201715, length=*72426373*, majorCompaction=true
> 2010-03-04 01:11:54,262 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> Started memstore flush for region table_word_in_doc, resort
> all-2010/01/01,1267629092479. Current region memstore size *40.5m*
>
>        here I am not sure the *72426373/40.5m is the length (in byte) of
> data read by HBase. *

That's just the file size.  Above we opened a storefile and just logged its size.

We don't log how much we've read/written anywhere in the hbase logs.

St.Ack
