hbase-user mailing list archives

From "Billy Pearson" <sa...@pearsonwholesale.com>
Subject Re: HBase Write to Regionservers behavior
Date Fri, 12 Jun 2009 03:47:28 GMT
Once the table has split more, you might look into using

It will split up the data and run only one reduce per region, so all of a
region's rows are sent to just one reducer.
It does not help much, though, while the table is small and you have a lot of
reduce tasks.
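
As a rough illustration of the idea (this is a plain Python sketch, not the
HBase API; the region start keys and function names are made up for the
example), a per-region partitioner looks up which region's key range a row
falls in and routes the row to the reducer that owns that region:

```python
from bisect import bisect_right

# Hypothetical region boundaries: each entry is the first row key of a region.
# The first region always starts at the empty key.
REGION_START_KEYS = ["", "4", "8", "C"]

def region_for_key(row_key, region_start_keys):
    """Return the index of the region whose key range contains row_key."""
    return bisect_right(region_start_keys, row_key) - 1

def partition(row_key, region_start_keys, num_reducers):
    """Send every row of a region to the same reducer."""
    return region_for_key(row_key, region_start_keys) % num_reducers

# All rows of one region land on one reducer.
print(partition("055E51294F9D9CA331D968D04B72A11C", REGION_START_KEYS, 4))
print(partition("9ABC", REGION_START_KEYS, 4))

# With a single region (a small, unsplit table) every row goes to reducer 0,
# no matter how many reduce tasks are configured.
print(partition("9ABC", [""], 7))
```

The last call shows why this does not help on a small table: with one region,
only one of the reducers ever receives data.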

It has a benefit: once one region is done, that region will likely be flushed
as the memcache fills up and has to start flushing,
so it can start compactions and splits without having to worry about more
data coming.
Right now every reduce sorts its data by key, so all the reduce tasks
write to the same regions as they go: because the data is sorted,
they all start at the first region of the table and work toward the last.
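
A small simulation can make that lockstep behaviour visible. This is only a
sketch under assumptions: the four region start keys, the 7000 MD5-style keys
and the modulo assignment of keys to reducers are all invented for the
example, and nothing here talks to HBase.

```python
import hashlib
from bisect import bisect_right

REGION_START_KEYS = ["", "4", "8", "C"]  # hypothetical region boundaries
NUM_REDUCERS = 7

def region_for_key(row_key):
    """Index of the region whose key range contains row_key."""
    return bisect_right(REGION_START_KEYS, row_key) - 1

def make_keys(n):
    """Generate n uppercase hex keys similar to the ones in the job."""
    return [hashlib.md5(str(i).encode()).hexdigest().upper() for i in range(n)]

def reducer_inputs(keys):
    """Spread keys over reducers (stand-in for hash partitioning), then sort each."""
    buckets = [[] for _ in range(NUM_REDUCERS)]
    for key in keys:
        buckets[int(key, 16) % NUM_REDUCERS].append(key)
    return [sorted(bucket) for bucket in buckets]

def regions_written_at(inputs, fraction):
    """Regions receiving writes when every reducer is `fraction` through its input."""
    return {
        region_for_key(bucket[min(int(fraction * len(bucket)), len(bucket) - 1)])
        for bucket in inputs
    }

inputs = reducer_inputs(make_keys(7000))
for fraction in (0.0, 0.5, 1.0):
    print(fraction, sorted(regions_written_at(inputs, fraction)))
```

Because every reducer holds a sorted sample of the whole key space, at any
given moment they are all writing to the same one or two regions, which would
match seeing only one or two busy boxes.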


"Bradford Stephens" <bradfordstephens@gmail.com> wrote in message
> Hey there,
> So, I wiped my HDFS and reinstalled everything, and am running smaller
> loads... so far, so good. I've got 7 regionservers.
> My job basically takes a lot of documents and metadata with unique
> binary keys (like "055E51294F9D9CA331D968D04B72A11C"), combines them
> all in a reducer, then writes it to HBase.
> What I'm noticing is that it's writing to mostly one or two regions on
> one box at a time, even though I have 7 reducers running. Monitoring
> everything with dstat -v, I notice that only 2 of my servers are doing
> much. These boxes have very low CPU idling, and high disk output (a
> few GB a minute).
> Everything else has a little bit of disk activity (maybe 500
> MB/minute), but very idle CPUs.
> Is this normal behavior? I guess as more data is loaded, more
> regions are split, so over time, more boxen will be loading
> data?
> Cheers,
> Bradford
