hbase-user mailing list archives

From "Billy Pearson" <sa...@pearsonwholesale.com>
Subject Re: HBase Failing on Large Loads
Date Wed, 10 Jun 2009 06:15:18 GMT
I think most of your problems are coming from running too many map/reduce
tasks at the same time with so little memory: the machines start swapping,
the regionservers/datanodes/tasktrackers don't get time to check in and tell
their masters they're still alive, and stuff starts failing.

I would try 2 maps and 2 reduces per machine, maybe 4, with that little memory.
I run 3 mappers and 2 reducers per server with 4 GB of memory, a 1 GB heap for
hbase/datanode/tasktracker, and 400 MB per task.
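In Hadoop 0.19 terms, the knobs Billy is describing live in hadoop-site.xml. A sketch with his suggested numbers (property names are the 0.19-era ones; the values are his suggestion, tune them to your hardware):

```xml
<!-- cap concurrent tasks per tasktracker -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
<!-- heap for each child task JVM, ~400 MB -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx400m</value>
</property>
```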


"Bradford Stephens" <bradfordstephens@gmail.com> wrote in message
I ran some more tests to clarify my questions from above. After the
same MR job, 5 out of 8 of my Regionservers died before I terminated
the job.  Here's what I saw in one of the HBase Regionserver logs...

Exception in createBlockOutputStream java.io.IOException: Bad connect
ack with firstBadLink  (with many different

Then I get errors like this:

Error Recovery for block blk_-4108085472136309132_97478 in pipeline,, bad

then things continue for a while and I get this:

Exception while reading from blk_1698571189906026963_93533 of
from java.io.IOException: Premeture EOF from

Then I start seeing stuff like this:

Error Recovery for block blk_3202913437369696154_99607 bad datanode[0]
nodes == null
2009-06-09 16:31:15,330 WARN org.apache.hadoop.hdfs.DFSClient: Could
not get block locations. Source file
- Aborting...

Exception in createBlockOutputStream java.io.IOException: Could not
read from stream

Abandoning block blk_-4592653855912358506_99607

And this...
DataStreamer Exception: java.io.IOException: Unable to create new block.

Then it eventually dies.

On Tue, Jun 9, 2009 at 11:51 AM, Bradford Stephens
<bradfordstephens@gmail.com> wrote:
> I sort of need the reduce since I'm combining primary keys from a CSV
> file. Although I guess I could just use the combiner class... hrm.
> How do I decrease the batch size?
> Also, I tried to make a map-only task that used ImmutableBytesWritable
> and BatchUpdate as the output K and V, and TableOutputFormat as the
> OutputFormat -- the job fails, saying that "HbaseMapWritable cannot be
> cast to org.apache.hadoop.hbase.io.BatchUpdate". I've checked my
> Mapper multiple times, it's definitely outputting a BatchUpdate.
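For what it's worth, a map-only job against HBase 0.19 is usually wired up roughly as below. This is a hedged sketch, not Bradford's actual code: MyMap and the input setup are placeholders, and a "cannot be cast to BatchUpdate" error often means the job's configured output value class doesn't match what the mapper actually emits.

```java
// Sketch against the HBase 0.19-era API; MyMap is a placeholder mapper
// emitting <ImmutableBytesWritable, BatchUpdate> pairs. Input format
// and paths are omitted.
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapred.TableOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class UploadJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(UploadJob.class);
    conf.setJobName("csv upload");
    conf.setMapperClass(MyMap.class);             // placeholder mapper
    conf.setNumReduceTasks(0);                    // map-only: skip the sort
    conf.setOutputFormat(TableOutputFormat.class);
    conf.set(TableOutputFormat.OUTPUT_TABLE, "joinedcontent");
    conf.setOutputKeyClass(ImmutableBytesWritable.class);
    conf.setOutputValueClass(BatchUpdate.class);  // must match the mapper's value type
    JobClient.runJob(conf);
  }
}
```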
> On Tue, Jun 9, 2009 at 10:43 AM, stack
> <stack@duboce.net> wrote:
>> On Tue, Jun 9, 2009 at 10:13 AM, Bradford Stephens <
>> bradfordstephens@gmail.com> wrote:
>>> Hey rock stars,
>> Flattery makes us perk up for sure.
>>> I'm having problems loading large amounts of data into a table (about
>>> 120 GB, 250million rows). My Map task runs fine, but when it comes to
>>> reducing, things start burning. 'top' indicates that I only have ~
>>> 100M of RAM free on my datanodes, and every process starts thrashing
>>> ... even ssh and ping. Then I start to get errors like:
>>> "org.apache.hadoop.hbase.client.RegionOfflineException: region
>>> offline: joinedcontent,,1244513452487"
>> See if said region is actually offline? Try getting a row from it in 
>> shell.
>>> and:
>>> "Task attempt_200906082135_0001_r_000002_0 failed to report status for
>>> 603 seconds. Killing!"
>> Sounds like nodes are heavily loaded -- so loaded that either the task
>> can't report in, or it's stuck on an hbase update for so long that it's
>> taking ten minutes or more to return.
>> One thing to look at is disabling batching or making batches smaller. When
>> the batch is big, it can take a while under high load for all row edits to
>> go in, and the HBase client will not return till all row commits have
>> succeeded. Smaller batches make it more likely the client returns in time,
>> so the task isn't killed for taking longer than the report period to check
>> in.
>> What's your MR job like? You're updating hbase in the reduce phase, I
>> presume (TableOutputFormat?). Do you need the reduce? Can you update hbase
>> in the map step? That saves the sort the MR framework is doing -- a sort
>> that is unnecessary given that hbase orders on insertion.
>> Can you try with a lighter load? Maybe a couple of smaller MR jobs rather
>> than one big one?
>> St.Ack
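The batching trade-off stack describes is generic client-side write buffering and can be shown without any HBase classes. A toy sketch (BufferedCommitter is a hypothetical stand-in for the client's batch logic, not the 0.19 API): the bigger the buffer, the more edits one blocking commit has to push before the task can report progress again.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of client-side batching: edits accumulate until the buffer
// reaches flushSize, then one blocking commit sends them all. A smaller
// flushSize means each blocking commit is shorter, so the task gets more
// chances to report progress between commits.
class BufferedCommitter {
    private final int flushSize;
    private final List<String> buffer = new ArrayList<>();
    int commits = 0;       // how many blocking round-trips we made
    int largestCommit = 0; // worst-case edits sent in one commit

    BufferedCommitter(int flushSize) { this.flushSize = flushSize; }

    void put(String edit) {
        buffer.add(edit);
        if (buffer.size() >= flushSize) flush();
    }

    void flush() {
        if (buffer.isEmpty()) return;
        largestCommit = Math.max(largestCommit, buffer.size());
        commits++;         // stand-in for the blocking RPC
        buffer.clear();
    }
}

public class BatchDemo {
    public static void main(String[] args) {
        BufferedCommitter big = new BufferedCommitter(1000);
        BufferedCommitter small = new BufferedCommitter(10);
        for (int i = 0; i < 1000; i++) {
            big.put("row" + i);
            small.put("row" + i);
        }
        big.flush();
        small.flush();
        System.out.println(big.commits + " commits, largest " + big.largestCommit);     // 1 commits, largest 1000
        System.out.println(small.commits + " commits, largest " + small.largestCommit); // 100 commits, largest 10
    }
}
```

With the same 1000 edits, a buffer of 1000 makes one huge blocking commit while a buffer of 10 makes 100 short ones, so the small-buffer task gets 100 chances to check in instead of one.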
>>> I'm running Hadoop .19.1 and HBase .19.3, with 1 master/name node and
>>> 8 regionservers. 2 x Dual Core Intel 3.2 GHz procs, 4 GB of RAM. 16
>>> map tasks, 8 reducers. I've set the MAX_HEAP in hadoop-env to 768, and
>>> the one in hbase-env is at its default with 1000. I've also done all
>>> the performance enhancements in the Wiki with the file handles, the
>>> garbage collection, and the epoll limits.
>>> What am I missing? :)
>>> Cheers,
>>> Bradford
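For later readers: the Wiki tweaks Bradford refers to typically boil down to something like the following. The values and the "hadoop" user name are illustrative, and fs.epoll.max_user_instances only exists on kernels that cap per-user epoll instances (around 2.6.27):

```
# /etc/security/limits.conf -- raise the open-file limit for the user
# running the datanode/regionserver daemons (user name illustrative)
hadoop  -  nofile  32768

# /etc/sysctl.conf -- lift the per-user epoll cap on kernels that have
# one (~2.6.27+); load with `sysctl -p`
fs.epoll.max_user_instances = 32768
```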
