hbase-user mailing list archives

From Bradford Stephens <bradfordstephens@gmail.com>
Subject Re: HBase Failing on Large Loads
Date Wed, 10 Jun 2009 21:52:27 GMT
Also, there's a slight variation: "Trying to contact region server
Some server for region joinedcontent"

"Some server"? Interesting :)

On Wed, Jun 10, 2009 at 2:50 PM, Bradford
Stephens<bradfordstephens@gmail.com> wrote:
> OK, I've tried all the optimizations you've suggested (still running
> with a M/R job). Still having problems like this:
>
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
> contact region server 192.168.18.15:60020 for region
> joinedcontent,242FEB3ED9BE0D8EF3856E9C4251464C,1244666594390, row
> '291DB5C7440B0A5BDB0C12501308C55B', but failed after 10 attempts.
> Exceptions:
> java.io.IOException: Call to /192.168.18.15:60020 failed on local
> exception: java.io.EOFException
> java.net.ConnectException: Call to /192.168.18.15:60020 failed on
> connection exception: java.net.ConnectException: Connection refused
> [the same java.net.ConnectException: Connection refused, repeated for the
> remaining eight attempts]
>
> On Wed, Jun 10, 2009 at 12:40 AM, stack<stack@duboce.net> wrote:
>> On Tue, Jun 9, 2009 at 11:51 AM, Bradford Stephens <
>> bradfordstephens@gmail.com> wrote:
>>
>>> I sort of need the reduce since I'm combining primary keys from a CSV
>>> file. Although I guess I could just use the combiner class... hrm.
>>>
>>> How do I decrease the batch size?
>>
>>
>>
>> Below is from hbase-default.xml:
>>
>>  <property>
>>    <name>hbase.client.write.buffer</name>
>>    <value>2097152</value>
>>    <description>Size of the write buffer in bytes. A bigger buffer takes
>> more
>>    memory -- on both the client and server side since server instantiates
>>    the passed write buffer to process it -- but reduces the number of RPC.
>>    For an estimate of server-side memory-used, evaluate
>>    hbase.client.write.buffer * hbase.regionserver.handler.count
>>    </description>
>>  </property>
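>>
>> To decrease it for this job, a rough sketch -- assuming the client picks
>> the property up from your job configuration, and with "MyLoader" standing
>> in for your actual job class:
>>
>>   JobConf conf = new JobConf(MyLoader.class);
>>   // Quarter the 2MB default so each flush carries fewer row edits.
>>   conf.setLong("hbase.client.write.buffer", 512 * 1024);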
>>
>>
>> You upped xceivers on your datanodes and you set your
>> dfs.datanode.socket.write.timeout = 0?
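>>
>> If not, both go in hadoop-site.xml on the datanodes (note the xceivers
>> property name really is misspelled); the values below are the commonly
>> suggested ones, adjust to taste, and restart the datanodes after:
>>
>>  <property>
>>    <name>dfs.datanode.max.xcievers</name>
>>    <value>2048</value>
>>  </property>
>>  <property>
>>    <name>dfs.datanode.socket.write.timeout</name>
>>    <value>0</value>
>>  </property>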
>>
>>
>>
>>> Also, I tried to make a map-only task that used ImmutableBytesWritable
>>> and BatchUpdate as the output K and V, and TableOutputFormat as the
>>> OutputFormat -- the job fails, saying that "HbaseMapWritable cannot be
>>> cast to org.apache.hadoop.hbase.io.BatchUpdate". I've checked my
>>> Mapper multiple times, it's definitely outputting a BatchUpdate.
>>>
>>
>>
>> You are using TOF as the map output?  Paste the exception.  You could try
>> making an HTable instance in your configure call and then doing
>> t.commit(BatchUpdate) in your map.  Emit nothing, or something simple like
>> an integer, so the counters make some kind of sense when the job is done.
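>>
>> Roughly like the below -- untested, from memory against the 0.19 API,
>> with the "content:data" column and the CSV layout as stand-ins for
>> whatever yours actually are:
>>
>>  import java.io.IOException;
>>  import org.apache.hadoop.hbase.HBaseConfiguration;
>>  import org.apache.hadoop.hbase.client.HTable;
>>  import org.apache.hadoop.hbase.io.BatchUpdate;
>>  import org.apache.hadoop.io.*;
>>  import org.apache.hadoop.mapred.*;
>>
>>  public class CsvLoadMapper extends MapReduceBase
>>      implements Mapper<LongWritable, Text, Text, IntWritable> {
>>    private HTable table;
>>
>>    public void configure(JobConf job) {
>>      try {
>>        // One HTable per task, made once up front in configure.
>>        table = new HTable(new HBaseConfiguration(), "joinedcontent");
>>      } catch (IOException e) {
>>        throw new RuntimeException("Cannot open table", e);
>>      }
>>    }
>>
>>    public void map(LongWritable offset, Text line,
>>        OutputCollector<Text, IntWritable> out, Reporter reporter)
>>        throws IOException {
>>      // First CSV field is the row key, second is the value.
>>      String[] fields = line.toString().split(",");
>>      BatchUpdate bu = new BatchUpdate(fields[0]);
>>      bu.put("content:data", fields[1].getBytes());
>>      table.commit(bu);
>>      // Emit a trivial count so the job counters still mean something.
>>      out.collect(new Text(fields[0]), new IntWritable(1));
>>    }
>>  }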
>>
>> Tell us something about your schema.  How many column families and columns?
>>
>> St.Ack
>>
>>
>>>
>>> On Tue, Jun 9, 2009 at 10:43 AM, stack<stack@duboce.net> wrote:
>>> > On Tue, Jun 9, 2009 at 10:13 AM, Bradford Stephens <
>>> > bradfordstephens@gmail.com> wrote:
>>> >
>>> >
>>> >> Hey rock stars,
>>> >>
>>> >
>>> >
>>> > Flattery makes us perk up for sure.
>>> >
>>> >
>>> >
>>> >>
>>> >> I'm having problems loading large amounts of data into a table (about
>>> >> 120 GB, 250 million rows). My Map task runs fine, but when it comes to
>>> >> reducing, things start burning. 'top' indicates that I only have
>>> >> ~100MB of RAM free on my datanodes, and every process starts thrashing
>>> >> ... even ssh and ping. Then I start to get errors like:
>>> >>
>>> >> "org.apache.hadoop.hbase.client.RegionOfflineException: region
>>> >> offline: joinedcontent,,1244513452487"
>>> >>
>>> >
>>> > See if said region is actually offline?  Try getting a row from it in
>>> > the shell.
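>>> >
>>> > From memory, assuming the jirb-based shell that ships with 0.19
>>> > (substitute a row key you expect the region to hold):
>>> >
>>> >   hbase> get 'joinedcontent', 'some-row-key'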
>>> >
>>> >
>>> >
>>> >>
>>> >> and:
>>> >>
>>> >> "Task attempt_200906082135_0001_r_000002_0 failed to report status for
>>> >> 603 seconds. Killing!"
>>> >
>>> >
>>> >
>>> > Sounds like the nodes are heavily loaded... so loaded that either the
>>> > task can't report in, or it's stuck on an hbase update so long that
>>> > it's taking ten minutes or more to return.
>>> >
>>> > One thing to look at is disabling batching or making batches smaller.
>>> > When the batch is big, it can take a while under high load for all row
>>> > edits to go in; the HBase client will not return until all row commits
>>> > have succeeded.  Smaller batches are more likely to return before the
>>> > task is killed for taking longer than the report period to check in.
>>> >
>>> >
>>> > What's your MR job like?  You're updating hbase in the reduce phase, I
>>> > presume (TableOutputFormat?).  Do you need the reduce?  Can you update
>>> > hbase in the map step?  That saves the sort the MR framework is doing
>>> > -- a sort that is unnecessary since hbase orders on insertion.
>>> >
>>> >
>>> > Can you try with a lighter load?  Maybe a couple of smaller MR jobs
>>> > rather than one big one?
>>> >
>>> > St.Ack
>>> >
>>> >
>>> >>
>>> >>
>>> >> I'm running Hadoop .19.1 and HBase .19.3, with 1 master/name node and
>>> >> 8 regionservers. 2 x Dual Core Intel 3.2 GHz procs, 4 GB of RAM. 16
>>> >> map tasks, 8 reducers. I've set the MAX_HEAP in hadoop-env to 768, and
>>> >> the one in hbase-env is at its default with 1000. I've also done all
>>> >> the performance enhancements in the Wiki with the file handles, the
>>> >> garbage collection, and the epoll limits.
>>> >>
>>> >> What am I missing? :)
>>> >>
>>> >> Cheers,
>>> >> Bradford
>>> >>
>>> >
>>>
>>
>
