hbase-user mailing list archives

From "Slava Gorelik" <slava.gore...@gmail.com>
Subject Re: Regionserver fails to serve region
Date Tue, 11 Nov 2008 06:21:58 GMT
Hi. DEBUG wasn't enabled, because it decreases performance and increases
log size.
Regarding the ulimit - yes, it's upped to 32K.
You remember correctly - during massive load I ran the balancer, and from that
time on everything started to behave strangely.

Currently, I can't tell you the regions that were in the table - I
re-formatted HDFS (this was the only way I could get my cluster back to
work).

I have 7 datanodes; 6 of them are running a region server and one is the
HMaster.

Best Regards.

On Tue, Nov 11, 2008 at 1:08 AM, stack <stack@duboce.net> wrote:

> I took a look.
>
> First, enable DEBUG.  See the hbase FAQ for how.
>
> Looking, I see that all was running fine till:
>
> 2008-11-03 14:10:08,261 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: /10.X.X.Y:60020. Already tried 0 time(s).
>
> ...in the middle of an attempt at scanning the .META. region.
>
> Looking through regionserver logs, they are all fine till about that above
> time when I start to see variations on:
>
> 2008-11-03 14:08:46,440 INFO org.apache.hadoop.dfs.DFSClient: Could not
> obtain block blk_1223341017118968735_305051 from any node:
>  java.io.IOException: No live nodes contain current block
>
> ...and
>
> 2008-11-03 14:08:43,660 INFO org.apache.hadoop.dfs.DFSClient: Exception in
> createBlockOutputStream java.io.IOException: Bad connect ack with
> firstBadLink 10.X.X.Y:50010
> 2008-11-03 14:08:43,660 INFO org.apache.hadoop.dfs.DFSClient: Abandoning
> block blk_6726606309673852040_314096
>
> Your hdfs went bad for some reason around above time.  I don't see any
> obvious explanation for why it went bad.  You were running balancer at the
> time IIRC?
>
> Could you netstat your running datanodes and see how many concurrent
> connections you had running?  Was 1024 enough?  You had configured a max of
> 1024?  I don't see the ulimit print out in these logs so presume it's > 1024.
>
> How many regions do you have in your table when it starts to go wonky?  You
> have 6 datanodes running beside your 6 regionservers?
>
> St.Ack
>
>
> Slava Gorelik wrote:
>
>> Hi Michael.
>> I'm sending logs, in 2 parts (2 messages)
>> Part 1
>>
>>
>> On Tue, Nov 4, 2008 at 11:44 PM, Slava Gorelik <slava.gorelik@gmail.com>
>> wrote:
>>
>>    Thank You. Now it's clear.
>>
>>
>>    On Tue, Nov 4, 2008 at 11:31 PM, stack <stack@duboce.net> wrote:
>>
>>        Slava Gorelik wrote:
>>
>>            One more question regarding the blockCache: how are changes
>>            in store files (as I understand it, those are MapFiles)
>>            reflected in the client-side cache? What if more than one
>>            client is making changes? Does each client hold a different
>>            part of the MapFile, or something else?
>>
>>
>>        The block cache is over in the server. It's a cache for
>>        store files which never change once written.  Did I say
>>        client-side cache?  I should have been more clear.  The client
>>        in this case is the regionserver itself.   The cache is so the
>>        regionserver saves on its trips over the network visiting
>>        datanodes.
>>        St.Ack
>>
>>
>>
>>            Best Regards.
>>
>>            On Tue, Nov 4, 2008 at 11:10 PM, Slava Gorelik
>>            <slava.gorelik@gmail.com> wrote:
>>
>>
>>                I can try to reproduce it again, but before that I
>>                would like to send you the logs.
>>                Best Regards.
>>
>>
>>                On Tue, Nov 4, 2008 at 10:05 PM, stack
>>                <stack@duboce.net> wrote:
>>
>>
>>                    Then we should try and figure if there is an issue
>>                    in the balancer, or
>>                    maybe there is something missing if we are not
>>                    doing a big upload in a
>>                    manner that balances the upload across HDFS?
>>                    St.Ack
>>
>>                    Slava Gorelik wrote:
>>
>>
>>                        Sure, I'll arrange the logs tomorrow. About the
>>                        balancer: waiting until the massive work is
>>                        finished is fine in a testing environment, but
>>                        in production it's not relevant :-)
>>
>>                        Best Regards.
>>
>>                        On Tue, Nov 4, 2008 at 9:48 PM, stack
>>                        <stack@duboce.net> wrote:
>>
>>                            Slava Gorelik wrote:
>>
>>                                Hi. Regarding the failure of new block
>>                                creation - I failed to run hbase till
>>                                I reformatted HDFS again.
>>
>>                            I'd be interested in the logs.
>>
>>                                I'm just wondering if hadoop rebalancing
>>                                is necessary? Will it balance itself?
>>                                As I understand it, the hadoop balancer
>>                                moves data between datanodes, but in my
>>                                case this is during massive load (8
>>                                clients just adding records - about 400
>>                                requests for all 6 region servers).
>>                                So, is it a good idea to run the
>>                                balancer during heavy load?
>>
>>                            I don't have sufficient experience running
>>                            the balancer.  Perhaps wait till the
>>                            upload is done, then run it?
>>
>>                            St.Ack
>>
>
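
For the netstat check asked about in the quoted message, here is a minimal
sketch of one way to count concurrent connections on a datanode. It assumes
Linux-style `netstat -tan` output and the datanode port 50010 that appears in
the "firstBadLink" log line above. The function name count_connections and
the sample addresses are made up for illustration; they are not from the
thread.

```python
from collections import Counter


def count_connections(netstat_output: str, port: int = 50010) -> Counter:
    """Count TCP connections per state whose local address uses `port`.

    Assumes the Linux `netstat -tan` column layout:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip banner/header lines and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_address = fields[3]
        # The port is whatever follows the last ':' (works for tcp6 too).
        if local_address.rsplit(":", 1)[-1] == str(port):
            states[fields[5]] += 1
    return states


# Illustrative sample only; real input would come from `netstat -tan`.
sample = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:50010           0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.1:50010          10.0.0.2:41234          ESTABLISHED
tcp        0      0 10.0.0.1:50010          10.0.0.3:41300          ESTABLISHED
tcp        0      0 10.0.0.1:60020          10.0.0.3:52000          ESTABLISHED
"""

if __name__ == "__main__":
    for state, count in count_connections(sample).items():
        print(state, count)
```

Comparing the ESTABLISHED count against whatever connection or file
descriptor limit is configured on the datanode would show whether a limit
such as 1024 was being approached under load.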
