hbase-user mailing list archives

From Eran Kutner <e...@gigya.com>
Subject Re: Region server crashes when using replication
Date Tue, 22 Mar 2011 18:59:03 GMT
Thanks, J-D.

As for the first issue, why does this behavior make sense? What
happens when the connection between the two clusters fails? Will the
region servers of the primary cluster fail as well, or at least be
unable to start? That seems very radical.

Regarding the second issue, I didn't see anything else in the logs; it
just seemed to decide to shut down, but maybe I missed something. I will
try to reproduce it and let you know if I succeed.

Regarding the timeout to detect a failed server, 3 minutes sounds like
a very long time for a region server to be down. Obviously, during
that time the data owned by that server is inaccessible. Is there a
reason for this long timeout? Can it be configured?

-eran



On Tue, Mar 22, 2011 at 20:22, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>
> First issue: UnknownHostException is unforgiving. Your machines need
> to be able to talk to haddop2-zk3 (is that a typo?), and it seems that
> at least that one can't. The reason the machine dies is that we
> usually try to "fail fast" in HBase.
>
> Second issue: There's not enough information; all I see is a region
> server shutting down, and the reason why is probably before that.
>
> Third issue: https://issues.apache.org/jira/browse/HBASE-3664
>
> Fourth issue: it's now 3 minutes in 0.90 for the timeout to happen.
>
> J-D
>
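
Since the first issue in the quoted reply comes down to a host name that
one machine cannot resolve, a quick check run on each region server can
confirm which names fail before HBase is started. The following is only a
minimal sketch, not part of HBase: the class name QuorumHostCheck and the
method unresolvable are hypothetical, and the host names are whatever you
pass on the command line (for example, the peer cluster's ZooKeeper
quorum hosts). It relies only on the standard java.net.InetAddress lookup.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not an HBase class: reports host names that this
// machine cannot resolve through DNS or its hosts file.
public class QuorumHostCheck {

    // Returns the subset of the given host names that fail to resolve.
    static List<String> unresolvable(List<String> hosts) {
        List<String> failed = new ArrayList<>();
        for (String host : hosts) {
            try {
                // Throws UnknownHostException when no address can be found.
                InetAddress.getByName(host);
            } catch (UnknownHostException e) {
                failed.add(host);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Host names are supplied by the caller, e.g. the quorum hosts of
        // the replication peer.
        List<String> failed = unresolvable(Arrays.asList(args));
        if (failed.isEmpty()) {
            System.out.println("All hosts resolved.");
        } else {
            System.out.println("Unresolvable hosts: " + failed);
            System.exit(1);
        }
    }
}
```

Running it with the quorum host names as arguments on every region server
shows whether any of them would hit the same UnknownHostException.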
