hbase-user mailing list archives

From Wellington Chevreuil <wellington.chevre...@gmail.com>
Subject Re: How does HBase deal with master switch?
Date Thu, 06 Jun 2019 14:14:56 GMT
Hey Zili,

Besides what Duo explained previously, just clarifying some concepts from
your previous description:

> 1) RegionServer started full gc and timeout on ZooKeeper. Thus ZooKeeper
> regarded it as failed.
>
ZK just knows about sessions and clients, not the type of client connecting
to it. Clients open a session in ZK, then keep pinging ZK back
periodically to keep the session alive. In the case of a long full GC
pause, the client (the RS, in this case) will fail to ping back within the
required period, at which point ZK will *expire* the session.
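
That expiry mechanism can be sketched with a toy model (an illustrative
simulation only, not ZooKeeper's actual implementation; the class name,
timeout value, and session id below are all made up):

```python
# Hypothetical sketch: the server tracks the last heartbeat per session and
# expires any session whose client -- e.g. a RegionServer stuck in a long
# full GC pause -- misses its heartbeat window.
class SessionTracker:
    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self.last_ping = {}          # session id -> last heartbeat time
        self.expired = set()

    def ping(self, session_id, now):
        """Heartbeat from a client; rejected once the session has expired."""
        if session_id in self.expired:
            return False             # client must treat this as SessionExpired
        self.last_ping[session_id] = now
        return True

    def tick(self, now):
        """Expire every session that missed its heartbeat window."""
        for sid, t in list(self.last_ping.items()):
            if now - t > self.session_timeout:
                self.expired.add(sid)
                del self.last_ping[sid]

# An RS session with a 40s timeout, pinging every 10s:
zk = SessionTracker(session_timeout=40)
zk.ping("rs-1", now=0)
zk.ping("rs-1", now=10)
# ... a 60s full GC pause: no pings between t=10 and t=70 ...
zk.tick(now=70)                      # ZK expires the session
print(zk.ping("rs-1", now=70))       # False: the late ping is rejected
```

Note the asymmetry: the expiry decision is made unilaterally on the server
side, so the paused client has no say in it.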

> 2) ZooKeeper launched a new RegionServer, and the new one started to serve.
>
ZK doesn't launch a new RS; it doesn't know about RSes, only about client
sessions. On the session expiration, the Master is notified that an RS is
potentially gone, and starts the process explained by Duo.
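
That notification path can be sketched as a toy model (illustrative only,
not the real ZK or HBase code; in real HBase each live RS does register an
ephemeral znode that the Master watches, but the API and paths below are
made up):

```python
# Hypothetical sketch: each live RS owns an ephemeral znode; when its
# session expires, ZK deletes the znode and fires the watch the Master
# registered, which is how the Master learns the RS is gone.
class MiniZk:
    def __init__(self):
        self.ephemerals = {}         # path -> owning session id
        self.watches = {}            # path -> callback

    def create_ephemeral(self, path, session_id):
        self.ephemerals[path] = session_id

    def watch(self, path, callback):
        self.watches[path] = callback

    def expire_session(self, session_id):
        # ZK removes every ephemeral node owned by the session and
        # notifies the watchers of those nodes.
        for path, sid in list(self.ephemerals.items()):
            if sid == session_id:
                del self.ephemerals[path]
                if path in self.watches:
                    self.watches.pop(path)(path)

gone = []
zk = MiniZk()
zk.create_ephemeral("/hbase/rs/rs-1", session_id="s-42")
zk.watch("/hbase/rs/rs-1", lambda path: gone.append(path))
zk.expire_session("s-42")
print(gone)   # ['/hbase/rs/rs-1'] -- the Master now starts crash recovery
```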

> 3) The old RegionServer finished gc and thought itself was still active and
> serving.
>
What really happens here is that once the RS is back from GC, it will try to
ping ZK again for that session; ZK will reject it because the session has
already expired, and the RS will then kill itself.
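
The RS side of that can be sketched as (again a toy model, not HBase's
actual abort path; the class and method names are made up):

```python
# Hypothetical sketch: after the GC pause the RS's next heartbeat fails
# with a session-expired error, and the RS aborts rather than keep
# serving -- it cannot know how long it was paused or what the Master has
# done in the meantime.
class SessionExpiredError(Exception):
    pass

class RegionServer:
    def __init__(self):
        self.alive = True

    def heartbeat(self, zk_accepts_ping):
        try:
            if not zk_accepts_ping:
                raise SessionExpiredError("session already expired by ZK")
        except SessionExpiredError:
            self.abort()

    def abort(self):
        # Real HBase logs the abort reason and exits the process.
        self.alive = False

rs = RegionServer()
rs.heartbeat(zk_accepts_ping=False)   # first ping after the long GC pause
print(rs.alive)                       # False: the RS killed itself
```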

Em qui, 6 de jun de 2019 às 14:58, 张铎(Duo Zhang) <palomino219@gmail.com>
escreveu:

> Once a RS is started, it will create its wal directory and start to write
> wal into it. And if the master thinks a RS is dead, it will rename the wal
> directory of the RS and call recoverLease on all the wal files under the
> directory to make sure that they are all closed. So even if the RS comes
> back after a long GC, before it kills itself because of the
> SessionExpiredException it can not accept any write requests any more,
> since its old wal file is closed, and the wal directory is also gone so it
> can not create new wal files either.
>
> Of course, you may still read from the dead RS at this moment,
> so theoretically you could read stale data, which means HBase can not
> guarantee ‘external consistency’.
>
> Hope this solves your problem.
>
> Thanks.
>
> Zili Chen <wander4096@gmail.com> 于2019年6月6日周四 下午9:38写道:
>
> > Hi,
> >
> > Recently, from the book ZooKeeper: Distributed Process Coordination
> > (Chapter 5, section 5.3), I found a paragraph mentioning that HBase once
> > suffered from:
> >
> > 1) RegionServer started full gc and timed out on ZooKeeper. Thus
> > ZooKeeper regarded it as failed.
> > 2) ZooKeeper launched a new RegionServer, and the new one started to
> > serve.
> > 3) The old RegionServer finished gc and thought itself was still active
> > and serving.
> >
> > I'm interested in it and would like to know how the HBase community
> > overcame this issue.
> >
> > Best,
> > tison.
> >
>
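
The fencing Duo describes above can be sketched as a toy model (illustrative
only; the real mechanism uses an HDFS directory rename plus recoverLease on
the WAL files, and the paths below are made up):

```python
# Hypothetical sketch: the Master renames the dead RS's WAL directory and
# recovers the lease on its files, so a GC-paused RS that wakes up can
# neither append to its old WAL nor create a new one under the old path.
class MiniFs:
    def __init__(self):
        self.dirs = {"/hbase/WALs/rs-1"}
        self.open_files = {"/hbase/WALs/rs-1/wal.0"}   # RS holds the lease

    def rename(self, src, dst):
        self.dirs.remove(src)
        self.dirs.add(dst)

    def recover_lease(self, path):
        self.open_files.discard(path)    # closes the file for the old writer

    def append(self, path, parent):
        # A write succeeds only if the writer still holds the lease AND the
        # parent directory still exists under its old name.
        return path in self.open_files and parent in self.dirs

fs = MiniFs()
# The Master fences the "dead" RS:
fs.rename("/hbase/WALs/rs-1", "/hbase/WALs/rs-1-splitting")
fs.recover_lease("/hbase/WALs/rs-1/wal.0")
# The GC-paused RS wakes up and tries to write:
print(fs.append("/hbase/WALs/rs-1/wal.0", "/hbase/WALs/rs-1"))  # False
```

Either check alone would be enough to block the write; together they fence
the old RS on both the existing WAL and any new one it might try to create.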
