hbase-user mailing list archives

From 张铎(Duo Zhang) <palomino...@gmail.com>
Subject Re: How does HBase deal with master switch?
Date Fri, 07 Jun 2019 11:27:20 GMT
Yes, in production it usually happens when there is a very long GC: the RS is
declared dead, all of its regions are reassigned to other RSes before the RS
comes back, and then the RS kills itself.

Natalie Chen <nataliechen1@gmail.com> wrote on Fri, Jun 7, 2019 at 3:03 PM:

> The ZooKeeper case is well known, since each replica serves data it has
> saved locally.
>
> But I thought the RS writes/reads data to/from HDFS, so there is no such
> problem as replication latency.
>
> Can we say that the only chance of getting stale data from an RS is what
> you have described here, and that I only have to monitor RS heartbeats and
> control GC pauses?
>
> Thank you.
>
>
>
> 张铎(Duo Zhang) <palomino219@gmail.com> wrote on Fri, Jun 7, 2019 at 1:50 PM:
>
> > Lots of distributed databases cannot guarantee external consistency. Even
> > for ZooKeeper, when you update A and then tell others to read A, the
> > others may get a stale value, since they may read from another replica
> > which has not received the new value yet.
> >
> > There are several ways to mitigate the problem in HBase. For example,
> > record the time when we last received a successful heartbeat from ZK,
> > and if it has been too long, just throw an exception to the client. But
> > this is not a big deal for most use cases: within the same session, if
> > you successfully update a value, you can see the new value when reading.
> > There are also several ways to address external consistency itself.
> >
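A minimal sketch of the heartbeat-staleness check described above, in Java.
The class and member names here (StaleHeartbeatGuard, lastZkHeartbeatMs,
maxAllowedStalenessMs) are illustrative assumptions, not HBase's actual code:

    import java.io.IOException;

    // Sketch: remember when the last ZooKeeper heartbeat succeeded and
    // refuse to serve reads once that timestamp is too old.
    class StaleHeartbeatGuard {
      private volatile long lastZkHeartbeatMs = System.currentTimeMillis();
      private final long maxAllowedStalenessMs; // e.g. the ZK session timeout

      StaleHeartbeatGuard(long maxAllowedStalenessMs) {
        this.maxAllowedStalenessMs = maxAllowedStalenessMs;
      }

      // Call whenever a ZooKeeper heartbeat / session renewal succeeds.
      void onHeartbeatSuccess() {
        lastZkHeartbeatMs = System.currentTimeMillis();
      }

      // Call before serving a read; throws if the session may have expired.
      void checkFreshness() throws IOException {
        long sinceLast = System.currentTimeMillis() - lastZkHeartbeatMs;
        if (sinceLast > maxAllowedStalenessMs) {
          throw new IOException("Last successful ZooKeeper heartbeat was "
              + sinceLast + " ms ago; refusing possibly stale reads");
        }
      }
    }
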
> > So weigh the risk yourself: if external consistency is super important
> > to you, then you'd better choose another DB. But please consider it
> > carefully; as said above, lots of databases do not guarantee this
> > either...
> >
> > Natalie Chen <nataliechen1@gmail.com> wrote on Fri, Jun 7, 2019 at 11:59:
> >
> > > Hi,
> > >
> > > I am quite concerned about the possibility of getting stale data. I was
> > > expecting consistency from HBase when we chose it as our NoSQL DB
> > > solution.
> > >
> > > So, if consistency is not guaranteed, meaning clients expecting the
> > > latest data may instead get stale data from a "dead" RS because of a
> > > long GC or similar, then even if the chance is slight, I have to be
> > > able to detect and repair the situation, or else consider looking for
> > > another, more suitable solution.
> > >
> > > So, would you kindly confirm that HBase has this “consistency” issue?
> > >
> > > Thank you.
> > >
> > >
> > >
> > > 张铎(Duo Zhang) <palomino219@gmail.com> wrote on Thu, Jun 6, 2019 at 9:58 PM:
> > >
> > > > Once an RS is started, it creates its WAL directory and starts to
> > > > write WALs into it. If the master thinks an RS is dead, it renames
> > > > the RS's WAL directory and calls recover lease on all the WAL files
> > > > under that directory to make sure they are all closed. So even when
> > > > the RS comes back after a long GC, before it kills itself because of
> > > > the SessionExpiredException, it cannot accept any write requests any
> > > > more: its old WAL file is closed, and the WAL directory is gone, so
> > > > it cannot create new WAL files either.
> > > >
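A rough sketch of that fencing step as a master-side helper against HDFS.
FileSystem#rename, FileSystem#listStatus, and DistributedFileSystem#recoverLease
are real Hadoop APIs, and the "-splitting" suffix mirrors HBase's WAL-splitting
convention, but the helper itself is an illustration, not the actual master code:

    import java.io.IOException;

    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    // Sketch: rename the dead RS's WAL directory and recover the lease on
    // every WAL file so the old RS can neither append to its old WALs nor
    // create new ones under the original path.
    final class WalFencingSketch {
      static void fenceDeadRegionServer(DistributedFileSystem dfs, Path walDir)
          throws IOException {
        // 1. Rename the WAL directory; the old RS can no longer create
        //    new WAL files under its original path.
        Path splittingDir =
            new Path(walDir.getParent(), walDir.getName() + "-splitting");
        if (!dfs.rename(walDir, splittingDir)) {
          throw new IOException("Failed to rename " + walDir);
        }
        // 2. Recover the lease on each WAL file; once the file is closed,
        //    HDFS rejects further appends from the old writer.
        for (FileStatus wal : dfs.listStatus(splittingDir)) {
          // recoverLease returns false while recovery is still in progress;
          // real code retries with backoff until it returns true.
          dfs.recoverLease(wal.getPath());
        }
      }
    }
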
> > > > Of course, you may still read from the dead RS at that moment, so
> > > > theoretically you could read stale data, which means HBase cannot
> > > > guarantee "external consistency".
> > > >
> > > > Hope this solves your problem.
> > > >
> > > > Thanks.
> > > >
> > > > Zili Chen <wander4096@gmail.com> wrote on Thu, Jun 6, 2019 at 9:38 PM:
> > > >
> > > > > Hi,
> > > > >
> > > > > Recently, in the book ZooKeeper: Distributed Process Coordination
> > > > > (Chapter 5, section 5.3), I found a paragraph mentioning that HBase
> > > > > once suffered from the following:
> > > > >
> > > > > 1) A RegionServer started a full GC and timed out on ZooKeeper, so
> > > > > ZooKeeper regarded it as failed.
> > > > > 2) A new RegionServer was launched, and the new one started to
> > > > > serve.
> > > > > 3) The old RegionServer finished GC and thought it was still active
> > > > > and serving.
> > > > >
> > > > > I'm interested in this and would like to know how the HBase
> > > > > community overcame this issue.
> > > > >
> > > > > Best,
> > > > > tison.
> > > > >
> > > >
> > >
> >
>
