hbase-user mailing list archives

From Dan Harvey <dan.har...@mendeley.com>
Subject Re: Rolling out Hadoop/HBase updates
Date Sun, 04 Jul 2010 17:12:20 GMT
Hey,

We're using stock CDH2 without any patches, so I'm not sure whether we have
HDFS-630 or not. For HBase we're currently on 0.20.3 and will be testing and
moving to 0.20.5 soon.

What I did with this rollout of just config changes was take one region
server down at a time and restart the datanode on the same server. So I
gather what I should have done was shut down all the region servers before
restarting any of the datanodes?

I guess if I split it into different parts it would be :-

- HBase Rolling update for point/config releases is supported
  - Update masters first
  - Then update region servers in turn

- HDFS Data nodes don't support rolling updates? (Maybe better in the hdfs
list I guess)
  - Take down HBase
  - Take down datanodes
  - Update all the datanodes code/configs
  - Start datanodes
  - Start HBase

Would you be able to let me know which of these I've got right/wrong?
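
To make the two orderings above concrete, here is a rough Python sketch that
only writes them out as ordered step lists. The host names and the action
labels are made up for illustration (they are not real Hadoop or HBase
commands), and the sketch does not settle whether either ordering is correct,
which is the open question here.

```python
from typing import List, Tuple

# A step is (target, action). The action strings are illustrative labels only;
# they are not real Hadoop or HBase commands.
Step = Tuple[str, str]


def hbase_rolling_plan(masters: List[str], region_servers: List[str]) -> List[Step]:
    """Proposed HBase ordering: masters first, then region servers in turn."""
    plan: List[Step] = [(m, "update and restart master") for m in masters]
    plan += [(rs, "update and restart region server") for rs in region_servers]
    return plan


def hdfs_full_stop_plan(datanodes: List[str]) -> List[Step]:
    """Proposed HDFS ordering: HBase down, datanodes down, update, then back up."""
    plan: List[Step] = [("hbase", "stop")]
    plan += [(dn, "stop datanode") for dn in datanodes]
    plan += [(dn, "update datanode code/configs") for dn in datanodes]
    plan += [(dn, "start datanode") for dn in datanodes]
    plan.append(("hbase", "start"))
    return plan


if __name__ == "__main__":
    # Hypothetical two-node cluster, printed one step per line.
    for target, action in hdfs_full_stop_plan(["node1", "node2"]):
        print(f"{target}: {action}")
```

The point of keeping each plan as plain data is that the order can be read and
reviewed before anything is restarted.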

Thanks,

On 29 June 2010 15:50, Michael Segel <michael_segel@hotmail.com> wrote:

>
> Dan,
>
> I don't think you can do that because your 'new/updated' node will clash
> with the rest of the cloud.
> (We're talking code and not just cloud tuning parameters.) [Read different
> jars...]
>
> If you're going to push an update out, then it has to be an 'all or
> nothing' push.
>
> Since we're using Cloudera's release, moving from CDH2 to CDH3 represents a
> full backup, down the cloud, remove the software completely, and then
> install new CDH3. Outside of that major switch, if we were going from one
> sub release to another, it would be just a $> yum update hadoop-0.20 call on
> each node.
> Again, you have to take the cloud down to do that.
>
> So the bottom line... if you're going to do upgrades, you'll need to plan
> for some down time.
>
> HTH
>
> -Mike
>
> > From: dan.harvey@mendeley.com
> > Date: Tue, 29 Jun 2010 14:43:26 +0100
> > Subject: Rolling out Hadoop/HBase updates
> > To: user@hbase.apache.org
> >
> > Hey,
> >
> > I've been thinking about how we do our configuration and code updates for
> > Hadoop and HBase and was wondering what others do and what the best
> > practice is to avoid errors with HBase.
> >
> > Currently we do a rolling update where we restart the services on one node
> > at a time, so shutting down the region server then restarting the datanode
> > and task trackers depending on what we are updating and what has changed.
> > But with this I have occasionally found errors with the HBase cluster
> > afterwards due to a corrupt META table, which I think could have been
> > caused by restarting the datanode, or maybe by not waiting long enough for
> > the cluster to sort out losing a region server before moving on to the next.
> >
> > The most recent error upon restarting a node was :-
> >
> > 2010-06-29 10:46:44,970 ERROR
> > org.apache.hadoop.hbase.regionserver.HRegionServer: Error closing
> > files,3822b1ea8ae015f3ec932cafaa282dd211d768ad,1275145898366
> > java.io.IOException: Filesystem closed
> >         at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:230)
> >
> > 2010-06-29 10:46:44,970 FATAL
> > org.apache.hadoop.hbase.regionserver.HRegionServer: Shutting down
> > HRegionServer: file system not available
> > java.io.IOException: File system is not available
> >         at org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:129)
> >
> >
> > Followed by this for every region being served :-
> >
> > 2010-06-29 10:46:44,996 ERROR
> > org.apache.hadoop.hbase.regionserver.HRegionServer: Error closing
> > documents,082595c0-6d01-11df-936c-0026b95e484c,1275676410202
> > java.io.IOException: Filesystem closed
> >         at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:230)
> >
> >
> > After updating all the nodes, all the region servers shut down after a
> > few minutes reporting the following :-
> >
> > 2010-06-29 11:21:59,508 WARN org.apache.hadoop.hdfs.DFSClient: Error
> > Recovery for block blk_-1437671530216085093_2565663 bad datanode[0]
> > 10.0.11.4:50010
> >
> > 2010-06-29 11:22:09,481 FATAL org.apache.hadoop.hbase.regionserver.HLog:
> > Could not append. Requesting close of hlog
> > java.io.IOException: All datanodes 10.0.11.4:50010 are bad. Aborting...
> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2542)
> >
> >
> > 2010-06-29 11:22:09,482 FATAL
> > org.apache.hadoop.hbase.regionserver.LogRoller: Log rolling failed with
> > ioe:
> > java.io.IOException: All datanodes 10.0.11.4:50010 are bad. Aborting...
> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2542)
> >
> > 2010-06-29 11:22:10,344 ERROR
> > org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to close log in
> > abort
> > java.io.IOException: All datanodes 10.0.11.4:50010 are bad. Aborting...
> >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2542)
> >
> >
> > This was fixed by restarting the master and starting the region servers
> > again, but it would be nice to know how to roll out changes more cleanly.
> >
> > How do other people here roll out updates to HBase / Hadoop? What order do
> > you restart services in and how long do you wait before moving to the next
> > node?
> >
> > Just so you know we currently have 5 nodes and are getting another 10 to add
> > soon.
> >
> > Thanks,
> >
> > --
> > Dan Harvey | Datamining Engineer
> > www.mendeley.com/profiles/dan-harvey
> >
> > Mendeley Limited | London, UK | www.mendeley.com
> > Registered in England and Wales | Company Number 6419015
>



-- 
Dan Harvey | Datamining Engineer
www.mendeley.com/profiles/dan-harvey

Mendeley Limited | London, UK | www.mendeley.com
Registered in England and Wales | Company Number 6419015
