hbase-user mailing list archives

From Dan Harvey <dan.har...@mendeley.com>
Subject Re: Rolling out Hadoop/HBase updates
Date Sun, 04 Jul 2010 17:36:32 GMT
Just looked into hdfs630 and it looks like it was added in
CDH2 0.20.1+169.89, and we're currently on 0.20.1+169.68. So would updating
to that version, so we have the patch, help prevent some of these issues?
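For reference, the one-region-server-at-a-time restart described in the quoted
steps below could be scripted roughly like this. This is only a sketch: the
hostnames, the init-script names, and the 120-second settle time are
placeholders, not our actual setup.

```shell
#!/bin/sh
# Sketch of a config-only rolling restart, one node at a time.
# Hostnames and init-script names are placeholders (a CDH2-style
# install is assumed to put service scripts under /etc/init.d).
# RUN=echo (the default here) makes this a dry run that just prints
# the commands; set RUN="" to execute them for real.
RUN=${RUN:-echo}

rolling_restart() {
  for host in "$@"; do
    # Stop the region server first so the master can reassign its regions
    $RUN ssh "$host" /etc/init.d/hadoop-hbase-regionserver stop
    # Restart the datanode to pick up the new config
    $RUN ssh "$host" /etc/init.d/hadoop-0.20-datanode restart
    $RUN ssh "$host" /etc/init.d/hadoop-hbase-regionserver start
    # Give the master time to notice the region server is back
    # before touching the next node
    $RUN sleep 120
  done
}

rolling_restart rs1 rs2 rs3   # dry run: prints the command sequence
```

Whether 120 seconds is long enough for the master to settle is exactly the
open question in this thread; it is a guess, not a recommendation.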

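And the "all or nothing" update Mike describes further down might look
something like this sketch; the host list, the stop/start helper scripts, and
the package name are assumptions about a CDH-style install, not a tested
procedure.

```shell
#!/bin/sh
# Sketch of the full-outage update: stop HBase, then HDFS, update every
# node, then bring things back in the reverse order. Hostnames, helper
# script names, and the package name are assumptions for illustration.
# RUN=echo (the default) prints the commands; set RUN="" to execute.
RUN=${RUN:-echo}

full_update() {
  $RUN stop-hbase.sh                 # run on the HBase master
  $RUN stop-dfs.sh                   # run on the namenode
  for host in "$@"; do
    $RUN ssh "$host" yum -y update hadoop-0.20
  done
  $RUN start-dfs.sh
  $RUN start-hbase.sh                # only once HDFS is out of safe mode
}

full_update nn dn1 dn2 dn3 dn4      # dry run: prints the command sequence
```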
Thanks,

On 4 July 2010 18:12, Dan Harvey <dan.harvey@mendeley.com> wrote:

> Hey,
>
> We're using stock CDH2 without any patches, so I'm not sure if we have
> hdfs630 or not. For HBase we're currently on 0.20.3 and will be testing
> and moving to 0.20.5 soon.
>
> What I did with this rollout of config-only changes was take one region
> server down at a time and restart the datanode on the same server. So
> from what I gather, I should have shut down all the region servers
> before restarting any of the datanodes?
>
> I guess if I split it into different parts it would be :-
>
> - HBase Rolling update for point/config releases is supported
>   - Update masters first
>   - Then update region servers in turn
>
> - HDFS Data nodes don't support rolling updates? (Maybe better in the hdfs
> list I guess)
>   - Take down HBase
>   - Take down datanodes
>   - Update all the datanodes code/configs
>   - Start datanodes
>   - Start HBase
>
> Would you be able to let me know which of these I've got right/wrong?
>
> Thanks,
>
> On 29 June 2010 15:50, Michael Segel <michael_segel@hotmail.com> wrote:
>
>>
>> Dan,
>>
>> I don't think you can do that because your 'new/updated' node will clash
>> with the rest of the cloud.
>> (We're talking code and not just cloud tuning parameters.) [Read different
>> jars...]
>>
>> If you're going to push an update out, then it has to be an 'all or
>> nothing' push.
>>
>> Since we're using Cloudera's release, moving from CDH2 to CDH3 represents
>> a full backup, taking down the cloud, removing the software completely,
>> and then installing CDH3. Outside of that major switch, if we were going
>> from one sub release to another, it would be just a $> yum update
>> hadoop-0.20 call on each node.
>> Again, you have to take the cloud down to do that.
>>
>> So the bottom line... if you're going to do upgrades, you'll need to plan
>> for some down time.
>>
>> HTH
>>
>> -Mike
>>
>> > From: dan.harvey@mendeley.com
>> > Date: Tue, 29 Jun 2010 14:43:26 +0100
>> > Subject: Rolling out Hadoop/HBase updates
>> > To: user@hbase.apache.org
>> >
>> > Hey,
>> >
>> > I've been thinking about how we do our configuration and code
>> > updates for Hadoop and HBase, and was wondering what others do and
>> > what the best practice is to avoid errors with HBase.
>> >
>> > Currently we do a rolling update where we restart the services on
>> > one node at a time: shutting down the region server, then restarting
>> > the datanode and task trackers depending on what we are updating and
>> > what has changed. But with this I have occasionally found errors
>> > with the HBase cluster afterwards due to a corrupt META table, which
>> > I think could have been caused by restarting the datanode, or maybe
>> > by not waiting long enough for the cluster to sort out losing a
>> > region server before moving on to the next.
>> >
>> > The most recent error upon restarting a node was :-
>> >
>> > 2010-06-29 10:46:44,970 ERROR
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: Error closing
>> > files,3822b1ea8ae015f3ec932cafaa282dd211d768ad,1275145898366
>> > java.io.IOException: Filesystem closed
>> >         at
>> org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:230)
>> >
>> > 2010-06-29 10:46:44,970 FATAL
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: Shutting down
>> > HRegionServer: file system not available
>> > java.io.IOException: File system is not available
>> >         at
>> >
>> org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:129)
>> >
>> >
>> > Followed by this for every region being served :-
>> >
>> > 2010-06-29 10:46:44,996 ERROR
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: Error closing
>> > documents,082595c0-6d01-11df-936c-0026b95e484c,1275676410202
>> > java.io.IOException: Filesystem closed
>> >         at
>> org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:230)
>> >
>> >
>> > After updating all the nodes all the region server shut down after a
>> > few minutes reporting the following :-
>> >
>> > 2010-06-29 11:21:59,508 WARN org.apache.hadoop.hdfs.DFSClient: Error
>> > Recovery for block blk_-1437671530216085093_2565663 bad datanode[0]
>> > 10.0.11.4:50010
>> >
>> > 2010-06-29 11:22:09,481 FATAL org.apache.hadoop.hbase.regionserver.HLog:
>> > Could not append. Requesting close of hlog
>> > java.io.IOException: All datanodes 10.0.11.4:50010 are bad. Aborting...
>> >         at
>> >
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2542)
>> >
>> >
>> > 2010-06-29 11:22:09,482 FATAL
>> > org.apache.hadoop.hbase.regionserver.LogRoller: Log rolling failed with
>> > ioe:
>> > java.io.IOException: All datanodes 10.0.11.4:50010 are bad. Aborting...
>> >         at
>> >
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2542)
>> >
>> > 2010-06-29 11:22:10,344 ERROR
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to close log
>> in
>> > abort
>> > java.io.IOException: All datanodes 10.0.11.4:50010 are bad. Aborting...
>> >         at
>> >
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2542)
>> >
>> >
>> > This was fixed by restarting the master and starting the region servers
>> > again, but it would be nice to know how to roll out changes more cleanly.
>> >
>> > How do other people here roll out updates to HBase / Hadoop? What order
>> do
>> > you restart services in and how long do you wait before moving to the
>> next
>> > node?
>> >
>> > Just so you know, we currently have 5 nodes and will be adding another
>> > 10 soon.
>> >
>> > Thanks,
>> >
>>
>>
>
>
>
>



-- 
Dan Harvey | Datamining Engineer
www.mendeley.com/profiles/dan-harvey

Mendeley Limited | London, UK | www.mendeley.com
Registered in England and Wales | Company Number 6419015
