cloudstack-users mailing list archives

From Carlos Reátegui <create...@gmail.com>
Subject Re: Change IP subnet of cluster
Date Fri, 25 Jul 2014 15:46:13 GMT
Any thoughts out there?

It keeps trying to connect to the hosts, but it is unable to, and there are no clues in the
logs as to why.  I am successfully connected to the pool with XenCenter and am also able to
ssh to all the hosts from the MS.

What does “Disable Cluster” or “Unmanage Cluster” do?  Should I try that and re-enable/manage?

From the UI, things appear ok but starting any instance fails.

Thanks,
Carlos

Log snippet from this morning:

2014-07-25 21:03:19,599 DEBUG [agent.manager.ClusteredAgentAttache] (StatsCollector-2:null) Seq 2-65931749: Forwarding null to 159090355471823
2014-07-25 21:03:19,599 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-13:null) Seq 2-65931749: Routing from 159090355471825
2014-07-25 21:03:19,599 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-13:null) Seq 2-65931749: Link is closed
2014-07-25 21:03:19,600 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-13:null) Seq 2-65931749: MgmtId 159090355471825: Req: Resource [Host:2] is unreachable: Host 2: Link is closed
2014-07-25 21:03:19,600 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-13:null) Seq 2--1: MgmtId 159090355471825: Req: Routing to peer
2014-07-25 21:03:19,601 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-14:null) Seq 2--1: MgmtId 159090355471825: Req: Cancel request received
2014-07-25 21:03:19,601 DEBUG [agent.manager.AgentAttache] (AgentManager-Handler-14:null) Seq 2-65931749: Cancelling.
2014-07-25 21:03:19,601 DEBUG [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 2-65931749: Waiting some more time because this is the current command
2014-07-25 21:03:19,601 DEBUG [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 2-65931749: Waiting some more time because this is the current command
2014-07-25 21:03:19,601 INFO  [utils.exception.CSExceptionErrorCode] (StatsCollector-2:null) Could not find exception: com.cloud.exception.OperationTimedoutException in error code list for exceptions
2014-07-25 21:03:19,601 WARN  [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 2-65931749: Timed out on null
2014-07-25 21:03:19,601 DEBUG [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 2-65931749: Cancelling.
2014-07-25 21:03:19,601 WARN  [agent.manager.AgentManagerImpl] (StatsCollector-2:null) Operation timed out: Commands 65931749 to Host 2 timed out after 3600
2014-07-25 21:03:19,601 WARN  [cloud.resource.ResourceManagerImpl] (StatsCollector-2:null) Unable to obtain host 2 statistics.
2014-07-25 21:03:19,601 WARN  [cloud.server.StatsCollector] (StatsCollector-2:null) Received invalid host stats for host: 2
2014-07-25 21:03:19,606 DEBUG [agent.manager.ClusteredAgentAttache] (StatsCollector-2:null) Seq 3-602278373: Forwarding null to 159090355471823
2014-07-25 21:03:19,607 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-15:null) Seq 3-602278373: Routing from 159090355471825
2014-07-25 21:03:19,607 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-15:null) Seq 3-602278373: Link is closed
2014-07-25 21:03:19,607 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-15:null) Seq 3-602278373: MgmtId 159090355471825: Req: Resource [Host:3] is unreachable: Host 3: Link is closed
2014-07-25 21:03:19,608 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-15:null) Seq 3--1: MgmtId 159090355471825: Req: Routing to peer
2014-07-25 21:03:19,608 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-4:null) Seq 3--1: MgmtId 159090355471825: Req: Cancel request received
2014-07-25 21:03:19,609 DEBUG [agent.manager.AgentAttache] (AgentManager-Handler-4:null) Seq 3-602278373: Cancelling.
2014-07-25 21:03:19,609 DEBUG [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 3-602278373: Waiting some more time because this is the current command
2014-07-25 21:03:19,609 DEBUG [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 3-602278373: Waiting some more time because this is the current command
2014-07-25 21:03:19,609 INFO  [utils.exception.CSExceptionErrorCode] (StatsCollector-2:null) Could not find exception: com.cloud.exception.OperationTimedoutException in error code list for exceptions
2014-07-25 21:03:19,609 WARN  [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 3-602278373: Timed out on null
2014-07-25 21:03:19,609 DEBUG [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 3-602278373: Cancelling.
2014-07-25 21:03:19,609 WARN  [agent.manager.AgentManagerImpl] (StatsCollector-2:null) Operation timed out: Commands 602278373 to Host 3 timed out after 3600
2014-07-25 21:03:19,609 WARN  [cloud.resource.ResourceManagerImpl] (StatsCollector-2:null) Unable to obtain host 3 statistics.
2014-07-25 21:03:19,609 WARN  [cloud.server.StatsCollector] (StatsCollector-2:null) Received invalid host stats for host: 3
2014-07-25 21:03:19,614 DEBUG [agent.manager.ClusteredAgentAttache] (StatsCollector-2:null) Seq 5-1311574501: Forwarding null to 159090355471823
2014-07-25 21:03:19,617 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-1:null) Seq 5-1311574501: Routing from 159090355471825
2014-07-25 21:03:19,617 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-1:null) Seq 5-1311574501: Link is closed
2014-07-25 21:03:19,617 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-1:null) Seq 5-1311574501: MgmtId 159090355471825: Req: Resource [Host:5] is unreachable: Host 5: Link is closed
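
One pattern stands out in the snippet: every StatsCollector request for hosts 2, 3 and 5 is logged as "Forwarding null to 159090355471823", while the server reporting "Link is closed" identifies itself as MgmtId 159090355471825. Whether 159090355471823 is the second mshost row mentioned in the quoted message below is an assumption that the log alone does not confirm. A minimal standard-library Python sketch that tallies these values from a management-server log; the regular expressions are written against the message formats visible in this snippet and are not an official CloudStack log grammar, and `summarize` is a hypothetical helper name:

```python
import re
from collections import defaultdict

# "Seq <host>-<seq>: Forwarding <cmd> to <peer msid>"; \s+ tolerates entries
# that are wrapped across lines.
FORWARD_RE = re.compile(r"Seq\s+(\d+)-(\d+):\s+Forwarding\s+\S+\s+to\s+(\d+)")
# "MgmtId <msid>:" as logged by the local management server.
LOCAL_RE = re.compile(r"MgmtId\s+(\d+):")
# "Commands <seq> to Host <host> timed out after <n>"
TIMEOUT_RE = re.compile(
    r"Commands\s+(\d+)\s+to\s+Host\s+(\d+)\s+timed\s+out\s+after\s+(\d+)"
)


def summarize(log_text):
    """Return (local msids, {host id: set of peer msids}, {host id: timeout value})."""
    local_ids = set(LOCAL_RE.findall(log_text))
    forwarded = defaultdict(set)
    for host, _seq, peer in FORWARD_RE.findall(log_text):
        forwarded[int(host)].add(peer)
    timeouts = {int(host): int(after) for _seq, host, after in TIMEOUT_RE.findall(log_text)}
    return local_ids, dict(forwarded), timeouts
```

Passing the text of the full management-server.log to `summarize` would show whether the commands for every host go to the same peer msid, and whether that msid differs from the one the running server reports for itself.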



On Jul 24, 2014, at 10:59 PM, Carlos Reátegui <carlos@reategui.com> wrote:

> Not sure if it is related, but I see 2 entries in the mshost table for my same server but with
> different msids.  Both show as ‘Up’.  From reading the table comments, it seems the msid is
> based on the MAC.  I am guessing this may be due to using a bond, and that it may have selected
> a different NIC to get the bond MAC from.  Is it ok to have both of these entries?  Should
> I mark the old one as Down?
> 
> Along these lines, is there something similar with the hosts, and is that why the MS is
> having problems connecting to them, i.e. the MACs don’t match?
> 
> thanks,
> Carlos
> 
> 
> On Jul 24, 2014, at 3:35 PM, Carlos Reategui <carlos@reategui.com> wrote:
> 
>> Hi All,
>> 
>> Had to move one of my clusters to a new subnet (e.g. 192.168.1.0/24 to 10.100.1.0/24), but
>> it is not working.  These are the steps I took:
>> 
>> Environment: CS 4.1.1 on Ubuntu 12.04, XenServer 6.1, Shared NFS SR.
>> 
>> 1) stopped all instances using cloudstack UI
>> 2) stop cloudstack-management service on MS
>> 3) Used XenCenter to kill the system VMs (no other instances running)
>> 4) Created backup of cloud db.
>> 5) Followed http://support.citrix.com/article/CTX123477 and successfully changed
>>    the IPs of the hosts.  According to XenCenter everything is good, including the SR.
>> 6) Changed IP of MS
>> 7) verified communication between MS and Hosts using ssh and ping with new IPs.
>> 8) used sed to search and replace all old IPs with new IPs in the cloud backup SQL file
>>    (e.g. sed -i.bak 's/192.168.1./10.100.1./g' clouddb.sql).
>> 9) visually verified all diffs in the SQL file and made sure no references to 192.168
>>    were left.
>> 10) loaded the new SQL file
>> 11) searched all files under /etc on the MS for the old IP; found and edited /etc/cloudstack/management/db.properties
>> 12) start cloudstack-management service on MS
>> 
>> Unfortunately things are not working.  The MS is apparently unable to connect to
>> the hosts, but I cannot figure out why from the logs.
>> 
>> Logs here: https://www.dropbox.com/s/s5glxrbyatmsoug/management-server.log
>> 
>> Any help recovering is appreciated.  I do not want to have to re-install and create/import
>> a template for each of the instance VHDs.
>> 
>> thank you,
>> -Carlos
> 
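
A note on step 8: in the pattern `s/192.168.1./10.100.1./g` the dots are unescaped, so each `.` matches any character. For example, 192.168.10.5 would become 10.100.1..5, and 10.192.168.1.5 would also be partially rewritten; the visual check in step 9 is what guards against this. Escaping the dots (`s/192\.168\.1\./10.100.1./g`) removes the first problem but not the second. A minimal Python sketch of a stricter rewrite of the dump, assuming (as in the example above) that only addresses inside 192.168.1.0/24 should move, keeping their last octet, into 10.100.1.0/24; `rewrite` and `_remap` are hypothetical helper names, not part of any CloudStack tooling:

```python
import ipaddress
import re

# Networks taken from the example in step 8; adjust for other migrations.
OLD_NET = ipaddress.ip_network("192.168.1.0/24")
NEW_NET = ipaddress.ip_network("10.100.1.0/24")

# A dotted quad that is not part of a longer run of digits and dots,
# so "10.192.168.1.5" or "192.168.1.55.3" are left alone.
IPV4_RE = re.compile(r"(?<![\d.])(\d{1,3}(?:\.\d{1,3}){3})(?!\.?\d)")


def _remap(match):
    text = match.group(1)
    try:
        addr = ipaddress.ip_address(text)
    except ValueError:  # not a valid IPv4 address, e.g. "300.1.2.3"
        return text
    if addr in OLD_NET:
        # Keep the host offset within the /24 and move it into the new network.
        offset = int(addr) - int(OLD_NET.network_address)
        return str(NEW_NET.network_address + offset)
    return text


def rewrite(sql_text):
    """Return sql_text with every address in OLD_NET moved to NEW_NET."""
    return IPV4_RE.sub(_remap, sql_text)
```

For example, `rewrite("'192.168.1.20','192.168.10.5'")` returns `"'10.100.1.20','192.168.10.5'"`, whereas the sed command above would also have changed the second address. Matching whole addresses and testing network membership with `ipaddress` avoids relying on a textual prefix, at the cost of only handling dotted-quad IPv4 literals.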

