On Mon, Dec 7, 2015 at 4:09 PM, Akila Ravihansa Perera <ravihansa@wso2.com> wrote:
Hi,

I've fixed the warning message with commit [1]. The root cause is that CC throws an exception if the member instance id is null in the member termination flow. The exception is thrown before the point where the lock would be acquired, so the lock is never taken even though its release is still attempted. I've fixed it by moving the locking and topology-reading code into a separate try-finally block.
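
To illustrate the shape of the fix described above, here is a minimal sketch. It is not the code from the commit: the class LockScopeSketch, the method terminateMember and the use of java.util.concurrent.locks.ReentrantReadWriteLock as a stand-in for the Stratos ReadWriteLock are all assumptions made for the example.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch; names are illustrative and not taken from the Stratos code base.
final class LockScopeSketch {
    // Stand-in for the topology lock (assumption: Stratos uses its own wrapper class).
    static final ReentrantReadWriteLock TOPOLOGY_LOCK = new ReentrantReadWriteLock();

    static String terminateMember(String memberId, String instanceId) {
        // Validation happens before any lock is taken, so a failure here
        // leaves nothing to release.
        if (instanceId == null || instanceId.trim().isEmpty()) {
            throw new IllegalStateException("Instance id is blank: [member-id] " + memberId);
        }
        // Locking and topology access sit in their own try-finally block:
        // the release in finally can only run after the lock was acquired.
        TOPOLOGY_LOCK.writeLock().lock();
        try {
            return "terminated " + instanceId;
        } finally {
            TOPOLOGY_LOCK.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println(terminateMember("member-1", "i-123"));
        try {
            terminateMember("member-2", null);
        } catch (IllegalStateException e) {
            // The exception surfaced without any unlock being attempted.
            System.out.println("failed before locking: " + e.getMessage());
        }
        System.out.println("write lock held: " + TOPOLOGY_LOCK.isWriteLocked());
    }
}
```

The point of the sketch is only the ordering: anything that can throw before the lock is acquired stays outside the try block whose finally releases the lock.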

Note that this only fixes the system warning message. I'm currently working on a fix for the member termination issue and will start a separate thread for that.


@Sajith: you wouldn't experience this in mock IaaS, since members become initialized within milliseconds. That's not the case with EC2, OpenStack, etc.


We may need to introduce a configurable "delay" (Thread sleep) in mock IaaS operations to make them behave more like the instance creation process of actual IaaSes.
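
A rough sketch of what such a delay could look like is given here. The class MockIaasDelaySketch, the method startInstance and the system property "mock.iaas.start.delay.ms" are invented for illustration and do not exist in the mock IaaS today.

```java
// Hypothetical sketch; the class, method and property names are assumptions.
final class MockIaasDelaySketch {
    // Illustrative property name, read on every call so tests can change it.
    static final String DELAY_PROPERTY = "mock.iaas.start.delay.ms";

    // Returns the configured delay in milliseconds, defaulting to 0 (no delay).
    static long configuredDelayMillis() {
        return Long.getLong(DELAY_PROPERTY, 0L);
    }

    // Simulates instance creation: the instance id only becomes available
    // after the configured delay, as it would with a real IaaS.
    static String startInstance(String memberId) throws InterruptedException {
        long delay = configuredDelayMillis();
        if (delay > 0) {
            Thread.sleep(delay);
        }
        return "mock-" + memberId;
    }

    public static void main(String[] args) throws InterruptedException {
        System.setProperty(DELAY_PROPERTY, "200");
        long start = System.nanoTime();
        String instanceId = startInstance("member-1");
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        System.out.println(instanceId + " created after about " + elapsedMs + " ms");
    }
}
```

With a delay of a few seconds, an undeploy issued right after a deploy would hit the window in which the instance id is still blank, which is the case we could not reproduce on mock IaaS.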
 

Thanks.

On Mon, Dec 7, 2015 at 3:30 PM, Sajith Kariyawasam <sajith@wso2.com> wrote:
Was any of you able to reproduce this? I tried with mock IaaS, but I didn't encounter this error.

On Sun, Dec 6, 2015 at 9:02 PM, Gayan Gunarathne <gayang@wso2.com> wrote:


On Sun, Dec 6, 2015 at 10:47 AM, Isuru Haththotuwa <isuruh@apache.org> wrote:
Hi Akila,

On Sun, Dec 6, 2015 at 10:26 AM, Akila Ravihansa Perera <ravihansa@wso2.com> wrote:
Hi Isuru,

While I agree that it is hard to handle scenarios like this in Stratos given the current architecture and design, I believe pitfalls like this could end up being a huge overhead for its users. Not only would they have to maintain a PaaS, but they would also have to monitor the logs or the IaaS-level dashboard and manually kill instances whenever Stratos fails to do so. Perhaps we need to rethink the whole architecture?
IMHO we need to consider the probability of this happening; for example, in this case, whether users will deploy an application and undeploy it again the very next moment. Even in such cases, leaving a more meaningful log should be enough, as Imesh mentioned.

Even if the instance id is blank, this seems to be an issue with acquiring and releasing the locks in our thread model. IMO we need to handle that. I think we first need to identify which thread is trying to release the write lock that was acquired by another thread.
I think we may be able to reproduce this in the mock IaaS by setting the instance id to blank. Also, if we print the context class loader in releaseWriteLock as a debug log, I think we can identify the exact thread that is causing this issue. I will check on those points.

@Akila, were you able to reproduce this regularly?

As a short-term solution, perhaps we could wait for a certain amount of time until the member is initialized in the member termination flow.
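
Such a bounded wait could be sketched roughly as follows. The helper awaitInstanceId, its parameters and the Supplier-based lookup are assumptions made for the example, not existing Stratos APIs.

```java
import java.util.function.Supplier;

// Hypothetical sketch; names and signatures are illustrative only.
final class AwaitInstanceIdSketch {
    // Polls the lookup until it yields a non-blank instance id or the timeout
    // elapses. Returns null when the member never became initialized in time.
    static String awaitInstanceId(Supplier<String> lookup, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            String id = lookup.get();
            if (id != null && !id.trim().isEmpty()) {
                return id;
            }
            if (System.nanoTime() - deadline >= 0) {
                return null;
            }
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated lookup: the id is missing for the first three polls, then appears.
        int[] calls = {0};
        String id = awaitInstanceId(() -> ++calls[0] > 3 ? "i-123" : null, 1000, 10);
        System.out.println("resolved instance id: " + id);
        // A lookup that never succeeds gives up once the timeout has passed.
        System.out.println("after timeout: " + awaitInstanceId(() -> "", 50, 10));
    }
}
```

The termination flow would still need a fallback for the null case, for example the clearer error message discussed in this thread.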

+1 to go for 4.1.5-rc3. 

Thanks.
+1 for 4.1.5-RC3.

On Sun, Dec 6, 2015 at 9:45 AM, Imesh Gunaratne <imesh@apache.org> wrote:
Yes, I agree with Isuru; however, we should be able to raise a more meaningful error message in such a situation. If an instance has not been initialized at the time the termination call is made, we should be able to tell that to the end user clearly.

[2015-12-06 00:29:51,337]  WARN {org.apache.stratos.common.concurrent.locks.ReadWriteLock} -  System warning! Trying to release a lock which has not been taken by the same thread: [lock-name] topology-manager [thread-id] 214 [thread-name] http-nio-9443-exec-17

Regarding the above warning message, we added it purposely to track situations where threads try to release locks that have not been acquired by the same thread. If this happens, there is a slight possibility that some functionality will not work properly.
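
For readers unfamiliar with this kind of check, the idea can be sketched as below. This is not the org.apache.stratos.common.concurrent.locks.ReadWriteLock implementation: the class GuardedWriteLockSketch and its boolean return value are assumptions, and the ownership test relies on the standard ReentrantReadWriteLock.isWriteLockedByCurrentThread() method.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of an ownership check on release; not the Stratos class.
final class GuardedWriteLockSketch {
    private final String name;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    GuardedWriteLockSketch(String name) {
        this.name = name;
    }

    void acquireWriteLock() {
        lock.writeLock().lock();
    }

    // Releases the write lock only if the calling thread holds it.
    // Otherwise logs a warning and returns false instead of throwing.
    boolean releaseWriteLock() {
        if (!lock.isWriteLockedByCurrentThread()) {
            Thread t = Thread.currentThread();
            System.err.println("System warning! Trying to release a lock which has not been "
                    + "taken by the same thread: [lock-name] " + name
                    + " [thread-id] " + t.getId() + " [thread-name] " + t.getName());
            return false;
        }
        lock.writeLock().unlock();
        return true;
    }

    public static void main(String[] args) {
        GuardedWriteLockSketch topologyLock = new GuardedWriteLockSketch("topology-manager");
        // Release without a prior acquire: triggers the warning path.
        System.out.println("released: " + topologyLock.releaseWriteLock());
        topologyLock.acquireWriteLock();
        System.out.println("released: " + topologyLock.releaseWriteLock());
    }
}
```

Without such a guard, unlocking a ReentrantReadWriteLock write lock from a thread that does not hold it throws IllegalMonitorStateException, so the warning trades a hard failure for a log entry that has to be noticed.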

If we are to list down the issues we identified in this release candidate:
  • SNAPSHOT versions available in docker files
  • Thrift client configuration file not being up to date in load balancer extensions
  • CEP extension distribution issue
  • A validation to handle member termination logic when the given member has not been initialized
  • doap_Stratos.rdf file was not up to date with release versions
Considering all of the above +1 to cancel this vote and go for 4.1.5-rc3.

Thanks

On Sun, Dec 6, 2015 at 9:16 AM, Isuru Haththotuwa <isuruh@apache.org> wrote:
On Sun, Dec 6, 2015 at 12:37 AM, Akila Ravihansa Perera <ravihansa@wso2.com> wrote:
I deployed an application on EC2 and immediately undeployed it, which caused the following exception. Also, the EC2 instance did not get terminated. I noticed the following warning in the log:

[2015-12-06 00:29:51,337]  WARN {org.apache.stratos.common.concurrent.locks.ReadWriteLock} -  System warning! Trying to release a lock which has not been taken by the same thread: [lock-name] topology-manager [thread-id] 214 [thread-name] http-nio-9443-exec-17




[2015-12-06 00:29:51,309]  INFO {org.apache.stratos.autoscaler.client.AutoscalerCloudControllerClient} -  Terminating instance via cloud controller: [member] single-cartridge-app-ec2.my-php-app-ec2.php-ec2.domain5545861a-0a1b-4532-830e-1c9beb2d8545
[2015-12-06 00:29:51,313] ERROR {org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl} -  Could not terminate instance, instance id is blank: [member-id] single-cartridge-app-ec2.my-php-app-ec2.php-ec2.domain5545861a-0a1b-4532-830e-1c9beb2d8545 , removing member from topology...
[2015-12-06 00:29:51,319]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member terminated event: [service-name] php-ec2 [cluster-id] single-cartridge-app-ec2.my-php-app-ec2.php-ec2.domain [cluster-instance-id] single-cartridge-app-ec2-1 [member-id] single-cartridge-app-ec2.my-php-app-ec2.php-ec2.domain5545861a-0a1b-4532-830e-1c9beb2d8545 [network-partition-id] network-partition-ec2 [partition-id] partition-1 [group-id] null
[2015-12-06 00:29:51,326]  INFO {org.apache.stratos.messaging.message.processor.topology.MemberTerminatedMessageProcessor} -  Member terminated: [service] php-ec2 [cluster] single-cartridge-app-ec2.my-php-app-ec2.php-ec2.domain [member] single-cartridge-app-ec2.my-php-app-ec2.php-ec2.domain5545861a-0a1b-4532-830e-1c9beb2d8545
[2015-12-06 00:29:51,326]  WARN {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  Obsolete member has either been terminated or its obsolete time out has expired and it is removed from obsolete members list: single-cartridge-app-ec2.my-php-app-ec2.php-ec2.domain5545861a-0a1b-4532-830e-1c9beb2d8545
[2015-12-06 00:29:51,327]  INFO {org.apache.stratos.autoscaler.status.processor.cluster.ClusterStatusTerminatedProcessor} -  Publishing Cluster terminated event for [application]: single-cartridge-app-ec2 [cluster]: single-cartridge-app-ec2.my-php-app-ec2.php-ec2.domain
[2015-12-06 00:29:51,335]  INFO {org.apache.stratos.cloud.controller.messaging.topology.TopologyBuilder} -  Cluster Terminated adding status started for and removing the cluster instancesingle-cartridge-app-ec2.my-php-app-ec2.php-ec2.domain
[2015-12-06 00:29:51,337]  WARN {org.apache.stratos.common.concurrent.locks.ReadWriteLock} -  System warning! Trying to release a lock which has not been taken by the same thread: [lock-name] topology-manager [thread-id] 214 [thread-name] http-nio-9443-exec-17
[2015-12-06 00:29:51,346]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing Cluster terminated event: [application-id] single-cartridge-app-ec2 [cluster id] single-cartridge-app-ec2.my-php-app-ec2.php-ec2.domain [instance-id] single-cartridge-app-ec2-1 
[2015-12-06 00:29:51,348] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate instance
org.apache.stratos.cloud.controller.stub.CloudControllerServiceCloudControllerExceptionException: CloudControllerServiceCloudControllerExceptionException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at java.lang.Class.newInstance(Class.java:379)
at org.apache.stratos.cloud.controller.stub.CloudControllerServiceStub.terminateInstance(CloudControllerServiceStub.java:8660)
at org.apache.stratos.autoscaler.client.AutoscalerCloudControllerClient.terminateInstance(AutoscalerCloudControllerClient.java:203)
at org.apache.stratos.autoscaler.rule.RuleTasksDelegator.terminateObsoleteInstance(RuleTasksDelegator.java:295)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.mvel2.optimizers.impl.refl.nodes.MethodAccessor.getValue(MethodAccessor.java:48)
at org.mvel2.optimizers.impl.refl.nodes.VariableAccessor.getValue(VariableAccessor.java:37)
at org.mvel2.ast.ASTNode.getReducedValueAccelerated(ASTNode.java:108)
at org.mvel2.MVELRuntime.execute(MVELRuntime.java:85)
at org.mvel2.compiler.CompiledExpression.getDirectValue(CompiledExpression.java:123)
at org.mvel2.compiler.CompiledExpression.getValue(CompiledExpression.java:119)
at org.mvel2.MVEL.executeExpression(MVEL.java:930)
at org.drools.base.mvel.MVELConsequence.evaluate(MVELConsequence.java:104)
at org.drools.common.DefaultAgenda.fireActivation(DefaultAgenda.java:1287)
at org.drools.common.DefaultAgenda.fireNextItem(DefaultAgenda.java:1221)
at org.drools.common.DefaultAgenda.fireAllRules(DefaultAgenda.java:1456)
at org.drools.common.AbstractWorkingMemory.fireAllRules(AbstractWorkingMemory.java:710)
at org.drools.common.AbstractWorkingMemory.fireAllRules(AbstractWorkingMemory.java:674)
at org.drools.impl.StatefulKnowledgeSessionImpl.fireAllRules(StatefulKnowledgeSessionImpl.java:230)
at org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor.evaluate(ClusterMonitor.java:472)
at org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor.access$200(ClusterMonitor.java:86)
at org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor$2.run(ClusterMonitor.java:444)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2015-12-06 00:29:51,356]  INFO {org.apache.stratos.autoscaler.event.receiver.topology.AutoscalerTopologyEventReceiver} -  [ClusterTerminatedEvent] Received: class org.apache.stratos.messaging.event.topology.ClusterInstanceTerminatedEvent
[2015-12-06 00:29:51,356]  INFO {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  Sending application instance terminated for [application] single-cartridge-app-ec2 [instance] single-cartridge-app-ec2-1

Looking at the logs, it seems that the instance id was null when the CC tried to terminate the instance in the terminateInstance method. The instance id is returned with NodeMetadata when an instance is created in EC2. Maybe the instance id was null because the termination was started at the same moment the call to start the instance [1] was happening. In such cases the member is removed from the Topology, and therefore the only option is to terminate it manually from the IaaS. IMHO such scenarios are difficult to handle from the Stratos side. If the member is correctly removed from the Topology, then it should be fine.

[1]. computeService.createNodesInGroup


On Sun, Dec 6, 2015 at 12:20 AM, Akila Ravihansa Perera <ravihansa@wso2.com> wrote:

On Sat, Dec 5, 2015 at 8:02 PM, Gayan Gunarathne <gayang@wso2.com> wrote:
Modified it to 4.1.5-rc2.

Thanks,
Gayan


On Sat, Dec 5, 2015 at 12:04 PM, Akila Ravihansa Perera <ravihansa@wso2.com> wrote:
Hi Gayan,

The vote is for the tag, not the binaries. Therefore we need to tag the code in order to vote. 

Also we do not tag with a release version (4.1.5) until the vote has passed. 

Could you please fix it?

Thanks.


On Friday, 4 December 2015, Imesh Gunaratne <imesh@apache.org> wrote:
Hi Gayan,

I do not see the 4.1.5-rc2 tag, have we created it as 4.1.5?

Thanks

On Thu, Dec 3, 2015 at 10:01 AM, Gayan Gunarathne <gayang@wso2.com> wrote:
Hi All,
 
This thread is for discussion of the second release candidate for Apache Stratos 4.1.5. Please use this thread for discussion of issues uncovered in the RC, questions you may have about the RC, etc.
 
The RC release packs can be found here [1]. A git tag (4.1.5) [2] has been created for this release and its tree view can be seen here [3].
 
[1] https://dist.apache.org/repos/dist/dev/stratos/releases/4.1.5-rc2/
[2] https://git-wip-us.apache.org/repos/asf?p=stratos.git;a=commit;h=a9f1f51a9ae2829d85bf7b8f2d8fb622db991d25
[3] https://git-wip-us.apache.org/repos/asf?p=stratos.git;a=tree;h=bf33be1ad90c9bd071a5ded8dd440eb83de80ead;hb=a9f1f51a9ae2829d85bf7b8f2d8fb622db991d25
 
Thanks,
The Apache Stratos team

--

Gayan Gunarathne
Technical Lead, WSO2 Inc. (http://wso2.com)
Committer & PMC Member, Apache Stratos
email : gayang@wso2.com  | mobile : +94 775030545
 
 



--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos


--
Akila Ravihansa Perera
WSO2 Inc.;  http://wso2.com/

Blog: http://ravihansa3000.blogspot.com




--
Sajith Kariyawasam
Committer and PMC member, Apache Stratos, 
WSO2 Inc.; http://wso2.com
Mobile: 0772269575


