stratos-dev mailing list archives

From Akila Ravihansa Perera <>
Subject Re: [Discuss] Python Agent Improvments
Date Sat, 12 Dec 2015 04:47:12 GMT
Hi Chamila,

This might sound a bit dramatic... but what if we write the Cartridge Agent in
Go? Given the shortcomings we have had with Python, I'd say it's worth a


On Sat, Dec 12, 2015 at 1:34 AM, Chamila De Alwis <> wrote:

> Hi devs,
> During the testing of the Python Cartridge Agent in the past few weeks we
> were able to identify a few points where performance and functionality
> could be improved.
> *1. Agent's thread utilization*
> It was observed that when the agent is kept running for long periods of
> time, the number of threads it spawns keeps growing until it reaches the
> thread limit, giving the following error:
> Exception in thread Thread-85:
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/", line 810, in __bootstrap_inner
>   File
> "/mnt/apache-stratos-python-cartridge-agent-4.1.4/modules/util/",
> line 65, in run
>     task_thread.start()
>   File "/usr/lib/python2.7/", line 745, in start
>     _start_new_thread(self.__bootstrap, ())
> error: can't start new thread
> Although I couldn't pinpoint the exact use case which results in this
> sudden spike in the thread count, it's most likely caused by the
> MessageBrokerHeartBeatChecker class inside file. This is a
> periodic job which checks the liveness of the connected message
> broker.
> We can get rid of this job by implementing the Paho MQTT client's
> on_disconnect callback [1].
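
A minimal sketch of the callback approach, assuming the paho-mqtt 1.x callback signatures (on_connect(client, userdata, flags, rc) and on_disconnect(client, userdata, rc)). The BrokerConnectionMonitor class and its handle_broker_down hook are hypothetical names, not part of the PCA:

```python
import threading


class BrokerConnectionMonitor(object):
    """Tracks broker liveness from client callbacks instead of polling."""

    def __init__(self):
        self.connected = threading.Event()

    def on_connect(self, client, userdata, flags, rc):
        # rc == 0 means the broker accepted the connection.
        if rc == 0:
            self.connected.set()

    def on_disconnect(self, client, userdata, rc):
        # rc == 0 follows a deliberate disconnect() call; any other value
        # indicates an unexpected loss of the connection.
        self.connected.clear()
        if rc != 0:
            self.handle_broker_down(rc)

    def handle_broker_down(self, rc):
        # Hypothetical hook: the agent would react to the outage here.
        print("message broker connection lost (rc=%s)" % rc)


# Wiring with a Paho client (assumed, not executed here):
#   monitor = BrokerConnectionMonitor()
#   client.on_connect = monitor.on_connect
#   client.on_disconnect = monitor.on_disconnect
```

With this in place no periodic thread is needed: the client's own network loop invokes the callback when the connection drops.
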
> Furthermore, the ScheduledExecutor class in the
> file spawns a new thread for each invocation:
> while not self.terminated:
>     time.sleep(self.delay)
>     task_thread = Thread(target=self.task.execute_task)
>     task_thread.start()
> This is not good practice. It should rather submit the task to a thread
> pool. Imesh has example code showing thread pool usage with the
> pythonfutures library [2] for Python 2.7.
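
A sketch of what a pooled variant could look like, assuming the concurrent.futures API (standard in Python 3, provided by the futures backport on Python 2.7). PooledScheduledExecutor and its max_workers parameter are illustrative names, and this is not Imesh's example code:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PooledScheduledExecutor(threading.Thread):
    """Runs task.execute_task() every `delay` seconds on a bounded pool."""

    def __init__(self, delay, task, max_workers=4):
        super(PooledScheduledExecutor, self).__init__()
        self.daemon = True
        self.delay = delay
        self.task = task
        self.pool = ThreadPoolExecutor(max_workers=max_workers)
        self.stop_event = threading.Event()

    def run(self):
        # wait() returns True once terminate() sets the event, which ends
        # the loop; otherwise it times out after `delay` and returns False.
        while not self.stop_event.wait(self.delay):
            # Reuses at most max_workers threads, however long we run.
            self.pool.submit(self.task.execute_task)

    def terminate(self):
        self.stop_event.set()
        if self.is_alive():
            self.join()  # let the scheduling loop exit before closing the pool
        self.pool.shutdown(wait=True)
```

The thread count is then bounded by max_workers plus the scheduler thread, regardless of how many invocations have happened.
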
> *2. Decoupling the messaging implementation*
> When MQTT was initially chosen as the message broker protocol, the PCA was
> designed to use only MQTT, through the Paho MQTT library [3]. However, after
> coming across several issues with MQTT, it was decided to revert to AMQP as
> the default message broker protocol for Stratos components, except for the
> PCA, since at the time there was no Python library that supported AMQP 1.0.
> However, it seems that the Apache Qpid Proton library [4] provides AMQP 1.0
> compliant clients for message broker communication.
> We can provide a common interface for messaging and provide implementations
> for the MQTT and AMQP (if Apache Qpid Proton proves to be helpful) protocols.
> This way we can leave the selection of the protocol to the configuration,
> similar to what we have in the Java based components such as AS, CC etc.
> Furthermore, this can greatly help if messaging has to be customized to
> cater for other messaging systems such as Apache Kafka [5]. That would only
> be another implementation of the common interface.
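
The common interface could be sketched roughly as follows. All names here (Messenger, InMemoryMessenger, MESSENGERS, create_messenger) are hypothetical, and the in-memory transport only stands in for real MQTT or AMQP implementations, which would wrap Paho or Qpid Proton:

```python
from abc import ABCMeta, abstractmethod

# Portable way to declare an abstract base on both Python 2.7 and Python 3.
AbstractBase = ABCMeta("AbstractBase", (object,), {})


class Messenger(AbstractBase):
    """Protocol-neutral contract the rest of the agent would code against."""

    @abstractmethod
    def subscribe(self, topic, callback):
        pass

    @abstractmethod
    def publish(self, topic, payload):
        pass


class InMemoryMessenger(Messenger):
    """Stand-in transport used only to keep this sketch self-contained."""

    def __init__(self):
        self.subscribers = {}

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, payload):
        for callback in self.subscribers.get(topic, []):
            callback(topic, payload)


# Maps the configured protocol name to an implementation class.
MESSENGERS = {"inmemory": InMemoryMessenger}


def create_messenger(protocol):
    try:
        return MESSENGERS[protocol]()
    except KeyError:
        raise ValueError("unsupported messaging protocol: %s" % protocol)
```

Supporting another protocol would then mean adding one subclass and one entry in the lookup table, with the protocol name read from the agent configuration.
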
> *3. Remove dependency on the Git Binary*
> Currently the PCA manipulates the Git based artifact repository through
> the Git binary. This is problematic for several reasons.
>    1. Every Git command is executed as a separate process, created using
>    os.fork(). This duplicates memory and results in needless marshalling
>    and unmarshalling of input and output between the processes.
>    2. If several commands are executed through the Git binary at the same
>    time (e.g. in a multi-tenant application), it can become a performance
>    bottleneck.
>    3. This is not platform independent.
> Therefore it will greatly help if we can go for a Git library for Python
> which doesn't depend on the Git binary. Dulwich [6] was considered in the
> past, but back then its releases still had many features to be done.
> However, recent releases seem to have fixed a lot of bugs and added
> features. It also has plans to be relicensed under Apache v2.0, which would
> also help us by making it shippable.
> *4. Use of the maintenance mode*
> Sajith recently started a discussion on a patching strategy for the PCA
> (Thread - [Discuss] Suggesting a patching model for PCA). Patching the
> PCA involves two different scenarios, offline and online. If instances
> that are already online need to be patched, the current PCA and Stratos
> design doesn't allow such a window.
> Maintenance Mode can be used to signal the Autoscaler that the member has
> gone into maintenance and that scaling decisions should not be taken on that
> member. While in maintenance mode, the running PCA is gracefully shut
> down and the patched PCA comes up. When it publishes the InstanceActivated
> event, the member will again be involved in the scaling decision process.
> There are a few places that need further development for this scenario to
> work.
>    1. It seems that Stratos would mark a member as Pending Termination if
>    a maintenance mode event is received.
>    2. The PCA doesn't support graceful shutdown right now. It doesn't
>    take any input from outside, and the threads spawned by it are not daemon
>    threads, which results in a process that needs to be killed because the
>    threads do not terminate when the main thread goes down. It could be
>    designed so that an entry in a periodically checked external file, or an
>    OS signal, sets a flag on the running process telling it to terminate.
>    3. The capability of a PCA to start on an already activated instance
>    isn't verified. Based on the status of the member on which it resides,
>    it should or should not publish certain events.
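
For the graceful shutdown point, a minimal sketch of the OS signal option using only the standard library. The function names are illustrative, and a real agent would also need to flush state and publish its final events before exiting:

```python
import signal
import threading

# Process-wide flag that every worker checks between units of work.
shutdown_requested = threading.Event()


def request_shutdown(signum, frame):
    # Signal handlers should do as little as possible: just set the flag.
    shutdown_requested.set()


def install_signal_handlers():
    # Must be called from the main thread.
    signal.signal(signal.SIGTERM, request_shutdown)
    signal.signal(signal.SIGINT, request_shutdown)


def worker_loop(interval, work):
    # Event.wait doubles as an interruptible sleep: it returns True as soon
    # as shutdown is requested, so the loop exits without a full delay.
    while not shutdown_requested.wait(interval):
        work()


def start_worker(interval, work):
    thread = threading.Thread(target=worker_loop, args=(interval, work))
    thread.daemon = True  # never keeps the process alive on its own
    thread.start()
    return thread
```

The main thread would call install_signal_handlers(), start its workers, wait on shutdown_requested with a short timeout in a loop, and then join the workers so they finish their current task before the process exits.
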
> *5. Update topology model based on the message broker events*
> Currently the topology is repopulated every time the complete topology
> event is received. This was done as a quick fix to update the topology
> model. However, for large complete topology events, building event objects
> and updating contexts can be costly, and on the design front, that is not
> the intended use of the complete topology event. The individual events
> should dynamically update the agent's topology state. For this to happen,
> another task should first be completed.
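
To illustrate the difference between the two update styles, here is a toy sketch. The TopologyModel class, the event type strings and the status values are all hypothetical and much simpler than the real Stratos topology and event classes:

```python
# Hypothetical mapping from event type to the member status it implies.
STATUS_BY_EVENT = {
    "member_started": "Starting",
    "member_activated": "Active",
    "member_maintenance": "In_Maintenance",
}


class TopologyModel(object):
    def __init__(self):
        self.members = {}  # member id -> status

    def apply_complete_topology(self, members):
        # Full replacement: cost grows with the size of the whole topology.
        self.members = dict(members)

    def apply_event(self, event_type, member_id):
        # Incremental update: touches only the member named in the event.
        if event_type == "member_terminated":
            self.members.pop(member_id, None)
        elif event_type in STATUS_BY_EVENT:
            self.members[member_id] = STATUS_BY_EVENT[event_type]
        else:
            raise ValueError("unknown event type: %s" % event_type)
```

In this model the complete topology event would be applied once to initialize the state, and every later change would arrive through apply_event.
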
> *6. Verify message broker event classes*
> Currently all the message broker events are deserialized into event
> objects according to the classes defined in the modules/event package.
> However, this doesn't depend on or track any changes made to the Stratos
> Messaging Component, and can therefore quickly become outdated without any
> hint from the build or the tests. The events have to be verified against
> their counterparts in the messaging component.
> *7. Decouple log file publishing protocol*
> The PCA has a log publishing feature where a specified list of log files
> is monitored and new entries are published to a Thrift data receiver.
> However, there can be situations where the server logs (e.g. the PCA's own
> log) have to be monitored from outside (e.g. monitoring the agent.log of a
> Docker container inside Kubernetes using Fluentd [7], or directly publishing
> agent.log to ELK). For those situations the log publishing feature is not
> flexible enough. If we can introduce a pluggable design for the log event
> publisher, it can solve most of these situations with little effort.
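
One possible shape for such a pluggable publisher, as a sketch only. LogPublisher, its subclasses and publish_new_entries are hypothetical names; a Thrift, Fluentd or ELK publisher would be further subclasses and is not shown:

```python
import io


class LogPublisher(object):
    """Extension point: subclasses decide where log entries go."""

    def publish(self, file_path, entry):
        raise NotImplementedError


class ConsoleLogPublisher(LogPublisher):
    def publish(self, file_path, entry):
        print("%s: %s" % (file_path, entry))


class CollectingLogPublisher(LogPublisher):
    """Keeps entries in memory; handy for tests."""

    def __init__(self):
        self.entries = []

    def publish(self, file_path, entry):
        self.entries.append((file_path, entry))


def publish_new_entries(file_path, offset, publishers):
    """Publishes lines appended after byte `offset`; returns the new offset."""
    with io.open(file_path, "rb") as log_file:
        log_file.seek(offset)
        for raw_line in log_file:
            entry = raw_line.decode("utf-8", "replace").rstrip("\r\n")
            for publisher in publishers:
                publisher.publish(file_path, entry)
        return log_file.tell()
```

The monitoring loop stays the same whichever publishers are configured; it only calls publish_new_entries periodically with the offset returned by the previous call.
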
> *8. Verify Python 3 compatibility*
> Python 2.7 is said to be supported until 2020 [8], so this is not a major
> concern. However, since opinions on Python 2.7 versus Python 3 can be
> political, it might be good to verify and adjust for Python 3
> compatibility. The PCA was originally written with Python 2.7 in mind,
> since it is still the version most widely distributed by default.
> These issues might not be solved immediately, but addressing them can be
> critical in smoothing out some rough spots in the PCA implementation. WDYT?
> Ideas?
> [1] -
> [2] -
> [3] -
> [4] -
> [5] -
> [6] -
> [7] -
> [8] -
> Regards,
> Chamila de Alwis
> Committer and PMC Member - Apache Stratos
> Software Engineer | WSO2 | +94772207163
> Blog:

Akila Ravihansa Perera
WSO2 Inc.;

