stratos-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gayan Gunarathne <gay...@wso2.com>
Subject Re: [Discuss] Python Agent Improvments
Date Sun, 13 Dec 2015 13:25:25 GMT
Hi Chamila,

On Sat, Dec 12, 2015 at 1:34 AM, Chamila De Alwis <chamilad@wso2.com> wrote:

> Hi devs,
>
> During the testing of the Python Cartridge Agent in the past few weeks we
> were able to identify a few points where performance and functionality
> could be improved.
>
> *1 - Agent's thread utilization  *
> It was observed that when the agent is kept running for long periods of
> time, the thread count spawned would go up and reach the thread limit
> giving the following error.
>
> Exception in thread Thread-85:
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
>     self.run()
>   File
> "/mnt/apache-stratos-python-cartridge-agent-4.1.4/modules/util/asyncscheduledtask.py",
> line 65, in run
>     task_thread.start()
>   File "/usr/lib/python2.7/threading.py", line 745, in start
>     _start_new_thread(self.__bootstrap, ())
> error: can't start new thread
>
> Although I couldn't pinpoint the exact use case which results in this
> sudden spike in the thread count, it's most likely caused by the
> MessageBrokerHeartBeatChecker class inside subscriber.py file. This is a
> periodic job which checks for the liveliness of the connected message
> broker.
>
> We can get rid of this by implementing a callback method of Paho Mqtt
> Client's on_disconnect method [1].
>
> Furthermore, the ScheduledExecutor class in the asyncscheduledtask.py
> file, for each invocation it spawns a new thread.
>
> while not self.terminated:
>     time.sleep(self.delay)
>     task_thread = Thread(target=self.task.execute_task)
>     task_thread.start()
>
>
> This is not a good practice. It should rather submit the task to a thread
> pool. Imesh has an example code of the thread pool usage using the
> pythonfutures library[2] for Python 2.7.
>

+1

>
> *2. Decoupling the messaging implementation*
>
> When MQTT was chosen initially as the message broker protocol, the PCA was
> designed to only use MQTT, using the Paho MQTT library[3]. However after
> coming across several issues in MQTT, it was decided to default back to
> AMQP as the message broker protocol for Stratos Components. Except for the
> PCA, since at the time, there was no AMQP 1.0 supported Python library.
> However, it seems that Apache QPid Proton library[4] provides AMQP 1.0
> compliant clients for message broker communication.
>

It is good if we can introduce the AMQP also as a messaging protocol with
PCA.  This will provide flexibility for the end use to choose the messaging
protocol which he prefers. In the case of the Apache QPid we need to
recheck about the platform independence of that library.

>
> We can provide a common interface for messaging and provide implementation
> for MQTT and AMQP (if Apache QPid Proton proves to be helpful) protocols.
> This way we can provide the selection of the protocol to the configuration,
> similar to what we have in the Java based components such as AS, CC etc.
>
> Furthermore, this can greatly help if messaging has to be customized to
> cater for custom protocols such as Apache Kafka[5]. It will only be another
> implementation of the common interface.
>
> *3. Remove dependency on the Git Binary*
>
> Currently the PCA manipulates the Git based artifact repository through
> the Git binary file. This is inefficient for several reasons.
>
>    1. Every Git command is executed as a separate process, created using
>    os.fork(). This duplicates memory and results in needless marshalling
>    and unmarshalling of input and output among the processes.
>    2. If several commands are executed upon the Git binary at the same
>    time (ex: of a multi-tenant application), it can be a performance
>    bottleneck.
>    3. This is not platform independent.
>
> Therefore it will greatly help if we can go for a Git library for Python
> which doesn't depend on the Git binary. Dulwich [6] was considered in the
> past, but then releases had more features TBD. However recentl releases
> seems  to have fixed a lot of bug reports and features. It also has plans
> to be re licensed as Apache v2.0 which also would help us by making it
> shippable.
>

+1. May be we can do some research with Dulwich library and introduce this
to the PCA GIT operations.

>
> *4. Use of the maintenance mode*
>
> Sajith recently started a discussion on a patching strategy for the PCA
> (Thread - [Discuss] Suggesting a patching model for PCA). Patching for
> the PCA involves two different scenarios, offline and online. If instances
> that are already online needs to be patched the current PCA or Stratos
> design doesn't allow such a window.
>
> Maintenance Mode can be used to signal the Autoscaler that the member has
> gone to maintenance and scaling decisions should not be taken on that
> member. While in the maintenance mode, the running PCA is gracefully
> shutdown and the patched PCA comes up. When it publishes InstanceActivated
> event, the member will again be involved in the scaling decision process.
>
> There are few places that needs further development for this scenario to
> work.
>
>    1. It seems that Stratos would mark a member as Pending Termination if
>    a maintenance mode event is received.
>    2. The PCA doesn't support graceful shutdown right now. It doesn't
>    take any inputs from outside, and the threads spawned by it are not daemon
>    threads, which results in a process that needs to be killed because the
>    threads do not terminate when the main thread goes down. It can be designed
>    that an input to a periodically checked external file or somehow OS Signals
>    can set flags on the running process to terminate.
>    3. The capability of a PCA to start on an already activated instance
>    isn't verified. Based on the status of the member that it resides, it
>    should/should not publish certain events.
>
> *5. Update topology model based on the message broker events*
>
> Currently the topology is repopulated everytime the complete topology
> event is received. This was done as a quick fix to update the topology
> model. However for large complete topology events, building event objects
> and updating contexts can be costly, and on the design front, that is not
> the intended use of the complete topology event. The events should
> dynamically update the topology status of the agent. For this to happen
> another task should first be completed.
>

I am not clear about this point. Can you explain further? With the
introduction of the on demand complete topology publish feature feature, we
don't need to listen for the complete topology event always. Is n't?

>
> *6. Verify message broker event classes*
>
> Currently all the message broker events are deserialized in to event
> objects as per to the classes defined in modules/event package. However
> this doesn't depend or track any changes done to the Stratos Messaging
> Component, and therefore can quickly be outdated without a hint on the
> build or the tests. The events have to be verified against their
> counterparts in the messaging component.
>

In the case of the instance life cycle , I think important to check the all
the events that received in the CA side.

>
>

> *7. Decouple log file publishing protocol*
>
> The PCA has a log publishing feature where a specified list of log files
> will be monitored and entries published to a thrift data receiver. However
> there can be situations where the server logs (ex: PCA's own log) has to be
> monitored from outside (ex: monitoring the agent.log of a Docker container
> inside Kubernetes using FluentD [7], or directly publishing agent.log to
> ELK). For those situations the log publishing feature is not flexible
> enough. If we can introduce a pluggable design for the log event publisher,
> it can solve most of these situations with low effort.
>

In the log run , I think we may need to thing about some central logging
feature support with the PCA.


>
> *8. Verify Python 3 compatibility*
>
> Python 2.7 is said to be supported until 2020 [8], so this is not a major
> concern. However since the opinions between Python 2.7 and Python 3 can be
> political, it might be good to verify and adjust for Python 3
> compatibility. The PCA was originally written with Python 2.7 in mind,
> since it is still the most distributed version by default.
>
> These changes might not be immediately solved, however they can be
> critical in tuning some rough spots in the PCA implementation. WDYT? Ideas?
>
>
> [1] - http://www.hivemq.com/blog/mqtt-client-library-paho-python
> [2] - https://github.com/agronholm/pythonfutures/
> [3] - https://eclipse.org/paho/clients/python/
> [4] -
> https://qpid.apache.org/releases/qpid-proton-0.11.0/proton/python/book/tutorial.html
> [5] - http://kafka.apache.org/documentation.html#gettingStarted
> [6] - https://www.dulwich.io/
> [7] -
> https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/fluentd-elasticsearch/fluentd-es-image
> [8] - https://hg.python.org/peps/rev/76d43e52d978
>
> Regards,
> Chamila de Alwis
> Committer and PMC Member - Apache Stratos
> Software Engineer | WSO2 | +94772207163
> Blog: code.chamiladealwis.com
>
>
>


-- 

Gayan Gunarathne
Technical Lead, WSO2 Inc. (http://wso2.com)
Committer & PMC Member, Apache Stratos
email : gayang@wso2.com  | mobile : +94 775030545 <%2B94%20766819985>

Mime
View raw message