celix-commits mailing list archives

From "Gabriele Ricciardi (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CELIX-375) Topology manager deadlocks when interacts with dependency manager
Date Mon, 26 Sep 2016 12:06:20 GMT
Gabriele Ricciardi created CELIX-375:

             Summary: Topology manager deadlocks when interacts with dependency manager
                 Key: CELIX-375
                 URL: https://issues.apache.org/jira/browse/CELIX-375
             Project: Celix
          Issue Type: Bug
          Components: Remote Service Admin
            Reporter: Gabriele Ricciardi

When interacting with the Dependency Manager, the Topology Manager deadlocks whenever a required
dependency is satisfied remotely (i.e. a remote Celix instance exports a service that satisfies
the required dependency). The issue is reproducible every time.

The target configuration includes the Dependency Manager, Topology Manager, RSA and Discovery ETCD.

How to reproduce:

- Start a framework (F1) with a component C1 that depends on service S1 and exports a service
S2. The DM shell command shows that the component is not started, because S1 is missing.
- Start a second framework (F2) with a component C2 that exports the service S1. F2 starts up
fine, and in F1 the component C1 is started because its dependency is now satisfied. From this
point on, however, F1 is deadlocked: no further services are detected, and framework_stop hangs.

What happens in F1 is the following:

- As soon as component C2 in F2 starts, a new endpoint is created and published in ETCD.
- F1 detects this endpoint and correctly imports it. To do so, topologyManager_addImportedService
locks rsaListLock and starts importing the detected service into the RSAs.
- rsa_importService triggers the service registry, which in turn notifies the service tracker
registered by the DM.
- The Dependency Manager recognizes that the dependency required by C1 is satisfied, so it
starts C1.
- While C1 is starting, all of its services have to be exported. In this case C1 provides the
S2 service, so S2 has to be exported.
- The export of S2 triggers the Topology Manager, which calls topologyManager_addExportedService.
- topologyManager_addExportedService locks rsaListLock to access the RSA list.
- All of this happens in the same thread (the DM is linked as a library and does not have its
own thread). topologyManager_addExportedService therefore blocks on rsaListLock, which the same
thread already holds, while topologyManager_addImportedService cannot complete (and release
the lock) until all the stacked calls return. Because rsaListLock is a non-recursive mutex,
the thread ends up waiting on itself forever.

Declaring the dependency on S1 as optional mitigates the issue: when F1 starts up, C1 does not
have to wait for any dependency, so it is free to export S2 BEFORE the remote S1 service is
imported. However, this is only a workaround and does not solve the problem.

A possible solution is to use a recursive mutex for rsaListLock. A recursive mutex allows the
thread that already owns it to lock it again, while still blocking every other thread. In other
words, it behaves like a normal mutex when accessed by different threads and like an
“always-open” lock when accessed by the owning thread; the owner must unlock it as many times
as it locked it before another thread can acquire it.
In principle this solution should also be data-safe: the rsaList cannot be modified by anyone
except the thread that holds the lock, and in this specific case the rsaList is accessed only
in read mode.

This message was sent by Atlassian JIRA
