celix-commits mailing list archives

From "Gabriele Ricciardi (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (CELIX-375) Topology manager deadlocks when interacts with dependency manager
Date Mon, 26 Sep 2016 12:09:20 GMT

     [ https://issues.apache.org/jira/browse/CELIX-375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Gabriele Ricciardi resolved CELIX-375.
    Resolution: Fixed

Used recursive locks for rsaListLock and importedServicesLock in the Topology Manager.
Valgrind, Helgrind and AddressSanitizer analyses show no race conditions or memory errors
when recursive locking is used.
All unit tests pass.
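A minimal sketch of what the resolution amounts to, written against plain POSIX threads rather than Celix's own thread wrappers: both locks are initialized with the PTHREAD_MUTEX_RECURSIVE attribute so the owning thread can re-acquire them. The `topology_manager_t` struct and all function names are hypothetical; only the lock names rsaListLock and importedServicesLock come from the comment above.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the Topology Manager state: only the two lock
 * names (rsaListLock, importedServicesLock) come from the resolution note. */
typedef struct topology_manager {
    pthread_mutex_t rsaListLock;
    pthread_mutex_t importedServicesLock;
} topology_manager_t;

/* Initialize `mutex` as recursive: the owning thread may re-lock it, and it
 * is released only after a matching number of unlocks. */
static int init_recursive_mutex(pthread_mutex_t *mutex) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0) {
        rc = pthread_mutex_init(mutex, &attr);
    }
    pthread_mutexattr_destroy(&attr); /* the mutex keeps its own copy */
    return rc;
}

/* Returns 0 on success, or a pthread error code. */
int topology_manager_init_locks(topology_manager_t *tm) {
    int rc = init_recursive_mutex(&tm->rsaListLock);
    if (rc != 0) {
        return rc;
    }
    rc = init_recursive_mutex(&tm->importedServicesLock);
    if (rc != 0) {
        pthread_mutex_destroy(&tm->rsaListLock);
    }
    return rc;
}

/* Both locks must be fully unlocked before this is called. */
void topology_manager_destroy_locks(topology_manager_t *tm) {
    int rc = pthread_mutex_destroy(&tm->importedServicesLock);
    assert(rc == 0);
    rc = pthread_mutex_destroy(&tm->rsaListLock);
    assert(rc == 0);
    (void)rc;
}

/* Locks rsaListLock twice from the calling thread, as the nested
 * import/export path does; returns the nested lock's result (0 = success). */
int relock_rsa_list(topology_manager_t *tm) {
    int rc = pthread_mutex_lock(&tm->rsaListLock);     /* outer acquisition */
    if (rc != 0) {
        return rc;
    }
    int nested = pthread_mutex_lock(&tm->rsaListLock); /* nested acquisition */
    if (nested == 0) {
        pthread_mutex_unlock(&tm->rsaListLock);        /* undo nested level */
    }
    pthread_mutex_unlock(&tm->rsaListLock);            /* undo outer level */
    return nested;
}
```

Because the attribute is applied once at initialization, none of the call sites that lock or unlock the mutexes need to change.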

> Topology manager deadlocks when interacts with dependency manager
> -----------------------------------------------------------------
>                 Key: CELIX-375
>                 URL: https://issues.apache.org/jira/browse/CELIX-375
>             Project: Celix
>          Issue Type: Bug
>          Components: Remote Service Admin
>            Reporter: Gabriele Ricciardi
> When interacting with the Dependency Manager, the Topology Manager deadlocks whenever
a required dependency is satisfied remotely (i.e. a remote Celix instance exports a service
able to satisfy the required dependency). The issue is reproducible every time.
> Target configuration: Dependency Manager, Topology Manager, RSA and Discovery.
> How to reproduce it:
> - Start up a framework (F1) with a component C1 that depends on service S1 and exports
a service S2. The DM shell command shows that the component is not started, since S1 is
missing.
> - Start up a second framework (F2) with a component C2 that exports service S1. F2 starts
up fine, and in F1 component C1 is started because its dependency is now satisfied. From
this point on, however, F1 is deadlocked: no more services are detected, and framework_stop hangs.
> Explanation:
> - As soon as the component C2 in F2 starts, a new endpoint is created and exposed in
> - F1 detects this service and correctly imports it. To do so, topologyManager_addImportedService
locks rsaListLock and starts importing the detected service through the RSAs.
> - rsa_importService triggers the service registry, which in turn triggers the service_tracker
registered by the DM.
> - The Dependency Manager recognizes that the dependency required by C1 is satisfied, so it
starts C1.
> - While starting C1, all of its services have to be exported. In this case, C1 provides
the S2 service, so it has to be exported.
> - The export of S2 triggers the Topology Manager, which calls topologyManager_addExportedService.
> - topologyManager_addExportedService locks rsaListLock to access the RSA list.
> - Since all of this happens in the same thread (the DM is linked as a library and doesn't
have its own thread), topologyManager_addExportedService blocks on rsaListLock, which that same
thread already holds, while topologyManager_addImportedService cannot complete (and release
the lock) until all the stacked calls return. The thread is, in effect, waiting on itself.
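The call chain above can be modelled in a few lines. This is a hypothetical sketch, not Celix code: the `sim_*` functions only mimic the roles of topologyManager_addImportedService, rsa_importService and topologyManager_addExportedService. Relocking a default (non-recursive) mutex from its owning thread is undefined in POSIX and in practice usually just hangs, so the sketch swaps in a PTHREAD_MUTEX_ERRORCHECK mutex, which reports the same-thread relock as EDEADLK instead of blocking forever.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical model of the single-threaded call chain in CELIX-375.
 * The sim_* functions are illustrative stand-ins, not Celix APIs. */
static pthread_mutex_t rsaListLock;
static int nestedLockResult = -1; /* result of the nested lock attempt */

/* Exporting S2 reaches addExportedService, which needs rsaListLock. */
static void sim_addExportedService(void) {
    nestedLockResult = pthread_mutex_lock(&rsaListLock);
    if (nestedLockResult == 0) {
        pthread_mutex_unlock(&rsaListLock);
    }
}

/* Importing S1 notifies the registry; the DM tracker starts C1, and
 * starting C1 exports S2, all synchronously on this same thread. */
static void sim_rsaImportService(void) {
    sim_addExportedService();
}

/* addImportedService holds rsaListLock for the whole import. */
static void sim_addImportedService(void) {
    int rc = pthread_mutex_lock(&rsaListLock);
    assert(rc == 0);
    (void)rc;
    sim_rsaImportService();
    pthread_mutex_unlock(&rsaListLock);
}

/* Runs the chain once with an error-checking rsaListLock and returns the
 * nested lock's result: EDEADLK means the thread tried to re-lock a mutex
 * it already owns. */
int simulate_nested_import(void) {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    int rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    (void)rc;
    pthread_mutex_init(&rsaListLock, &attr);
    pthread_mutexattr_destroy(&attr);

    nestedLockResult = -1;
    sim_addImportedService();
    pthread_mutex_destroy(&rsaListLock);
    return nestedLockResult;
}
```

With the mutex type the Topology Manager originally used, the nested `pthread_mutex_lock` would not return at all, which matches the observed symptom that framework_stop gets stuck.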
> Declaring the dependency on S1 as optional mitigates the issue: when F1 starts up, the
C1 component doesn't have to wait for any dependency, so it is free to export S2
BEFORE the remote S1 service is imported. This does not, however, solve the problem.
> A solution would be to use a recursive mutex for rsaListLock. A recursive lock allows the
same thread to lock the same mutex multiple times, but blocks other threads until the owner has
unlocked it as many times as it locked it. In other words, a recursive mutex behaves like a normal
mutex when accessed by different threads and like an “always-open” lock when re-acquired by the
thread that already holds it.
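The two halves of that behaviour can be checked directly with POSIX threads. In this sketch (all names hypothetical) the owning thread locks a recursive mutex twice without blocking, while a second thread's `pthread_mutex_trylock` keeps failing with EBUSY until the owner has unlocked once per lock.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t lock;

/* Runs on a second thread: try to take the lock and report the result. */
static void *try_from_other_thread(void *out) {
    int rc = pthread_mutex_trylock(&lock);
    if (rc == 0) {
        pthread_mutex_unlock(&lock);
    }
    *(int *)out = rc;
    return NULL;
}

/* Starts a second thread, waits for it, and returns its trylock result. */
static int trylock_from_other_thread(void) {
    pthread_t thread;
    int result = -1;
    int rc = pthread_create(&thread, NULL, try_from_other_thread, &result);
    assert(rc == 0);
    (void)rc;
    pthread_join(thread, NULL); /* join makes `result` safe to read */
    return result;
}

/* Fills results[] with the other thread's trylock result while the owner
 * holds the lock twice, once, and not at all. */
void recursive_lock_demo(int results[3]) {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    int rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    assert(rc == 0);
    (void)rc;
    pthread_mutex_init(&lock, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&lock);                /* owner: lock count 1 */
    pthread_mutex_lock(&lock);                /* owner: count 2, no self-deadlock */
    results[0] = trylock_from_other_thread(); /* EBUSY: held by the owner */
    pthread_mutex_unlock(&lock);              /* count 1: still owned */
    results[1] = trylock_from_other_thread(); /* EBUSY again */
    pthread_mutex_unlock(&lock);              /* count 0: released */
    results[2] = trylock_from_other_thread(); /* 0: acquired and released */
    pthread_mutex_destroy(&lock);
}
```

The middle case is the important one: a single unlock by the owner does not yet let other threads in, so every nested acquisition must be paired with its own unlock.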
> In principle this solution should also be data-safe, since rsaList won't be altered
by anyone except the thread that holds the lock, and in this specific case rsaList is
only read.
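The data-safety argument can be illustrated with a hypothetical model of the fixed chain: rsaListLock is recursive, the outer import loop walks the RSA list, and the nested export path re-locks the mutex on the same thread and only reads the list. The two-entry list, its placeholder names and the `sim_*` functions are assumptions made for illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical two-entry RSA list; the names are placeholders. */
#define RSA_COUNT 2
static const char *const rsaList[RSA_COUNT] = {"rsa-a", "rsa-b"};

static pthread_mutex_t rsaListLock;
static int importCalls; /* one per RSA visited by the import loop */
static int exportCalls; /* one per RSA visited by the export loop */
static int c1Started;   /* C1 starts (and exports S2) only once */

/* Nested entry: re-locks rsaListLock on the same thread and only reads
 * the list, so the outer loop's view of rsaList stays valid. */
static void sim_addExportedService(void) {
    int rc = pthread_mutex_lock(&rsaListLock);
    assert(rc == 0); /* succeeds because the mutex is recursive */
    (void)rc;
    for (size_t i = 0; i < RSA_COUNT; i++) {
        (void)rsaList[i]; /* read-only: nothing is added or removed */
        exportCalls++;
    }
    pthread_mutex_unlock(&rsaListLock);
}

/* The first import satisfies C1's dependency, which starts C1 and
 * exports S2 synchronously. */
static void sim_rsaImportService(const char *rsa) {
    (void)rsa;
    importCalls++;
    if (!c1Started) {
        c1Started = 1;
        sim_addExportedService();
    }
}

static void sim_addImportedService(void) {
    pthread_mutex_lock(&rsaListLock);
    for (size_t i = 0; i < RSA_COUNT; i++) {
        sim_rsaImportService(rsaList[i]);
    }
    pthread_mutex_unlock(&rsaListLock);
}

/* Runs the chain with a recursive rsaListLock; returns 0 on success and
 * reports how many import and export calls completed. */
int run_nested_chain(int *imports, int *exports) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0) {
        rc = pthread_mutex_init(&rsaListLock, &attr);
    }
    pthread_mutexattr_destroy(&attr);
    if (rc != 0) {
        return rc;
    }
    importCalls = exportCalls = c1Started = 0;
    sim_addImportedService();
    pthread_mutex_destroy(&rsaListLock);
    *imports = importCalls;
    *exports = exportCalls;
    return 0;
}
```

The read-only condition matters: a recursive mutex keeps other threads out, but it cannot protect the outer loop from its own thread. If the nested call added or removed RSAs while the import loop was iterating, the recursive lock would not prevent the corruption.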

This message was sent by Atlassian JIRA
