karaf-issues mailing list archives

From "Matthew Zipay (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KARAF-5798) Karaf slave instance does not write pid or port file until it becomes master
Date Wed, 04 Jul 2018 20:09:00 GMT

    [ https://issues.apache.org/jira/browse/KARAF-5798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16533032#comment-16533032 ]

Matthew Zipay commented on KARAF-5798:

>> So the pid written by {{InstanceHelper.updateInstancePid}} should reflect the current running master

This is problematic, I think. It would suffer from the same drawback that the pid and port
files do - the slave instance would always have karaf.pid, port _and_ instance.properties
files that contain incorrect values!

That makes administration confusing at best, because now an administrator needs to inspect
with {{ps}} and {{lsof}} to discover the correct values (and, of course, neither the status
nor stop commands work until those values are corrected).

To put this into context: I have a master/slave node pair running right now. The slave's karaf.pid
file contains 19931. That is not correct. There is no process 19931 running on that host.
The slave's port file contains 33415. That is also incorrect. There is nothing listening on
port 33415 on that host. (These are values from a previous instance that is no longer running.)
As a result, neither org.apache.karaf.main.Status nor org.apache.karaf.main.Stop works.
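
To make the stale-file situation concrete, here is a minimal sketch of the check an administrator currently has to do by hand with {{ps}} and {{lsof}}. It is not Karaf code: the class name StaleFileCheck and both method names are hypothetical, it assumes Java 11+ (for ProcessHandle and Files.readString), and it assumes each file contains only a number, as the karaf.pid and port files on my slave do.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Karaf.
class StaleFileCheck {

    // Returns true if a live process with the pid recorded in pidFile exists on this host.
    static boolean pidIsAlive(Path pidFile) throws IOException {
        long pid = Long.parseLong(Files.readString(pidFile).trim());
        return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
    }

    // Returns true if something accepts TCP connections on the port recorded in portFile.
    static boolean portIsListening(Path portFile) throws IOException {
        int port = Integer.parseInt(Files.readString(portFile).trim());
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress("127.0.0.1", port), 1000);
            return true;
        } catch (IOException e) {
            // Connection refused or timed out: nothing usable is listening there.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: StaleFileCheck <pid-file> <port-file>");
            return;
        }
        System.out.println("pid alive:      " + pidIsAlive(Path.of(args[0])));
        System.out.println("port listening: " + portIsListening(Path.of(args[1])));
    }
}
```

Run against the slave's files described above, both checks would be expected to report false, even though the slave JVM itself is up.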

If I stop the master instance, then the slave acquires the lock, becomes master, and the files
get written with correct values. But I don't think this makes sense. The slave *is* a running
process. Why should its karaf.pid not reflect the correct value? Likewise, the slave *has*
a shutdown port - why should its port file not have the correct value?

IMO, all three of these files should be written before lock acquisition is even attempted,
because none of these values have anything to do with master/slave status - they are OS administration
values that are needed to manage a JVM process.
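
As a rough illustration of the ordering I am proposing, the sketch below writes the pid and port files first and only then starts polling for the lock. It is not the actual karaf-boot startup code: StartupOrderSketch, InstanceLock, writeAdminFiles and startInstance are all hypothetical names, the shutdown port is simply an ephemeral loopback port, and instance.properties is left out for brevity. It assumes Java 11+.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of "write admin files, then try the lock"; not Karaf's implementation.
class StartupOrderSketch {

    // Hypothetical stand-in for whatever lock implementation is configured.
    interface InstanceLock {
        boolean tryAcquire() throws Exception;
    }

    // Opens the shutdown socket and records this JVM's pid and shutdown port.
    // Nothing here depends on master/slave status.
    static ServerSocket writeAdminFiles(Path pidFile, Path portFile) throws IOException {
        ServerSocket shutdownSocket = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
        Files.writeString(pidFile, Long.toString(ProcessHandle.current().pid()));
        Files.writeString(portFile, Integer.toString(shutdownSocket.getLocalPort()));
        return shutdownSocket;
    }

    // Step 1: write the files. Step 2: poll for the lock until this instance becomes master.
    // Returns the open shutdown socket once the lock has been acquired.
    static ServerSocket startInstance(Path pidFile, Path portFile,
                                      InstanceLock lock, long delayMillis) throws Exception {
        ServerSocket shutdownSocket = writeAdminFiles(pidFile, portFile);
        while (!lock.tryAcquire()) {
            // Still a slave here, but the pid and port files are already correct.
            Thread.sleep(delayMillis);
        }
        return shutdownSocket;
    }
}
```

With this ordering, the files are accurate for the whole time the instance waits as a slave, so status and stop tooling would have the values they need without the instance first having to become master.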


> Karaf slave instance does not write pid or port file until it becomes master
> ----------------------------------------------------------------------------
>                 Key: KARAF-5798
>                 URL: https://issues.apache.org/jira/browse/KARAF-5798
>             Project: Karaf
>          Issue Type: Bug
>          Components: karaf-boot
>    Affects Versions: 4.0.9
>            Reporter: Matthew Zipay
>            Assignee: Jean-Baptiste Onofré
>            Priority: Major
> In a Karaf master/slave environment, the slave process does not write its pid or port file until it acquires the lock and becomes the master.
> I am running Karaf 4.0.9 (ServiceMix 7.0.1).
> Karaf is configured as master/slave using the following from system.properties. Master and slave are on different physical nodes.
> {code:java}
> karaf.lock=true
> karaf.lock.class=org.apache.karaf.main.lock.OracleJDBCLock
> karaf.lock.level=79
> karaf.lock.delay=10000
> karaf.lock.jdbc.url=jdbc:oracle:thin:#REMOVED#
> karaf.lock.jdbc.driver=oracle.jdbc.driver.OracleDriver
> karaf.lock.jdbc.user=#REMOVED#
> karaf.lock.jdbc.password=#REMOVED#
> karaf.lock.jdbc.table=KARAF_LOCK
> karaf.lock.jdbc.clustername=karaf
> karaf.lock.jdbc.timeout=30
> karaf.lock.slave.block=false
> {code}
> Attempting to stop the slave Karaf process results in _"Can't connect to the container. The container is not running."_ This is not true, as a simple {{ps -ef | grep karaf}} confirms that it is in fact running. I am able to enter the Karaf shell just fine, use the web console,
> I have confirmed through multiple tests that the pid and port files don't get written until the master lock is acquired.
> Steps:
> # With the Karaf slave node not started, note the pid and port files do not exist (or contain outdated values from a previous process).
> # Start the Karaf slave process.
> # Note that the pid and port files have not been written.
> # Stop the master process.
> # Observe the slave process acquire the lock and become master.
> # Note that the pid and port files have now been written.

This message was sent by Atlassian JIRA
