fluo-notifications mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] keith-turner commented on issue #1004: FLUO-1000 OracleServer race conditions
Date Thu, 01 Jan 1970 00:00:00 GMT
keith-turner commented on issue #1004: FLUO-1000 OracleServer race conditions
URL: https://github.com/apache/fluo/pull/1004#issuecomment-359848825
   Looking at the latest travis output, I am still seeing some error messages like the following.
   java.lang.IllegalStateException: instance must be started before calling this method
   	at com.google.common.base.Preconditions.checkState(Preconditions.java:149)
   	at org.apache.curator.framework.imps.CuratorFrameworkImpl.getData(CuratorFrameworkImpl.java:363)
   	at org.apache.fluo.core.oracle.OracleServer.takeLeadership(OracleServer.java:426)
   	at org.apache.curator.framework.recipes.leader.LeaderSelector$WrappedListener.takeLeadership(LeaderSelector.java:536)
   	at org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:399)
   	at org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:443)
   	at org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:64)
   	at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:245)
   	at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:239)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   I suspect this is happening because the CuratorFramework was stopped, but I am not
sure.  I opened [CURATOR-448](https://issues.apache.org/jira/browse/CURATOR-448) because, while
looking into this, I found the error message confusing.  It leads one to believe that
Curator was not started yet, but I think the same message can appear after Curator has been stopped.
   Looking at Fluo's code, it closes the leaderSelector before closing the curatorFramework.
I looked at the implementation of the leaderSelector close method, and it does not wait for
the thread it created to terminate.  So it's possible that, after the leaderSelector and then
the curatorFramework are closed, the thread created by the leaderSelector is still running.
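
   The ordering problem described above can be illustrated without Curator or Fluo.  The
following is only a sketch under stated assumptions: `Client`, `run`, and the latch are
hypothetical stand-ins, not Curator or Fluo APIs.  The latch forces the interleaving that the
real code could hit by chance.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

final class CloseOrderingSketch {
  /** Hypothetical stand-in for a client that rejects calls once it is closed. */
  static final class Client {
    private final AtomicBoolean started = new AtomicBoolean(true);

    void getData() {
      if (!started.get()) {
        throw new IllegalStateException("instance must be started before calling this method");
      }
    }

    void close() {
      started.set(false);
    }
  }

  /**
   * Starts a worker that calls the client once released, then closes the client.
   * Returns whatever the worker threw, or null if it finished cleanly.
   */
  static Throwable run(boolean joinWorkerBeforeClientClose) {
    Client client = new Client();
    CountDownLatch proceed = new CountDownLatch(1);
    AtomicReference<Throwable> failure = new AtomicReference<>();
    Thread worker = new Thread(() -> {
      try {
        proceed.await();
        client.getData();
      } catch (Throwable t) {
        failure.set(t);
      }
    });
    worker.start();
    try {
      if (joinWorkerBeforeClientClose) {
        proceed.countDown();
        worker.join();   // wait for the worker to terminate first
        client.close();
      } else {
        client.close();  // client is closed while the worker is still alive
        proceed.countDown();
        worker.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException(e);
    }
    return failure.get();
  }

  public static void main(String[] args) {
    System.out.println("close without waiting: " + run(false));
    System.out.println("wait, then close:      " + run(true));
  }
}
```

   In this sketch, only the run that closes the client before the worker has terminated records
an IllegalStateException.  Whether waiting for the thread is practical with the real
leaderSelector is a separate question that the sketch does not answer.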
   It would be good to verify that the state is STOPPED when we see this error message.  If
it is, I think one possible approach is to do something like the following in the takeLeadership
method.  However, I am not sure how to write a strong check that ensures the exception came from
Curator because of the wrong state.
     public void takeLeadership(CuratorFramework curatorFramework) throws Exception {
       try {
         // ... existing takeLeadership body ...
       } catch (IllegalStateException e) {
         // TODO how can we verify this exception came from Curator?  Don't want to
         // suppress other illegal state exceptions.
         if (curatorFramework.getState() == CuratorFrameworkState.STOPPED) {
           log.debug(...);  // log a debug message that this happened
         } else {
           throw e;
         }
       } finally {
         isLeader = false;
         if (started) {
           // if we stopped the server manually, we shouldn't halt
           Halt.halt("Oracle has lost leadership unexpectedly and is now halting.");
         }
       }
     }
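
   One possible, fairly weak way to approach the TODO above is to inspect the stack trace of
the exception.  This is a sketch, not a vetted solution: `raisedInside` and the class name are
hypothetical, and the package prefixes are taken from the stack trace quoted earlier.  Because
takeLeadership is itself called from Curator's LeaderSelector, Curator frames appear in every
trace, so the check only considers frames above the first Fluo frame.

```java
final class ExceptionOriginSketch {
  /**
   * Returns true if a frame whose class starts with libraryPrefix appears in the
   * stack trace before the first frame whose class starts with ownPrefix, i.e.
   * the exception was raised in library code called from our code.
   */
  static boolean raisedInside(Throwable t, String libraryPrefix, String ownPrefix) {
    for (StackTraceElement frame : t.getStackTrace()) {
      String cls = frame.getClassName();
      if (cls.startsWith(ownPrefix)) {
        return false; // reached our own code without passing through the library
      }
      if (cls.startsWith(libraryPrefix)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Rebuild the top of the stack trace quoted in the message for illustration.
    IllegalStateException e =
        new IllegalStateException("instance must be started before calling this method");
    e.setStackTrace(new StackTraceElement[] {
      new StackTraceElement("com.google.common.base.Preconditions", "checkState",
          "Preconditions.java", 149),
      new StackTraceElement("org.apache.curator.framework.imps.CuratorFrameworkImpl", "getData",
          "CuratorFrameworkImpl.java", 363),
      new StackTraceElement("org.apache.fluo.core.oracle.OracleServer", "takeLeadership",
          "OracleServer.java", 426),
      new StackTraceElement(
          "org.apache.curator.framework.recipes.leader.LeaderSelector$WrappedListener",
          "takeLeadership", "LeaderSelector.java", 536),
    });
    System.out.println(raisedInside(e, "org.apache.curator.", "org.apache.fluo."));
  }
}
```

   A check like this could be combined with the getState() comparison in the catch block.  It
depends on package names and on stack traces being available, so it is a heuristic rather than
a strong guarantee.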

