fluo-notifications mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] keith-turner commented on issue #992: Notifications never processed while running stress test
Date Thu, 01 Jan 1970 00:00:00 GMT
keith-turner commented on issue #992: Notifications never processed while running stress test
URL: https://github.com/apache/fluo/issues/992#issuecomment-355066495
 
 
   I was able to confirm that the problem was caused by `NotificationTracker.requeue()` accessing
a hashmap without synchronization.
   
   I suspected that when `containsKey()` is called on the hashmap outside of a synchronized block,
it could return false even though the map does contain the key.  This could happen if another
thread is rehashing (resizing) the map at the same moment.  I wrote a standalone test with two
threads and a hashmap to confirm this.  One thread constantly called `containsKey()` for a key
known to be in the map while another thread constantly inserted data.  Sometimes the
`containsKey()` call returned false even though the map contained the key.
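
   A test along those lines might look like the following sketch.  It is not the original test:
the class name `HashMapRaceSketch`, the method `countMisses`, and the round and insert counts are
all assumptions made for illustration.  A reader thread polls `containsKey()` for a key that is
always present while a writer thread keeps inserting and so forces the map to resize.  With
`useLock` set, both threads go through the same monitor, which is the behavior a synchronized
`requeue()` would get.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical standalone sketch; none of these names come from Fluo.
public class HashMapRaceSketch {

  // Inserted before any thread starts and never removed.
  static final Integer KNOWN_KEY = -1;

  // Counts how often containsKey() reports the known key as missing.
  static long countMisses(boolean useLock, int rounds, int insertsPerRound)
      throws InterruptedException {
    AtomicLong misses = new AtomicLong();
    for (int r = 0; r < rounds; r++) {
      // A fresh map each round starts small again, so it resizes repeatedly.
      Map<Integer, Integer> map = new HashMap<>();
      map.put(KNOWN_KEY, 0);
      Object lock = new Object();
      AtomicBoolean done = new AtomicBoolean(false);

      Thread reader = new Thread(() -> {
        while (!done.get()) {
          boolean found;
          if (useLock) {
            synchronized (lock) {
              found = map.containsKey(KNOWN_KEY);
            }
          } else {
            found = map.containsKey(KNOWN_KEY); // racy read
          }
          if (!found) {
            misses.incrementAndGet();
          }
        }
      });

      Thread writer = new Thread(() -> {
        for (int i = 0; i < insertsPerRound; i++) {
          if (useLock) {
            synchronized (lock) {
              map.put(i, i);
            }
          } else {
            map.put(i, i); // racy write, triggers resizes as the map grows
          }
        }
      });

      reader.start();
      writer.start();
      writer.join();
      done.set(true);
      reader.join();
    }
    return misses.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("unsynchronized misses: " + countMisses(false, 20, 200_000));
    System.out.println("synchronized misses:   " + countMisses(true, 20, 200_000));
  }
}
```

   Whether the unsynchronized run reports any misses depends on timing, the JVM, and the number of
cores, so a count of zero there does not show the code is safe.  The synchronized run should
always report zero, because the reader can never observe the map in the middle of a resize.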
   
   I patched the code in the following way so that the early return in `requeue()` gets logged.
   
   ```patch
   diff --git a/modules/core/src/main/java/org/apache/fluo/core/worker/NotificationProcessor.java b/modules/core/src/main/java/org/apache/fluo/core/worker/NotificationProcessor.java
   index 9466933..29c42d6 100644
   --- a/modules/core/src/main/java/org/apache/fluo/core/worker/NotificationProcessor.java
   +++ b/modules/core/src/main/java/org/apache/fluo/core/worker/NotificationProcessor.java
   @@ -126,6 +126,7 @@ public class NotificationProcessor implements AutoCloseable {
    
        public boolean requeue(RowColumn rowCol, FutureTask<?> ft) {
          if (!queuedWork.containsKey(rowCol)) {
   +        log.debug("queuedWork did not contain " + rowCol + " not requeuing");
            return false;
          }
   ```
   
   I enabled debug logging and then ran the stress test over and over until the bug happened
again.  After it got stuck I saw the following in the logs.
   
   ```
   $ grep NotificationProcessor *.log
   worker1.log:2018-01-03 15:44:37,835 [worker.NotificationProcessor] DEBUG: queuedWork did not contain 07:5d43:08:000000000cc4be00 count wait  not requeuing
   worker2.log:2018-01-03 15:44:05,432 [worker.NotificationProcessor] DEBUG: queuedWork did not contain 07:flrf:08:0000000024477200 count wait  not requeuing
   ```
   
   The two notifications above are present in the table, but are never processed by the
workers.
   
   ```
   $ accumulo shell -u root -p secret -e 'compact -t stresso -w'
   $ fluo scan -a stresso --raw -c ntfy
   07:5d43:08:000000000cc4be00 ntfy:count:wait [] 822968-INSERT	
   07:flrf:08:0000000024477200 ntfy:count:wait [] 8796731-INSERT	
   ```
   
   I confirmed that these notifications were present in the worker processes' `queuedWork`
hashmaps by taking heap dumps.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
