fluo-notifications mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] mikewalch commented on a change in pull request #142: Added troubleshooting documentation
Date Tue, 13 Mar 2018 21:07:39 GMT
mikewalch commented on a change in pull request #142: Added troubleshooting documentation
URL: https://github.com/apache/fluo-website/pull/142#discussion_r174284679
 
 

 ##########
 File path: _fluo-1-2/administration/troubleshooting.md
 ##########
 @@ -0,0 +1,56 @@
+---
+title: Troubleshooting
+category: administration
+order: 7
+---
+
+Steps for troubleshooting problems with Fluo applications.
+
+## Fluo application stops processing data
+
+1. Confirm that your application is running with the expected number of workers. 
+    ```bash
+    $ fluo list
+    Fluo instance (localhost/fluo) contains 1 application(s)
+
+    Application     Status     # Workers
+    -----------     ------     ---------
+    webindex        RUNNING        3
+    ```
+   Look for errors in the logs of any oracle or worker that has died.
+
+1. Run the `fluo wait` command to see if your application is processing notifications.
+    ```bash
+    $ fluo wait -a webindex
+    [command.FluoWait] INFO : The wait command will exit when all notifications are processed
+    [command.FluoWait] INFO : 140 notifications are still outstanding.  Will try again in 10 seconds...
+    [command.FluoWait] INFO : 140 notifications are still outstanding.  Will try again in 10 seconds...
+    [command.FluoWait] INFO : 140 notifications are still outstanding.  Will try again in 10 seconds...
+    [command.FluoWait] INFO : 96 notifications are still outstanding.  Will try again in 10 seconds...
+    [command.FluoWait] INFO : 70 notifications are still outstanding.  Will try again in 10 seconds...
+    [command.FluoWait] INFO : 31 notifications are still outstanding.  Will try again in 10 seconds...
+    [command.FluoWait] INFO : All processing has finished!
+    ```
+   The number of outstanding notifications will increase as data is added to the application, but it should eventually
+   decrease to zero and processing should finish.
+
+1. Look for errors or exceptions in the logs of all oracle and worker processes. Processing can stop if all threads
+   in a worker process have died from exceptions thrown in your Fluo application's observer code. These exceptions
+   are often caused by parsing issues or corner cases that were not seen during development or when testing with small data sets.
+
+1. If you are using a cluster manager (e.g., Marathon or YARN) to run your Fluo application, look for errors in the logs of
+   your cluster manager or application manager. Below are some common errors:
+
+    * Cluster managers sometimes fail to start all processes of a Fluo application due to a lack of container slots or resources (CPU, memory, etc.).
+      This can be fixed by giving more resources to your cluster manager or by reducing the number of Fluo workers or the resources each one requests.
+    * Cluster managers can kill Fluo processes that use too much memory. This can be fixed by allocating more memory to your workers.
+
+1. Run [jstack] to get stack traces of the threads in your Fluo application processes and look for any deadlocks.
 
 Review comment:
   Fixed in 27f213013452
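
The worker-count check in the first troubleshooting step can be scripted. The sketch below is a hypothetical helper, `check_workers`, that is not part of the Fluo CLI. It assumes the `fluo list` table layout shown in that step (application name, status, worker count) and reads that output on stdin.

```bash
# Hypothetical helper (not part of the Fluo CLI): verify that an application
# reported by `fluo list` is RUNNING with the expected number of workers.
# Usage: fluo list | check_workers APP EXPECTED_WORKERS
# Assumes the table layout shown above: name, status, worker count.
check_workers() {
  local app="$1" expected="$2" row status workers
  # Pick the first table row whose first column is the application name.
  row=$(awk -v app="$app" '$1 == app { print $2, $3; exit }')
  if [ -z "$row" ]; then
    echo "application '$app' not found in fluo list output" >&2
    return 2
  fi
  status=${row% *}   # text before the last space: the status column
  workers=${row#* }  # text after the first space: the worker count
  case "$workers" in
    ''|*[!0-9]*)
      echo "could not parse worker count for $app" >&2
      return 2
      ;;
  esac
  if [ "$status" != "RUNNING" ]; then
    echo "$app is $status, not RUNNING" >&2
    return 1
  fi
  if [ "$workers" -ne "$expected" ]; then
    echo "$app has $workers worker(s) but $expected were expected" >&2
    return 1
  fi
  echo "$app is RUNNING with $workers worker(s)"
}
```

For example, `fluo list | check_workers webindex 3` exits 0 when `webindex` is running with three workers, exits 1 when the status or worker count differs, and exits 2 when the application is missing from the output.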
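
For the `fluo wait` step, a pair of hypothetical helpers can extract the outstanding-notification counts from captured output and flag a count that has not changed over the last few polls. The pattern assumes the message wording shown in that step; a different log layout or Fluo version may need a different pattern. The window size is an arbitrary choice, not a Fluo setting.

```bash
# Hypothetical helpers for captured `fluo wait` output (read on stdin).
# The sed pattern assumes the message wording shown above.
notification_counts() {
  # Print each "N notifications are still outstanding" count, one per line.
  sed -n 's/.* \([0-9][0-9]*\) notifications are still outstanding.*/\1/p'
}

is_stalled() {
  # Succeed if the last WINDOW counts are all equal, which *may* mean that
  # processing is stuck. WINDOW is an arbitrary choice, not a Fluo setting.
  local window="$1"
  notification_counts | tail -n "$window" | awk -v w="$window" '
    NR == 1 { first = $1 }
    $1 != first { changed = 1 }
    END { if (NR == w && !changed) exit 0; exit 1 }'
}
```

One way to use them is to bound the wait with GNU coreutils `timeout`, e.g. `timeout 300 fluo wait -a webindex > wait.log 2>&1`, and then run `is_stalled 5 < wait.log`. An unchanged count is only a hint: new data arriving as fast as it is processed would look the same.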
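
To get a quick overview of which exceptions appear in oracle and worker logs, a hypothetical `summarize_exceptions` helper can count dotted class names ending in `Exception` or `Error`. Where Fluo logs are written depends on how the processes were launched, so the paths passed to it are up to you. The `-o` and `-h` flags are supported by GNU and BSD grep but are not POSIX.

```bash
# Hypothetical helper: count exception/error class names in the given log
# files, most frequent first. Log locations depend on how Fluo was launched.
summarize_exceptions() {
  # -o prints only the matched name, -h omits file names (GNU/BSD grep).
  grep -ohE '[A-Za-z0-9_.$]*(Exception|Error)' "$@" | sort | uniq -c | sort -rn
}
```

For example, `summarize_exceptions /path/to/fluo/logs/*.log` (the path is a placeholder) lists the most frequent exception names first, which helps separate one recurring observer bug from unrelated noise.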
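
The jstack step can be partly automated. HotSpot's `jstack` appends a "Found one Java-level deadlock:" section to a thread dump when it detects a deadlock, and the hypothetical helpers below look for that text. Selecting processes by the substring `fluo` in `jps -l` output is an assumption: depending on how your cluster manager launches Fluo, the main class may not contain that string. Both tools must run as the same user as the target JVM.

```bash
# Hypothetical helpers. HotSpot's jstack appends a "Found one Java-level
# deadlock:" section to a thread dump when it detects a deadlock.
has_deadlock() {
  # Read a thread dump on stdin; succeed if jstack reported a deadlock.
  grep -q 'Java-level deadlock'
}

check_fluo_deadlocks() {
  # Select JVMs from `jps -l` whose main class or jar contains PATTERN.
  # The default "fluo" is an assumption; adjust it for your deployment.
  # Returns non-zero if any selected process reports a deadlock.
  local pattern="${1:-fluo}" pid status=0
  for pid in $(jps -l | awk -v p="$pattern" 'index($2, p) { print $1 }'); do
    if jstack -l "$pid" | has_deadlock; then
      echo "deadlock detected in process $pid"
      status=1
    fi
  done
  return "$status"
}
```

A clean result from `check_fluo_deadlocks` does not rule out livelocks or threads blocked on I/O or remote calls, so reading the stack traces themselves is still worthwhile.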

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
