uima-dev mailing list archives

From "Lou DeGenaro (JIRA)" <...@uima.apache.org>
Subject [jira] [Commented] (UIMA-4684) DUCC daemons log-to-file should never give up
Date Mon, 02 Nov 2015 19:25:27 GMT

    [ https://issues.apache.org/jira/browse/UIMA-4684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14985835#comment-14985835
] 

Lou DeGenaro commented on UIMA-4684:
------------------------------------

Here's an RM log file snippet, captured during fix testing, in which the log directory is over quota.  Notice
the gap between 14:13:37 and 14:14:57 (the 14:13:37 entry is even cut off mid-write); the RM should be logging every 10 seconds.  During
this interval the file system quota was exceeded.

<<<<<>>>>>
02 Nov 2015 14:13:27,908  INFO RM.Scheduler- N/A schedule  ------------------------------------------------
02 Nov 2015 14:13:27,908  INFO RM.JobManagerConverter- N/A createState  Schedule sent to Orchestrator
02 Nov 2015 14:13:27,909  INFO RM.JobManagerConverter- N/A createState
Reservation 2 15GB
        Existing[1]: bluejws67-1.1^0
        Additions[0]:
        Removals[0]:

02 Nov 2015 14:13:27,917  INFO RM.ResourceManagerComponent- N/A runScheduler  -------- 2 -------
Scheduling loop returns  --------------------
02 Nov 2015 14:13:28,457  INFO RM.ResourceManagerComponent- N/A NodeStability  Initial node
stability reached: scheduler started.
02 Nov 2015 14:13:37,903  INFO RM.ResourceManagerComponent- N/A onJobManagerStateUpdate  ------->
OR state arrives
02 Nov 2015 14:13:37,903  INFO RM.ResourceManagerComponent- N/A runScheduler  -------- 3 -------
Entering scheduling loop --------------------
02 Nov 2015 14:13:37,903  INFO RM.Scheduler- N/A nodeArrives  Total arrivals: 13
02 Nov 2015 14:13:37,904  INFO RM.NodePool- N/A reset  Nodepool: --default-- Maxorder set
to 2
02 Nov 2015 14:13:37,904  INFO RM.Scheduler- N/A schedule  Scheduling 0  new jobs.  Existing
jobs: 1
02 Nov 2015 14:13:37,904  INFO RM.Scheduler- N/A schedule  Run scheduler 0 with top-level
nodepool --default--
02 Nov 2015 14:13:37,904  INFO RM.RmJob- 2 getPrjCap  System Cannot predict cap: init_wait
false || time_per_item 0.0
02 Nov 2015 14:13:37,904  INFO RM.RmJob- 2 initJobCap  System O 1 Base cap: 1 Expected future
cap: 2147483647 potential cap 1 actual cap 1
02 Nov 2015 14:13:37,904  INFO RM.NodepoolScheduler- N/A schedule  Machine occupancy before
schedule
02 Nov 2015 14:13:37,905  INFO RM.NodePool- N/A queryMachines  ==================================
Query Machines Nodepool: --default-- =========================
02 Nov 2015 14:13:37,906  INFO RM.NodePool- N/A queryMachines
                 Name  Blacklisted Order Active Shares Unused Shares Memory (MB) Jobs
-------------------- ------------ ----- ------------- ------------- ----------- ------ ...
         bluejws67-4        false     2             0             2       30720 <none>[2]
         bluejws67-3        false     2             0             2       30720 <none>[2]
         bluejws67-1        false     1             1             0       15360 2
         bluejws67-2        false     1             0             1       15360 <none>[1]

02 Nov 2015 14:13:37,906  INFO RM.NodePool- N/A queryMachines  ==================================
End Query Machines Nodepool: --default-- ======================
02 Nov 2015 14:13:37,906  INFO RM.NodePool- N/A reset  Nodepool: --d02 Nov 2015 14:14:57,862
 INFO RM.ResourceManagerComponent- N/A runScheduler  -------- 11 ------- Entering scheduling
loop --------------------
02 Nov 2015 14:14:57,863  INFO RM.Scheduler- N/A nodeArrives  Total arrivals: 45
02 Nov 2015 14:14:57,863  INFO RM.NodePool- N/A reset  Nodepool: --default-- Maxorder set
to 2
02 Nov 2015 14:14:57,863  INFO RM.Scheduler- N/A schedule  Scheduling 0  new jobs.  Existing
jobs: 1
<<<<< >>>>>

Here is the corresponding RM console.  Notice the console was still being written while the
file system quota was exceeded.

<<<<<>>>>>
02 Nov 2015 14:14:07,903  INFO RM.ResourceManagerComponent - J[N/A] T[48] runScheduler  --------
6 ------- Scheduling loop returns  --------------------
02 Nov 2015 14:14:17,848  INFO RM.ResourceManagerEventListener - J[N/A] T[28] onOrchestratorStateUpdateEvent
 Event arrives
02 Nov 2015 14:14:17,885  INFO RM.ResourceManagerComponent - J[N/A] T[28] onJobManagerStateUpdate
 -------> OR state arrives
java.io.IOException: Disk quota exceeded
        at java.io.FileOutputStream.write(FileOutputStream.java:329)
        at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:233)
       ...
Unable to log due to logging exception.
02 Nov 2015 14:14:17,891  INFO RM.ResourceManagerComponent - J[N/A] T[48] runScheduler  --------
7 ------- Entering scheduling loop --------------------
02 Nov 2015 14:14:17,892  INFO RM.Scheduler - J[N/A] T[48] nodeArrives  Total arrivals: 29
<<<<<>>>>>

> DUCC daemons log-to-file should never give up
> ---------------------------------------------
>
>                 Key: UIMA-4684
>                 URL: https://issues.apache.org/jira/browse/UIMA-4684
>             Project: UIMA
>          Issue Type: Bug
>          Components: DUCC
>            Reporter: Lou DeGenaro
>            Assignee: Lou DeGenaro
>             Fix For: 2.1.0-Ducc
>
>
> Problem: When the common logging code fails to log to file, for example due to a quota
violation, it sets a flag to never try logging again.  The only way to resume logging is to
recycle the daemon.
> Resolution: The logger should always attempt to log to file..never give up hope!
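To illustrate the principle behind the resolution (this is a hypothetical sketch, not the actual DUCC common-logging code; the class and method names here are invented): instead of latching a permanent "gave up" flag on the first IOException, the file-logging path attempts the write on every call, so logging resumes automatically once the quota condition clears.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

/**
 * Sketch of a file-logging wrapper that never gives up: a failed write
 * (e.g. "Disk quota exceeded") is counted and reported, but no flag is
 * set that would disable future attempts.
 */
public class ResilientFileLogger {
    private final Writer out;     // log destination; may fail transiently
    private int failedWrites = 0; // for diagnostics only, never a kill switch

    public ResilientFileLogger(Writer out) {
        this.out = out;
    }

    /** Attempt the write on every call; a failure never disables logging. */
    public boolean log(String line) {
        try {
            out.write(line + System.lineSeparator());
            out.flush();
            return true;
        } catch (IOException e) {
            failedWrites++; // record it, but do NOT set a permanent "disabled" flag
            return false;   // caller may fall back to the console
        }
    }

    public int getFailedWrites() {
        return failedWrites;
    }

    public static void main(String[] args) {
        final StringWriter buf = new StringWriter();
        // Writer that fails twice (simulating a quota violation), then recovers.
        Writer flaky = new Writer() {
            int calls = 0;
            @Override public void write(char[] c, int off, int len) throws IOException {
                if (calls++ < 2) throw new IOException("Disk quota exceeded");
                buf.write(c, off, len);
            }
            @Override public void flush() {}
            @Override public void close() {}
        };
        ResilientFileLogger log = new ResilientFileLogger(flaky);
        System.out.println(log.log("entry 1"));      // false: quota exceeded
        System.out.println(log.log("entry 2"));      // false: quota exceeded
        System.out.println(log.log("entry 3"));      // true: logging resumed
        System.out.println(log.getFailedWrites());   // 2
    }
}
```

With the old latch-on-failure behavior, "entry 3" would be lost even though the quota had been freed; here it lands in the log as soon as writes succeed again.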



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
