qpid-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Rudyy (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (QPID-6222) [Java Broker] Replica can go into a loop if it runs out of disk space
Date Tue, 11 Nov 2014 10:17:34 GMT

     [ https://issues.apache.org/jira/browse/QPID-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Alex Rudyy updated QPID-6222:
-----------------------------
    Description: 
When a node is the replica role, if it runs out of disk space, the underlying JE environment
transitions into DETACH state, but then the group change thread went in into a loop repeatedly
failing on the getGroup call.  JE reports this exception to the application via its ExceptionListener
but it seems Qpid at the moment fails to act on this information correctly.

{noformat}
2014-11-07 05:43:55,534 ERROR [DETACHED mynode2a(4)] (replication.ReplicatedEnvironmentFacade)
- Asynchronous exception thrown by BDB thread 'DETACHED mynode2a(4)'
com.sleepycat.je.LogWriteException: (JE 5.0.97) Environment must be closed, caused by: com.sleepycat.je.LogWriteException:
Environment invalid because of previous exception: (JE 5.0.97) mynode2a(4):/tmp/vhost java.io.IOException:
No space left on device LOG_WRITE: IOException on write, log is likely incomplete. Environment
is invalid and must be closed. Problem seen replaying entry INS_LN_TX/8 vlsn=4,727,877 isReplicated="1"
 txn=-1576661
        at com.sleepycat.je.LogWriteException.wrapSelf(LogWriteException.java:72)
        at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1594)
        at com.sleepycat.je.dbi.CursorImpl.checkEnv(CursorImpl.java:2853)
        at com.sleepycat.je.Cursor.checkEnv(Cursor.java:4181)
        at com.sleepycat.je.Cursor.close(Cursor.java:517)
        at com.sleepycat.je.rep.impl.node.Replay.applyLN(Replay.java:890)
        at com.sleepycat.je.rep.impl.node.Replay.replayEntry(Replay.java:557)
        at com.sleepycat.je.rep.impl.node.Replica$ReplayThread.run(Replica.java:945)
Caused by: com.sleepycat.je.LogWriteException: Environment invalid because of previous exception:
(JE 5.0.97) mynode2a(4):/tmp/vhost java.io.IOException: No space left on device LOG_WRITE:
IOException on write, log is likely incomplete. Environment is invalid and must be closed.
        at com.sleepycat.je.log.FileManager.writeLogBuffer(FileManager.java:1648)
        at com.sleepycat.je.log.LogBufferPool.writeBufferToFile(LogBufferPool.java:411)
        at com.sleepycat.je.log.LogBufferPool.writeDirty(LogBufferPool.java:370)
        at com.sleepycat.je.log.LogBufferPool.bumpAndWriteSynced(LogBufferPool.java:339)
        at com.sleepycat.je.log.LogBufferPool.getWriteBuffer(LogBufferPool.java:269)
        at com.sleepycat.je.log.LogManager.serialLogWork(LogManager.java:706)
        at com.sleepycat.je.log.LogManager.serialLogInternal(LogManager.java:538)
        at com.sleepycat.je.log.SyncedLogManager.serialLog(SyncedLogManager.java:46)
        at com.sleepycat.je.log.LogManager.logItems(LogManager.java:485)
        at com.sleepycat.je.log.LogManager.multiLog(LogManager.java:399)
        at com.sleepycat.je.log.LogManager.log(LogManager.java:339)
        at com.sleepycat.je.tree.LN.logInternal(LN.java:553)
        at com.sleepycat.je.tree.LN.optionalLog(LN.java:372)
        at com.sleepycat.je.dbi.CursorImpl.insertNewSlot(CursorImpl.java:989)
        at com.sleepycat.je.dbi.CursorImpl.putInternal(CursorImpl.java:880)
        at com.sleepycat.je.dbi.CursorImpl.put(CursorImpl.java:816)
        at com.sleepycat.je.Cursor.putAllowPhantoms(Cursor.java:2397)
        at com.sleepycat.je.Cursor.putNoNotify(Cursor.java:2352)
        at com.sleepycat.je.Cursor.putNotify(Cursor.java:2260)
        at com.sleepycat.je.Cursor.putForReplay(Cursor.java:2134)
        at com.sleepycat.je.DbInternal.putForReplay(DbInternal.java:138)
        at com.sleepycat.je.rep.impl.node.Replay.applyLN(Replay.java:874)
        ... 2 more
Caused by: java.io.IOException: No space left on device
        at java.io.RandomAccessFile.writeBytes0(Native Method)
        at java.io.RandomAccessFile.writeBytes(RandomAccessFile.java:520)
        at java.io.RandomAccessFile.write(RandomAccessFile.java:550)
        at com.sleepycat.je.log.FileManager.writeToFile(FileManager.java:1753)
        at com.sleepycat.je.log.FileManager.writeLogBuffer(FileManager.java:1633)
        ... 23 more
2014-11-07 05:43:55,543 INFO  [StateChange-mygroup:mynode2a] (highavailability.role_changed)
- [mng:admin(/127.0.0.1:1310)] [grp(/mygroup)] HA-1010 : Role change reported: Node : 'mynode2a'
(127.0.0.1:5000) : from 'REPLICA' to 'DETACHED'
{noformat}
The group change learner kept repeatedly reporting an exception:
{noformat}
2014-11-07 05:44:01,457 ERROR [Group-Change-Learner:mygroup:mynode2a] (util.ServerScopedRuntimeException)
- Unexpected exception occurred in replicated environment
com.sleepycat.je.LogWriteException: (JE 5.0.97) Environment must be closed, caused by: com.sleepycat.je.LogWriteException:
Environment invalid because of previous exception: (JE 5.0.97) mynode2a(4):/tmp/vhost java.io.IOException:
No space left on device LOG_WRITE: IOException on write, log is likely incomplete. Environment
is invalid and must be closed.
        at com.sleepycat.je.LogWriteException.wrapSelf(LogWriteException.java:72)
        at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1594)
        at com.sleepycat.je.Environment.checkEnv(Environment.java:2225)
        at com.sleepycat.je.rep.ReplicatedEnvironment.getGroup(ReplicatedEnvironment.java:662)
        at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade$RemoteNodeStateLearner.detectGroupChangesAndNotify(ReplicatedEnvironmentFacade.java:1541)
        at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade$RemoteNodeStateLearner.call(ReplicatedEnvironmentFacade.java:1491)
        at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade$RemoteNodeStateLearner.call(ReplicatedEnvironmentFacade.java:1475)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: com.sleepycat.je.LogWriteException: Environment invalid because of previous exception:
(JE 5.0.97) mynode2a(4):/tmp/vhost java.io.IOException: No space left on device LOG_WRITE:
IOException on write, log is likely incomplete. Environment is invalid and must be closed.
        at com.sleepycat.je.log.FileManager.writeLogBuffer(FileManager.java:1648)
        at com.sleepycat.je.log.LogBufferPool.writeBufferToFile(LogBufferPool.java:411)
{noformat}

The operation of the other nodes in the group was unaffected.

  was:
When a node is the replica role, if it runs out of disk space, the underlying JE environment
transitions into DETACH state, but then the group change thread went in into a loop repeatedly
failing on the getGroup call.  JE reports this exception to the application via its ExceptionListener
but it seems Qpid at the moment fails to act on this information correctly.

{noformat}
2014-11-07 05:43:55,534 ERROR [DETACHED mynode2a(4)] (replication.ReplicatedEnvironmentFacade)
- Asynchronous exception thrown by BDB thread 'DETACHED mynode2a(4)'
com.sleepycat.je.LogWriteException: (JE 5.0.97) Environment must be closed, caused by: com.sleepycat.je.LogWriteException:
Environment invalid because of previous exception: (JE 5.0.97) mynode2a(4):/tmp/vhost java.io.IOException:
No space left on device LOG_WRITE: IOException on write, log is likely incomplete. Environment
is invalid and must be closed. Problem seen replaying entry INS_LN_TX/8 vlsn=4,727,877 isReplicated="1"
 txn=-1576661
        at com.sleepycat.je.LogWriteException.wrapSelf(LogWriteException.java:72)
        at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1594)
        at com.sleepycat.je.dbi.CursorImpl.checkEnv(CursorImpl.java:2853)
        at com.sleepycat.je.Cursor.checkEnv(Cursor.java:4181)
        at com.sleepycat.je.Cursor.close(Cursor.java:517)
        at com.sleepycat.je.rep.impl.node.Replay.applyLN(Replay.java:890)
        at com.sleepycat.je.rep.impl.node.Replay.replayEntry(Replay.java:557)
        at com.sleepycat.je.rep.impl.node.Replica$ReplayThread.run(Replica.java:945)
Caused by: com.sleepycat.je.LogWriteException: Environment invalid because of previous exception:
(JE 5.0.97) mynode2a(4):/tmp/vhost java.io.IOException: No space left on device LOG_WRITE:
IOException on write, log is likely incomplete. Environment is invalid and must be closed.
        at com.sleepycat.je.log.FileManager.writeLogBuffer(FileManager.java:1648)
        at com.sleepycat.je.log.LogBufferPool.writeBufferToFile(LogBufferPool.java:411)
        at com.sleepycat.je.log.LogBufferPool.writeDirty(LogBufferPool.java:370)
        at com.sleepycat.je.log.LogBufferPool.bumpAndWriteSynced(LogBufferPool.java:339)
        at com.sleepycat.je.log.LogBufferPool.getWriteBuffer(LogBufferPool.java:269)
        at com.sleepycat.je.log.LogManager.serialLogWork(LogManager.java:706)
        at com.sleepycat.je.log.LogManager.serialLogInternal(LogManager.java:538)
        at com.sleepycat.je.log.SyncedLogManager.serialLog(SyncedLogManager.java:46)
        at com.sleepycat.je.log.LogManager.logItems(LogManager.java:485)
        at com.sleepycat.je.log.LogManager.multiLog(LogManager.java:399)
        at com.sleepycat.je.log.LogManager.log(LogManager.java:339)
        at com.sleepycat.je.tree.LN.logInternal(LN.java:553)
        at com.sleepycat.je.tree.LN.optionalLog(LN.java:372)
        at com.sleepycat.je.dbi.CursorImpl.insertNewSlot(CursorImpl.java:989)
        at com.sleepycat.je.dbi.CursorImpl.putInternal(CursorImpl.java:880)
        at com.sleepycat.je.dbi.CursorImpl.put(CursorImpl.java:816)
        at com.sleepycat.je.Cursor.putAllowPhantoms(Cursor.java:2397)
        at com.sleepycat.je.Cursor.putNoNotify(Cursor.java:2352)
        at com.sleepycat.je.Cursor.putNotify(Cursor.java:2260)
        at com.sleepycat.je.Cursor.putForReplay(Cursor.java:2134)
        at com.sleepycat.je.DbInternal.putForReplay(DbInternal.java:138)
        at com.sleepycat.je.rep.impl.node.Replay.applyLN(Replay.java:874)
        ... 2 more
Caused by: java.io.IOException: No space left on device
        at java.io.RandomAccessFile.writeBytes0(Native Method)
        at java.io.RandomAccessFile.writeBytes(RandomAccessFile.java:520)
        at java.io.RandomAccessFile.write(RandomAccessFile.java:550)
        at com.sleepycat.je.log.FileManager.writeToFile(FileManager.java:1753)
        at com.sleepycat.je.log.FileManager.writeLogBuffer(FileManager.java:1633)
        ... 23 more
2014-11-07 05:43:55,543 INFO  [StateChange-mygroup:mynode2a] (highavailability.role_changed)
- [mng:admin(/192.168.0.1:1310)] [grp(/mygroup)] HA-1010 : Role change reported: Node : 'mynode2a'
(192.168.0.1.175:5000) : from 'REPLICA' to 'DETACHED'
{noformat}
The group change learner kept repeatedly reporting an exception:
{noformat}
2014-11-07 05:44:01,457 ERROR [Group-Change-Learner:mygroup:mynode2a] (util.ServerScopedRuntimeException)
- Unexpected exception occurred in replicated environment
com.sleepycat.je.LogWriteException: (JE 5.0.97) Environment must be closed, caused by: com.sleepycat.je.LogWriteException:
Environment invalid because of previous exception: (JE 5.0.97) mynode2a(4):/tmp/vhost java.io.IOException:
No space left on device LOG_WRITE: IOException on write, log is likely incomplete. Environment
is invalid and must be closed.
        at com.sleepycat.je.LogWriteException.wrapSelf(LogWriteException.java:72)
        at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1594)
        at com.sleepycat.je.Environment.checkEnv(Environment.java:2225)
        at com.sleepycat.je.rep.ReplicatedEnvironment.getGroup(ReplicatedEnvironment.java:662)
        at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade$RemoteNodeStateLearner.detectGroupChangesAndNotify(ReplicatedEnvironmentFacade.java:1541)
        at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade$RemoteNodeStateLearner.call(ReplicatedEnvironmentFacade.java:1491)
        at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade$RemoteNodeStateLearner.call(ReplicatedEnvironmentFacade.java:1475)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: com.sleepycat.je.LogWriteException: Environment invalid because of previous exception:
(JE 5.0.97) mynode2a(4):/tmp/vhost java.io.IOException: No space left on device LOG_WRITE:
IOException on write, log is likely incomplete. Environment is invalid and must be closed.
        at com.sleepycat.je.log.FileManager.writeLogBuffer(FileManager.java:1648)
        at com.sleepycat.je.log.LogBufferPool.writeBufferToFile(LogBufferPool.java:411)
{noformat}

The operation of the other nodes in the group was unaffected.


> [Java Broker] Replica can go into a loop if it runs out of disk space
> ---------------------------------------------------------------------
>
>                 Key: QPID-6222
>                 URL: https://issues.apache.org/jira/browse/QPID-6222
>             Project: Qpid
>          Issue Type: Bug
>          Components: Java Broker
>    Affects Versions: 0.30
>            Reporter: Alex Rudyy
>            Assignee: Alex Rudyy
>             Fix For: 0.31
>
>         Attachments: 0001-QPID-6222-Halt-broker-on-running-out-of-disk-space.patch
>
>
> When a node is the replica role, if it runs out of disk space, the underlying JE environment
transitions into DETACH state, but then the group change thread went in into a loop repeatedly
failing on the getGroup call.  JE reports this exception to the application via its ExceptionListener
but it seems Qpid at the moment fails to act on this information correctly.
> {noformat}
> 2014-11-07 05:43:55,534 ERROR [DETACHED mynode2a(4)] (replication.ReplicatedEnvironmentFacade)
- Asynchronous exception thrown by BDB thread 'DETACHED mynode2a(4)'
> com.sleepycat.je.LogWriteException: (JE 5.0.97) Environment must be closed, caused by:
com.sleepycat.je.LogWriteException: Environment invalid because of previous exception: (JE
5.0.97) mynode2a(4):/tmp/vhost java.io.IOException: No space left on device LOG_WRITE: IOException
on write, log is likely incomplete. Environment is invalid and must be closed. Problem seen
replaying entry INS_LN_TX/8 vlsn=4,727,877 isReplicated="1"  txn=-1576661
>         at com.sleepycat.je.LogWriteException.wrapSelf(LogWriteException.java:72)
>         at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1594)
>         at com.sleepycat.je.dbi.CursorImpl.checkEnv(CursorImpl.java:2853)
>         at com.sleepycat.je.Cursor.checkEnv(Cursor.java:4181)
>         at com.sleepycat.je.Cursor.close(Cursor.java:517)
>         at com.sleepycat.je.rep.impl.node.Replay.applyLN(Replay.java:890)
>         at com.sleepycat.je.rep.impl.node.Replay.replayEntry(Replay.java:557)
>         at com.sleepycat.je.rep.impl.node.Replica$ReplayThread.run(Replica.java:945)
> Caused by: com.sleepycat.je.LogWriteException: Environment invalid because of previous
exception: (JE 5.0.97) mynode2a(4):/tmp/vhost java.io.IOException: No space left on device
LOG_WRITE: IOException on write, log is likely incomplete. Environment is invalid and must
be closed.
>         at com.sleepycat.je.log.FileManager.writeLogBuffer(FileManager.java:1648)
>         at com.sleepycat.je.log.LogBufferPool.writeBufferToFile(LogBufferPool.java:411)
>         at com.sleepycat.je.log.LogBufferPool.writeDirty(LogBufferPool.java:370)
>         at com.sleepycat.je.log.LogBufferPool.bumpAndWriteSynced(LogBufferPool.java:339)
>         at com.sleepycat.je.log.LogBufferPool.getWriteBuffer(LogBufferPool.java:269)
>         at com.sleepycat.je.log.LogManager.serialLogWork(LogManager.java:706)
>         at com.sleepycat.je.log.LogManager.serialLogInternal(LogManager.java:538)
>         at com.sleepycat.je.log.SyncedLogManager.serialLog(SyncedLogManager.java:46)
>         at com.sleepycat.je.log.LogManager.logItems(LogManager.java:485)
>         at com.sleepycat.je.log.LogManager.multiLog(LogManager.java:399)
>         at com.sleepycat.je.log.LogManager.log(LogManager.java:339)
>         at com.sleepycat.je.tree.LN.logInternal(LN.java:553)
>         at com.sleepycat.je.tree.LN.optionalLog(LN.java:372)
>         at com.sleepycat.je.dbi.CursorImpl.insertNewSlot(CursorImpl.java:989)
>         at com.sleepycat.je.dbi.CursorImpl.putInternal(CursorImpl.java:880)
>         at com.sleepycat.je.dbi.CursorImpl.put(CursorImpl.java:816)
>         at com.sleepycat.je.Cursor.putAllowPhantoms(Cursor.java:2397)
>         at com.sleepycat.je.Cursor.putNoNotify(Cursor.java:2352)
>         at com.sleepycat.je.Cursor.putNotify(Cursor.java:2260)
>         at com.sleepycat.je.Cursor.putForReplay(Cursor.java:2134)
>         at com.sleepycat.je.DbInternal.putForReplay(DbInternal.java:138)
>         at com.sleepycat.je.rep.impl.node.Replay.applyLN(Replay.java:874)
>         ... 2 more
> Caused by: java.io.IOException: No space left on device
>         at java.io.RandomAccessFile.writeBytes0(Native Method)
>         at java.io.RandomAccessFile.writeBytes(RandomAccessFile.java:520)
>         at java.io.RandomAccessFile.write(RandomAccessFile.java:550)
>         at com.sleepycat.je.log.FileManager.writeToFile(FileManager.java:1753)
>         at com.sleepycat.je.log.FileManager.writeLogBuffer(FileManager.java:1633)
>         ... 23 more
> 2014-11-07 05:43:55,543 INFO  [StateChange-mygroup:mynode2a] (highavailability.role_changed)
- [mng:admin(/127.0.0.1:1310)] [grp(/mygroup)] HA-1010 : Role change reported: Node : 'mynode2a'
(127.0.0.1:5000) : from 'REPLICA' to 'DETACHED'
> {noformat}
> The group change learner kept repeatedly reporting an exception:
> {noformat}
> 2014-11-07 05:44:01,457 ERROR [Group-Change-Learner:mygroup:mynode2a] (util.ServerScopedRuntimeException)
- Unexpected exception occurred in replicated environment
> com.sleepycat.je.LogWriteException: (JE 5.0.97) Environment must be closed, caused by:
com.sleepycat.je.LogWriteException: Environment invalid because of previous exception: (JE
5.0.97) mynode2a(4):/tmp/vhost java.io.IOException: No space left on device LOG_WRITE: IOException
on write, log is likely incomplete. Environment is invalid and must be closed.
>         at com.sleepycat.je.LogWriteException.wrapSelf(LogWriteException.java:72)
>         at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1594)
>         at com.sleepycat.je.Environment.checkEnv(Environment.java:2225)
>         at com.sleepycat.je.rep.ReplicatedEnvironment.getGroup(ReplicatedEnvironment.java:662)
>         at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade$RemoteNodeStateLearner.detectGroupChangesAndNotify(ReplicatedEnvironmentFacade.java:1541)
>         at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade$RemoteNodeStateLearner.call(ReplicatedEnvironmentFacade.java:1491)
>         at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade$RemoteNodeStateLearner.call(ReplicatedEnvironmentFacade.java:1475)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: com.sleepycat.je.LogWriteException: Environment invalid because of previous
exception: (JE 5.0.97) mynode2a(4):/tmp/vhost java.io.IOException: No space left on device
LOG_WRITE: IOException on write, log is likely incomplete. Environment is invalid and must
be closed.
>         at com.sleepycat.je.log.FileManager.writeLogBuffer(FileManager.java:1648)
>         at com.sleepycat.je.log.LogBufferPool.writeBufferToFile(LogBufferPool.java:411)
> {noformat}
> The operation of the other nodes in the group was unaffected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Mime
View raw message