qpid-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Rudyy (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (QPID-6222) [Java Broker] Replica can go into a loop if it runs out of disk space
Date Mon, 10 Nov 2014 17:14:33 GMT

     [ https://issues.apache.org/jira/browse/QPID-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Alex Rudyy updated QPID-6222:
-----------------------------
    Description: 
On running out of disk space the replica transitions into DETACH role, but then it went into
a loop.
{noformat}
2014-11-07 05:43:55,534 ERROR [DETACHED mynode2a(4)] (replication.ReplicatedEnvironmentFacade)
- Asynchronous exception thrown by BDB thread 'DETACHED mynode2a(4)'
com.sleepycat.je.LogWriteException: (JE 5.0.97) Environment must be closed, caused by: com.sleepycat.je.LogWriteException:
Environment invalid because of previous exception: (JE 5.0.97) mynode2a(4):/tmp/vhost java.io.IOException:
No space left on device LOG_WRITE: IOException on write, log is likely incomplete. Environment
is invalid and must be closed. Problem seen replaying entry INS_LN_TX/8 vlsn=4,727,877 isReplicated="1"
 txn=-1576661
        at com.sleepycat.je.LogWriteException.wrapSelf(LogWriteException.java:72)
        at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1594)
        at com.sleepycat.je.dbi.CursorImpl.checkEnv(CursorImpl.java:2853)
        at com.sleepycat.je.Cursor.checkEnv(Cursor.java:4181)
        at com.sleepycat.je.Cursor.close(Cursor.java:517)
        at com.sleepycat.je.rep.impl.node.Replay.applyLN(Replay.java:890)
        at com.sleepycat.je.rep.impl.node.Replay.replayEntry(Replay.java:557)
        at com.sleepycat.je.rep.impl.node.Replica$ReplayThread.run(Replica.java:945)
Caused by: com.sleepycat.je.LogWriteException: Environment invalid because of previous exception:
(JE 5.0.97) mynode2a(4):/tmp/vhost java.io.IOException: No space left on device LOG_WRITE:
IOException on write, log is likely incomplete. Environment is invalid and must be closed.
        at com.sleepycat.je.log.FileManager.writeLogBuffer(FileManager.java:1648)
        at com.sleepycat.je.log.LogBufferPool.writeBufferToFile(LogBufferPool.java:411)
        at com.sleepycat.je.log.LogBufferPool.writeDirty(LogBufferPool.java:370)
        at com.sleepycat.je.log.LogBufferPool.bumpAndWriteSynced(LogBufferPool.java:339)
        at com.sleepycat.je.log.LogBufferPool.getWriteBuffer(LogBufferPool.java:269)
        at com.sleepycat.je.log.LogManager.serialLogWork(LogManager.java:706)
        at com.sleepycat.je.log.LogManager.serialLogInternal(LogManager.java:538)
        at com.sleepycat.je.log.SyncedLogManager.serialLog(SyncedLogManager.java:46)
        at com.sleepycat.je.log.LogManager.logItems(LogManager.java:485)
        at com.sleepycat.je.log.LogManager.multiLog(LogManager.java:399)
        at com.sleepycat.je.log.LogManager.log(LogManager.java:339)
        at com.sleepycat.je.tree.LN.logInternal(LN.java:553)
        at com.sleepycat.je.tree.LN.optionalLog(LN.java:372)
        at com.sleepycat.je.dbi.CursorImpl.insertNewSlot(CursorImpl.java:989)
        at com.sleepycat.je.dbi.CursorImpl.putInternal(CursorImpl.java:880)
        at com.sleepycat.je.dbi.CursorImpl.put(CursorImpl.java:816)
        at com.sleepycat.je.Cursor.putAllowPhantoms(Cursor.java:2397)
        at com.sleepycat.je.Cursor.putNoNotify(Cursor.java:2352)
        at com.sleepycat.je.Cursor.putNotify(Cursor.java:2260)
        at com.sleepycat.je.Cursor.putForReplay(Cursor.java:2134)
        at com.sleepycat.je.DbInternal.putForReplay(DbInternal.java:138)
        at com.sleepycat.je.rep.impl.node.Replay.applyLN(Replay.java:874)
        ... 2 more
Caused by: java.io.IOException: No space left on device
        at java.io.RandomAccessFile.writeBytes0(Native Method)
        at java.io.RandomAccessFile.writeBytes(RandomAccessFile.java:520)
        at java.io.RandomAccessFile.write(RandomAccessFile.java:550)
        at com.sleepycat.je.log.FileManager.writeToFile(FileManager.java:1753)
        at com.sleepycat.je.log.FileManager.writeLogBuffer(FileManager.java:1633)
        ... 23 more
2014-11-07 05:43:55,543 INFO  [StateChange-mygroup:mynode2a] (highavailability.role_changed)
- [mng:admin(/169.82.112.40:1310)] [grp(/mygroup)] HA-1010 : Role change reported: Node :
'mynode2a' (169.113.198.175:5000) : from 'REPLICA' to 'DETACHED'
{noformat}
The group change learner kept repeatedly reporting an exception:
{noformat}
2014-11-07 05:44:01,457 ERROR [Group-Change-Learner:mygroup:mynode2a] (util.ServerScopedRuntimeException)
- Unexpected exception occurred in replicated environment
com.sleepycat.je.LogWriteException: (JE 5.0.97) Environment must be closed, caused by: com.sleepycat.je.LogWriteException:
Environment invalid because of previous exception: (JE 5.0.97) mynode2a(4):/tmp/vhost java.io.IOException:
No space left on device LOG_WRITE: IOException on write, log is likely incomplete. Environment
is invalid and must be closed.
        at com.sleepycat.je.LogWriteException.wrapSelf(LogWriteException.java:72)
        at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1594)
        at com.sleepycat.je.Environment.checkEnv(Environment.java:2225)
        at com.sleepycat.je.rep.ReplicatedEnvironment.getGroup(ReplicatedEnvironment.java:662)
        at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade$RemoteNodeStateLearner.detectGroupChangesAndNotify(ReplicatedEnvironmentFacade.java:1541)
        at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade$RemoteNodeStateLearner.call(ReplicatedEnvironmentFacade.java:1491)
        at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade$RemoteNodeStateLearner.call(ReplicatedEnvironmentFacade.java:1475)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: com.sleepycat.je.LogWriteException: Environment invalid because of previous exception:
(JE 5.0.97) mynode2a(4):/tmp/vhost java.io.IOException: No space left on device LOG_WRITE:
IOException on write, log is likely incomplete. Environment is invalid and must be closed.
        at com.sleepycat.je.log.FileManager.writeLogBuffer(FileManager.java:1648)
        at com.sleepycat.je.log.LogBufferPool.writeBufferToFile(LogBufferPool.java:411)
{noformat}

The operation of the other nodes in the group was unaffected.

  was:
Running test HA_DISKSPACE_REPLICA,  on running out of disk space the replica which seen to
DETACH, but then it went into a loop.
{noformat}
2014-11-07 05:43:55,534 ERROR [DETACHED mynode2a(4)] (replication.ReplicatedEnvironmentFacade)
- Asynchronous exception thrown by BDB thread 'DETACHED mynode2a(4)'
com.sleepycat.je.LogWriteException: (JE 5.0.97) Environment must be closed, caused by: com.sleepycat.je.LogWriteException:
Environment invalid because of previous exception: (JE 5.0.97) mynode2a(4):/tmp/vhost java.io.IOException:
No space left on device LOG_WRITE: IOException on write, log is likely incomplete. Environment
is invalid and must be closed. Problem seen replaying entry INS_LN_TX/8 vlsn=4,727,877 isReplicated="1"
 txn=-1576661
        at com.sleepycat.je.LogWriteException.wrapSelf(LogWriteException.java:72)
        at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1594)
        at com.sleepycat.je.dbi.CursorImpl.checkEnv(CursorImpl.java:2853)
        at com.sleepycat.je.Cursor.checkEnv(Cursor.java:4181)
        at com.sleepycat.je.Cursor.close(Cursor.java:517)
        at com.sleepycat.je.rep.impl.node.Replay.applyLN(Replay.java:890)
        at com.sleepycat.je.rep.impl.node.Replay.replayEntry(Replay.java:557)
        at com.sleepycat.je.rep.impl.node.Replica$ReplayThread.run(Replica.java:945)
Caused by: com.sleepycat.je.LogWriteException: Environment invalid because of previous exception:
(JE 5.0.97) mynode2a(4):/tmp/vhost java.io.IOException: No space left on device LOG_WRITE:
IOException on write, log is likely incomplete. Environment is invalid and must be closed.
        at com.sleepycat.je.log.FileManager.writeLogBuffer(FileManager.java:1648)
        at com.sleepycat.je.log.LogBufferPool.writeBufferToFile(LogBufferPool.java:411)
        at com.sleepycat.je.log.LogBufferPool.writeDirty(LogBufferPool.java:370)
        at com.sleepycat.je.log.LogBufferPool.bumpAndWriteSynced(LogBufferPool.java:339)
        at com.sleepycat.je.log.LogBufferPool.getWriteBuffer(LogBufferPool.java:269)
        at com.sleepycat.je.log.LogManager.serialLogWork(LogManager.java:706)
        at com.sleepycat.je.log.LogManager.serialLogInternal(LogManager.java:538)
        at com.sleepycat.je.log.SyncedLogManager.serialLog(SyncedLogManager.java:46)
        at com.sleepycat.je.log.LogManager.logItems(LogManager.java:485)
        at com.sleepycat.je.log.LogManager.multiLog(LogManager.java:399)
        at com.sleepycat.je.log.LogManager.log(LogManager.java:339)
        at com.sleepycat.je.tree.LN.logInternal(LN.java:553)
        at com.sleepycat.je.tree.LN.optionalLog(LN.java:372)
        at com.sleepycat.je.dbi.CursorImpl.insertNewSlot(CursorImpl.java:989)
        at com.sleepycat.je.dbi.CursorImpl.putInternal(CursorImpl.java:880)
        at com.sleepycat.je.dbi.CursorImpl.put(CursorImpl.java:816)
        at com.sleepycat.je.Cursor.putAllowPhantoms(Cursor.java:2397)
        at com.sleepycat.je.Cursor.putNoNotify(Cursor.java:2352)
        at com.sleepycat.je.Cursor.putNotify(Cursor.java:2260)
        at com.sleepycat.je.Cursor.putForReplay(Cursor.java:2134)
        at com.sleepycat.je.DbInternal.putForReplay(DbInternal.java:138)
        at com.sleepycat.je.rep.impl.node.Replay.applyLN(Replay.java:874)
        ... 2 more
Caused by: java.io.IOException: No space left on device
        at java.io.RandomAccessFile.writeBytes0(Native Method)
        at java.io.RandomAccessFile.writeBytes(RandomAccessFile.java:520)
        at java.io.RandomAccessFile.write(RandomAccessFile.java:550)
        at com.sleepycat.je.log.FileManager.writeToFile(FileManager.java:1753)
        at com.sleepycat.je.log.FileManager.writeLogBuffer(FileManager.java:1633)
        ... 23 more
2014-11-07 05:43:55,543 INFO  [StateChange-mygroup:mynode2a] (highavailability.role_changed)
- [mng:admin(/169.82.112.40:1310)] [grp(/mygroup)] HA-1010 : Role change reported: Node :
'mynode2a' (169.113.198.175:5000) : from 'REPLICA' to 'DETACHED'
{noformat}
The group change learner kept repeatedly reporting an exception:
{noformat}
2014-11-07 05:44:01,457 ERROR [Group-Change-Learner:mygroup:mynode2a] (util.ServerScopedRuntimeException)
- Unexpected exception occurred in replicated environment
com.sleepycat.je.LogWriteException: (JE 5.0.97) Environment must be closed, caused by: com.sleepycat.je.LogWriteException:
Environment invalid because of previous exception: (JE 5.0.97) mynode2a(4):/tmp/vhost java.io.IOException:
No space left on device LOG_WRITE: IOException on write, log is likely incomplete. Environment
is invalid and must be closed.
        at com.sleepycat.je.LogWriteException.wrapSelf(LogWriteException.java:72)
        at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1594)
        at com.sleepycat.je.Environment.checkEnv(Environment.java:2225)
        at com.sleepycat.je.rep.ReplicatedEnvironment.getGroup(ReplicatedEnvironment.java:662)
        at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade$RemoteNodeStateLearner.detectGroupChangesAndNotify(ReplicatedEnvironmentFacade.java:1541)
        at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade$RemoteNodeStateLearner.call(ReplicatedEnvironmentFacade.java:1491)
        at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade$RemoteNodeStateLearner.call(ReplicatedEnvironmentFacade.java:1475)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: com.sleepycat.je.LogWriteException: Environment invalid because of previous exception:
(JE 5.0.97) mynode2a(4):/tmp/vhost java.io.IOException: No space left on device LOG_WRITE:
IOException on write, log is likely incomplete. Environment is invalid and must be closed.
        at com.sleepycat.je.log.FileManager.writeLogBuffer(FileManager.java:1648)
        at com.sleepycat.je.log.LogBufferPool.writeBufferToFile(LogBufferPool.java:411)
{noformat}

The operation of the other nodes in the group was unaffected.


> [Java Broker] Replica can go into a loop if it runs out of disk space
> ---------------------------------------------------------------------
>
>                 Key: QPID-6222
>                 URL: https://issues.apache.org/jira/browse/QPID-6222
>             Project: Qpid
>          Issue Type: Bug
>          Components: Java Broker
>    Affects Versions: 0.30
>            Reporter: Alex Rudyy
>            Assignee: Alex Rudyy
>             Fix For: 0.31
>
>         Attachments: 0001-QPID-6222-Halt-broker-on-running-out-of-disk-space.patch
>
>
> On running out of disk space the replica transitions into DETACH role, but then it went
into a loop.
> {noformat}
> 2014-11-07 05:43:55,534 ERROR [DETACHED mynode2a(4)] (replication.ReplicatedEnvironmentFacade)
- Asynchronous exception thrown by BDB thread 'DETACHED mynode2a(4)'
> com.sleepycat.je.LogWriteException: (JE 5.0.97) Environment must be closed, caused by:
com.sleepycat.je.LogWriteException: Environment invalid because of previous exception: (JE
5.0.97) mynode2a(4):/tmp/vhost java.io.IOException: No space left on device LOG_WRITE: IOException
on write, log is likely incomplete. Environment is invalid and must be closed. Problem seen
replaying entry INS_LN_TX/8 vlsn=4,727,877 isReplicated="1"  txn=-1576661
>         at com.sleepycat.je.LogWriteException.wrapSelf(LogWriteException.java:72)
>         at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1594)
>         at com.sleepycat.je.dbi.CursorImpl.checkEnv(CursorImpl.java:2853)
>         at com.sleepycat.je.Cursor.checkEnv(Cursor.java:4181)
>         at com.sleepycat.je.Cursor.close(Cursor.java:517)
>         at com.sleepycat.je.rep.impl.node.Replay.applyLN(Replay.java:890)
>         at com.sleepycat.je.rep.impl.node.Replay.replayEntry(Replay.java:557)
>         at com.sleepycat.je.rep.impl.node.Replica$ReplayThread.run(Replica.java:945)
> Caused by: com.sleepycat.je.LogWriteException: Environment invalid because of previous
exception: (JE 5.0.97) mynode2a(4):/tmp/vhost java.io.IOException: No space left on device
LOG_WRITE: IOException on write, log is likely incomplete. Environment is invalid and must
be closed.
>         at com.sleepycat.je.log.FileManager.writeLogBuffer(FileManager.java:1648)
>         at com.sleepycat.je.log.LogBufferPool.writeBufferToFile(LogBufferPool.java:411)
>         at com.sleepycat.je.log.LogBufferPool.writeDirty(LogBufferPool.java:370)
>         at com.sleepycat.je.log.LogBufferPool.bumpAndWriteSynced(LogBufferPool.java:339)
>         at com.sleepycat.je.log.LogBufferPool.getWriteBuffer(LogBufferPool.java:269)
>         at com.sleepycat.je.log.LogManager.serialLogWork(LogManager.java:706)
>         at com.sleepycat.je.log.LogManager.serialLogInternal(LogManager.java:538)
>         at com.sleepycat.je.log.SyncedLogManager.serialLog(SyncedLogManager.java:46)
>         at com.sleepycat.je.log.LogManager.logItems(LogManager.java:485)
>         at com.sleepycat.je.log.LogManager.multiLog(LogManager.java:399)
>         at com.sleepycat.je.log.LogManager.log(LogManager.java:339)
>         at com.sleepycat.je.tree.LN.logInternal(LN.java:553)
>         at com.sleepycat.je.tree.LN.optionalLog(LN.java:372)
>         at com.sleepycat.je.dbi.CursorImpl.insertNewSlot(CursorImpl.java:989)
>         at com.sleepycat.je.dbi.CursorImpl.putInternal(CursorImpl.java:880)
>         at com.sleepycat.je.dbi.CursorImpl.put(CursorImpl.java:816)
>         at com.sleepycat.je.Cursor.putAllowPhantoms(Cursor.java:2397)
>         at com.sleepycat.je.Cursor.putNoNotify(Cursor.java:2352)
>         at com.sleepycat.je.Cursor.putNotify(Cursor.java:2260)
>         at com.sleepycat.je.Cursor.putForReplay(Cursor.java:2134)
>         at com.sleepycat.je.DbInternal.putForReplay(DbInternal.java:138)
>         at com.sleepycat.je.rep.impl.node.Replay.applyLN(Replay.java:874)
>         ... 2 more
> Caused by: java.io.IOException: No space left on device
>         at java.io.RandomAccessFile.writeBytes0(Native Method)
>         at java.io.RandomAccessFile.writeBytes(RandomAccessFile.java:520)
>         at java.io.RandomAccessFile.write(RandomAccessFile.java:550)
>         at com.sleepycat.je.log.FileManager.writeToFile(FileManager.java:1753)
>         at com.sleepycat.je.log.FileManager.writeLogBuffer(FileManager.java:1633)
>         ... 23 more
> 2014-11-07 05:43:55,543 INFO  [StateChange-mygroup:mynode2a] (highavailability.role_changed)
- [mng:admin(/169.82.112.40:1310)] [grp(/mygroup)] HA-1010 : Role change reported: Node :
'mynode2a' (169.113.198.175:5000) : from 'REPLICA' to 'DETACHED'
> {noformat}
> The group change learner kept repeatedly reporting an exception:
> {noformat}
> 2014-11-07 05:44:01,457 ERROR [Group-Change-Learner:mygroup:mynode2a] (util.ServerScopedRuntimeException)
- Unexpected exception occurred in replicated environment
> com.sleepycat.je.LogWriteException: (JE 5.0.97) Environment must be closed, caused by:
com.sleepycat.je.LogWriteException: Environment invalid because of previous exception: (JE
5.0.97) mynode2a(4):/tmp/vhost java.io.IOException: No space left on device LOG_WRITE: IOException
on write, log is likely incomplete. Environment is invalid and must be closed.
>         at com.sleepycat.je.LogWriteException.wrapSelf(LogWriteException.java:72)
>         at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1594)
>         at com.sleepycat.je.Environment.checkEnv(Environment.java:2225)
>         at com.sleepycat.je.rep.ReplicatedEnvironment.getGroup(ReplicatedEnvironment.java:662)
>         at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade$RemoteNodeStateLearner.detectGroupChangesAndNotify(ReplicatedEnvironmentFacade.java:1541)
>         at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade$RemoteNodeStateLearner.call(ReplicatedEnvironmentFacade.java:1491)
>         at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade$RemoteNodeStateLearner.call(ReplicatedEnvironmentFacade.java:1475)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: com.sleepycat.je.LogWriteException: Environment invalid because of previous
exception: (JE 5.0.97) mynode2a(4):/tmp/vhost java.io.IOException: No space left on device
LOG_WRITE: IOException on write, log is likely incomplete. Environment is invalid and must
be closed.
>         at com.sleepycat.je.log.FileManager.writeLogBuffer(FileManager.java:1648)
>         at com.sleepycat.je.log.LogBufferPool.writeBufferToFile(LogBufferPool.java:411)
> {noformat}
> The operation of the other nodes in the group was unaffected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Mime
View raw message