ignite-dev mailing list archives

From Anton Vinogradov ...@apache.org>
Subject Re: Partition reserve/release asymmetry
Date Fri, 10 Jan 2020 06:48:04 GMT
>> Does the issue reproduce in
>> subsequent runs?
Unfortunately no.
We performed 30+ runs without "success".

>> I think we can add an assertion to
>> GridDhtLocalPartition#destroy() method to check that reservations is 0
OK, I will check it and merge it if it passes.
I created an issue to handle this [1].
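Below is a minimal sketch of the proposed precondition check. It uses a hypothetical `PartitionSketch` class in place of `GridDhtLocalPartition`; its fields and methods are illustrative assumptions, not Ignite's actual code. The checks use explicit `throw new AssertionError(...)` so that they fire even when the JVM runs without `-ea`. In production code, a plain `assert` statement would be the more idiomatic choice.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a local partition; NOT Ignite's GridDhtLocalPartition. */
class PartitionSketch {
    enum State { OWNING, RENTING, EVICTED }

    private final AtomicInteger reservations = new AtomicInteger();

    private volatile State state = State.OWNING;

    void reserve() {
        reservations.incrementAndGet();
    }

    void release() {
        reservations.decrementAndGet();
    }

    void markEvicted() {
        state = State.EVICTED;
    }

    /** Destroys the partition storage after checking both preconditions. */
    void destroy() {
        // Check of the kind that already exists: only an evicted partition may be destroyed.
        if (state != State.EVICTED)
            throw new AssertionError("Unexpected state on destroy: " + state);

        // Proposed check: no reservation may be held while the data is destroyed.
        int r = reservations.get();

        if (r != 0)
            throw new AssertionError("Partition destroyed while reserved [reservations=" + r + ']');

        // ... release the partition's data structures here (omitted) ...
    }
}
```

The check is diagnostic only. A reservation taken between the check and the actual tree destruction would still slip through, so the check helps detect the bug rather than prevent it.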

[1] https://issues.apache.org/jira/browse/IGNITE-12524

On Thu, Jan 9, 2020 at 1:46 PM Alexey Goncharuk <alexey.goncharuk@gmail.com>
wrote:

> Hello Anton,
>
> Thanks for digging into this. The logic with checking the
> reservations count seems fishy to me as well, so I have no objections to
> the suggested change. This "if" statement does not explain why the partition
> was being destroyed during the commit, though. Does the issue reproduce in
> subsequent runs?
>
> The logic around reserve/release seems OK to me; however, the
> eviction/renting code looks overly complicated. Perhaps there is a bug
> somewhere there? I think we can add an assertion to
> GridDhtLocalPartition#destroy() method to check that reservations is 0 when
> this method is called (there is a check for EVICTED state already there)
>
> --AG
>
> On Thu, Jan 9, 2020 at 09:45, Anton Vinogradov <av@apache.org> wrote:
>
> > Folks,
> > A Yardstick run (opt-serial-put-get-1-backup) failed with an interesting
> > exception:
> > Critical system error detected. Will be handled accordingly to configured
> > handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> > super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> > [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> > failureCtx=FailureContext [type=CRITICAL_ERROR, err=class
> > o.a.i.i.transactions.IgniteTxHeuristicCheckedException: Committing a
> > transaction has produced runtime exception]]
> > class org.apache.ignite.internal.transactions.IgniteTxHeuristicCheckedException: Committing a transaction has produced runtime exception
> >     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.heuristicException(IgniteTxAdapter.java:800)
> >     at org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxRemoteAdapter.commitIfLocked(GridDistributedTxRemoteAdapter.java:838)
> >     at org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxRemoteAdapter.commitRemoteTx(GridDistributedTxRemoteAdapter.java:893)
> >     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finish(IgniteTxHandler.java:1452)
> >     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processDhtTxFinishRequest(IgniteTxHandler.java:1375)
> >     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$600(IgniteTxHandler.java:123)
> >     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$7.apply(IgniteTxHandler.java:241)
> >     at org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$7.apply(IgniteTxHandler.java:239)
> >     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142)
> >     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
> >     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392)
> >     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318)
> >     at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
> >     at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
> >     at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1843)
> >     at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1468)
> >     at org.apache.ignite.internal.managers.communication.GridIoManager.access$5200(GridIoManager.java:229)
> >     at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1365)
> >     at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:555)
> >     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> >     at java.lang.Thread.run(Thread.java:748)
> > Caused by: java.lang.IllegalStateException: Tree is being concurrently destroyed: tx-p-470##CacheData
> >     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.checkDestroyed(BPlusTree.java:1011)
> >     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1831)
> >     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1696)
> >     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1679)
> >     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:441)
> >     at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4288)
> >     at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:4262)
> >     at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerSet(GridCacheMapEntry.java:1540)
> >     at org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxRemoteAdapter.commitIfLocked(GridDistributedTxRemoteAdapter.java:675)
> >     ... 19 more
> >
> > It seems the BPlusTree was destroyed between
> > GridDistributedTxRemoteAdapter.java:545 and
> > GridDistributedTxRemoteAdapter.java:675 while the partition was reserved.
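The invariant that the stack trace suggests was violated can be sketched with a single atomic counter in which `-1` marks a destroyed partition. Because `reserve()` and `tryDestroy()` race on the same variable, destruction can succeed only while no reservation is held. This holds only if every `reserve()` is paired with exactly one `release()`. The `ReservablePartition` class and its methods are hypothetical and do not reflect Ignite's implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch (not Ignite code): one atomic int holds the reservation
 * count, and the value -1 marks a destroyed partition.
 */
class ReservablePartition {
    private static final int DESTROYED = -1;

    private final AtomicInteger reservations = new AtomicInteger();

    /** Returns false if the partition has already been destroyed. */
    boolean reserve() {
        while (true) {
            int cur = reservations.get();

            if (cur == DESTROYED)
                return false;

            if (reservations.compareAndSet(cur, cur + 1))
                return true;
        }
    }

    void release() {
        while (true) {
            int cur = reservations.get();

            // A release without a matching reserve is a bug: fail instead of ignoring it.
            if (cur <= 0)
                throw new IllegalStateException("Unbalanced release [reservations=" + cur + ']');

            if (reservations.compareAndSet(cur, cur - 1))
                return;
        }
    }

    /** Succeeds only if nobody holds a reservation at this instant. */
    boolean tryDestroy() {
        return reservations.compareAndSet(0, DESTROYED);
    }

    /** Runs the action under a reservation; reserve and release are always paired. */
    boolean runReserved(Runnable action) {
        if (!reserve())
            return false;

        try {
            action.run();

            return true;
        }
        finally {
            release();
        }
    }
}
```

With this design, an extra `release()` somewhere in the commit path would drop the count to zero early. `tryDestroy()` could then succeed while another thread still uses the tree. That is one hypothetical way the observed exception could arise, not a confirmed diagnosis.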
> >
> > See the full log [1] for details.
> >
> > During the investigation, I found some weird code:
> > private void release0(int sizeChange) {
> >     while (true) {
> >         long state = this.state.get();
> >
> >         int reservations = getReservations(state);
> >
> >         if (reservations == 0) // How can it be zero at release attempt?
> >             return;
> >
> > I've replaced this weird code with an assertion [2] and checked it on
> > TeamCity twice; nothing failed.
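The quoted `release0` loop decodes the reservation count from a packed `long` state. Below is a sketch of that loop with the early `return` replaced by a fail-fast check. Several parts are assumptions for illustration: the bit layout (reservation count in the low 16 bits, partition state ordinal in the high 32 bits), the `PackedPartitionState` class, and its helper methods. The `sizeChange` parameter of the original method is omitted.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical packed partition state (illustrative layout, not necessarily Ignite's):
 * bits 0-15 hold the reservation count, bits 32-63 hold the state ordinal.
 */
class PackedPartitionState {
    private static final long RESERVATIONS_MASK = 0xFFFFL;

    private final AtomicLong state;

    PackedPartitionState(int partStateOrdinal) {
        state = new AtomicLong((long)partStateOrdinal << 32);
    }

    static int getReservations(long state) {
        return (int)(state & RESERVATIONS_MASK);
    }

    static long setReservations(long state, int reservations) {
        return (state & ~RESERVATIONS_MASK) | (reservations & RESERVATIONS_MASK);
    }

    static int getPartState(long state) {
        return (int)(state >>> 32);
    }

    int reservations() {
        return getReservations(state.get());
    }

    void reserve() {
        while (true) {
            long cur = state.get();
            int reservations = getReservations(cur);

            if (reservations == RESERVATIONS_MASK)
                throw new IllegalStateException("Too many reservations");

            if (state.compareAndSet(cur, setReservations(cur, reservations + 1)))
                return;
        }
    }

    void release() {
        while (true) {
            long cur = state.get();
            int reservations = getReservations(cur);

            // Fail fast instead of silently returning: zero reservations here
            // means some caller released without a matching reserve.
            if (reservations == 0)
                throw new AssertionError("Release without reservation [partState=" + getPartState(cur) + ']');

            if (state.compareAndSet(cur, setReservations(cur, reservations - 1)))
                return;
        }
    }
}
```

Failing fast surfaces an unbalanced release at the call site. The silent `return` would let it go unnoticed until some later, unrelated failure.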
> >
> > So, my questions are:
> > 1) Any idea why we can have zero reservations at a release attempt?
> > 2) Any objections to merging the assertion instead of the weird return into the
> > master branch?
> > 3) Any idea why the exception happens?
> >
> > [1] https://gist.githubusercontent.com/anton-vinogradov/834fc63114a3e8d46b89ea4ccec8148b/raw/6438930c7fef119d0ad60df76d821fe7bd100c5e/gistfile1.txt
> > [2] https://gitbox.apache.org/repos/asf?p=ignite.git;a=commitdiff;h=b2c083564fb3b48ebe87042e0ed442dc0af3a74d
> >
>
