ignite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aleksey Kuznetsov <alkuznetsov...@gmail.com>
Subject Re: [IGNITE-5714] Design proposal of suspend/resume for pessimistic tx
Date Wed, 28 Feb 2018 10:29:03 GMT
Thanx for the answer!

> Can we remove thread id completely from code?
> Can you estimate how much effort do we need?

I haven't removed thread id completely from the code because we tightly
coupled to it in some parts.
For instance, look at the following scenario :

"When we suspend transaction, we must make sure only thread started
transaction can suspend it. So we compare thread id in
`GridNearTxLocal#suspend()`."

Note that cache locks also make use of thread id, consider the following
scenario:

"We create cache lock and perform put operation in the same thread, which
create implicit transaction. Transaction would compare both therad ids in
tx and cache lock and can apply the optimisation: tx would take advantage
of cache lock, and would not create its own lock on key."

If we really needed to remove it completely from the code, significant
changes are required both to transactions and cache locks.

It could take several weeks to completely remove thread id.

> As far as I can see from parent task [1] we need some complex tests to be
implemented.
> Are they presented in prototype?

Tests were added for suspending/resuming optimistic transaction earlier for
parent task.
I adopted them for pessimistic mode.
Also tx rollback tests were aded.  They check whether timeout metrics are
correctly calculated for suspended transactions. You can find them in
`GridCacheTransactionalAbstractMetricsSelfTest`.

Please, feel free to propose additional tests for the ticket.


вт, 27 февр. 2018 г. в 17:27, Nikolay Izhikov <nizhikov@apache.org>:

> Hello, Alexey.
>
> Great mail, by the way.
> I think, it would be great to have this feature in Ignite.
>
> > I haven't removed thread id completely from code.
>
> Can we remove thread id completely from code?
> Can you estimate how much effort do we need?
>
> As far as I can see from parent task [1] we need some complex tests to be
> implemented.
> Are they presented in prototype?
>
> [1]
> https://issues.apache.org/jira/browse/IGNITE-4887?focusedCommentId=16069655&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16069655
>
> В Пн, 26/02/2018 в 10:59 +0000, Aleksey Kuznetsov пишет:
> > Hello Igniters!
> >
> > Currently we have suspension/resuming implemented for optimistic
> > transactions [1].
> > Unless suspend/resume isn't supported for pessimistic tx JTA isn't fully
> > supported [4].
> >
> > I’m working on a ticket "Suspension/resuming for pessimistic
> transactions"
> > [2].
> > Goal of the ticket is to support transaction suspend/resume for
> pessimistic
> > transactions.
> >
> > # Benefits of suspending/resuming transaction.
> >
> >   1. Full JTA standart support.
> >   2. Increase of throughput in high-load scenarios.
> >       Suspend operation would allow to release Ignite threads and
> > optionally perform some other logic.
> >
> > Note, current API has got suspend/resume only for optimistic
> transactions,
> > which confuses users.
> >
> > # Real life example.
> >
> > Consider the following scenario:
> >
> >   1. Application starts Ignite transaction.
> >   2. Business logic is executed inside transaction.
> >   3. For commit/rollback application need approval message from external
> > agent.
> >   4. Currently, thread inside Ignite is idle until approval is received.
> >   4a. When suspend/resume support is implemented, application can perform
> > suspend and release thread inside Ignite.
> >
> > # How pessimistic transaction works.
> >
> > When we perform put/get operations in pessimistic transactions, lock
> > request is sent to remote nodes by `GridNearLockRequest`.
> > Request contains thread id `IgniteTxAdapter#threadId`, in which operation
> > was performed.
> > In pessimistic mode, multiple transaction objects are created - on
> > primary, on backup nodes, and on originating node:
> > `GridNearTxLocal`, `GridDhtTxLocal`, `GridNearTxRemote`,
> `GridDhtTxRemote`.
> >
> > Thread id is used in logic on these nodes.
> > For instance, to check whether thread has successfully locked the key,
> > after lock acquisition attempt.
> > Or to check whether active transaction exists.
> >
> > # Main challenge for implementation.
> >
> > I've analysed implementation approaches and see the core issue:
> >
> > The essential problem with suspending/resuming lies in thread id field
> > transferred to remote nodes during put/get operations.
> >
> > Imagine, we want to suspend transaction and resume it in another thread.
> > See code snippet below:
> >
> > ```
> > tx = ignite.transactions().txStart(PESSIMISTIC, SERIALIZABLE);
> >
> > cache.put(1, 1);
> >
> > tx.suspend();
> > ....
> >
> > // In another thread.
> > tx.resume(); // Thread id will be changed in transaction instance.
> > ```
> >
> > Original thread id is transferred and saved on remote node.
> > After resuming thread id on local node differs from remote node.
> > I want to avoid one more network round trip to change thread Id on remote
> > node after transaction resuming.
> >
> > # Design proposal.
> >
> > Transaction id (`xid`) can be used instead of thread id on remote nodes.
> > The following solution is possible for the ticket :
> >
> > Replace thread id by transaction id for sending to remote nodes.
> > Thread id will be removed from the following classes:
> > `IgniteTxAdapter`, `GridDistributedTxPrepareRequest`,
> > `GridDistributedTxFinishRequest`, `GridDistributedTxFinishResponse`,
> > `GridDistributedTxPrepareRequest`.
> >
> > I haven't removed thread id completely from code. Thread id is moved to
> > `GridNearTxLocal`.
> > We still need it in near local transaction for many reasons, for example
> to
> > assure only thread started transaction can suspend it in
> > `GridNearTxLocal#suspend()`.
> > In future we can remove thread id completely. I propose to study this
> > question in another ticket.
> >
> > Also, thread id is remained in `GridDistributedLockRequest`.
> > Lock request used by cache locks and it need to transfer thread id to
> > remote nodes.
> > For example to use cache locks along with cache operations put/get, see
> > `GridNearTxLocal#updateExplicitVersion`.
> > As for pessimistic transaction, thread id in `GridDistributedLockRequest`
> > is set to `UNDEFINED_THREAD_ID`, which means we must not use it remotely.
> >
> > Note, that if user suspends transaction and forgets to resume it,
> > transaction would be rolled back once timeout has occurred.
> >
> > In my design when transaction is suspended, all locked keys remain
> locked.
> >
> > Please see my prototype of proposal implementation [3].
> >
> > Proposed changes are relatively small.
> > They ensure consistency of information about locks, if thread Id will be
> > changed within one transaction (by suspend/resume).
> > There will be used correct id for locks on remote nodes. It also requires
> > painstaking work, but changes will not affect the logic of oher
> components.
> >
> > Tell me please what do you think? Any suggestions and comments will be
> > helpful.
> >
> > If you agree with my design I also will do benchmarking.
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-5712
> > [2] https://issues.apache.org/jira/browse/IGNITE-5714
> > [3] https://github.com/apache/ignite/pull/2789
> > [4] Section 3.2.3
> >
> http://download.oracle.com/otn-pub/jcp/jta-1.1-spec-oth-JSpec/jta-1_1-spec.pdf

-- 

*Best Regards,*

*Kuznetsov Aleksey*

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message