ignite-dev mailing list archives

From Alexey Goncharuk <alexey.goncha...@gmail.com>
Subject Re: Rework locking architecture for MVCC and transactional SQL
Date Fri, 15 Dec 2017 13:26:18 GMT
Vladimir,

What about moving the entire locking mechanism to a separate off-heap
memory region which would be volatile with respect to restarts, but would
still support off-load to disk? In the current architecture this means that
we will need to allocate a separate DataRegion with no WAL and no crash
recovery - locks are meaningless after a restart, so we will automatically
drop them. It would be interesting to prototype this, because I think we may
be on par with on-heap lock placement, as we already proved for in-memory
caches.
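
Roughly, the configuration side could look like this (just a sketch - the
region name, sizes and swap path are made up, and it assumes we can reuse a
plain DataRegionConfiguration with persistence disabled for the lock data,
with off-load to disk going through the region's swap path):

    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class LockRegionConfig {
        public static IgniteConfiguration withVolatileLockRegion() {
            // Lock data region: persistence disabled, hence no WAL and no crash
            // recovery - its contents are simply dropped on restart.
            DataRegionConfiguration lockRegion = new DataRegionConfiguration()
                .setName("lockData")                   // illustrative name
                .setInitialSize(256L * 1024 * 1024)
                .setMaxSize(2L * 1024 * 1024 * 1024)
                .setPersistenceEnabled(false)
                .setSwapPath("/tmp/ignite-lock-swap"); // off-load to disk, still no WAL

            return new IgniteConfiguration().setDataStorageConfiguration(
                new DataStorageConfiguration().setDataRegionConfigurations(lockRegion));
        }
    }

Lock pages would then be allocated from this region instead of the Java heap.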

2017-12-14 21:53 GMT+03:00 Denis Magda <dmagda@apache.org>:

> Vladimir,
>
> No, it's crystal clear, thanks.
>
> If this approach works only for Ignite persistence based deployments, how
> will we handle locking for the pure in-memory and 3rd party database caching
> scenarios? As I understand it, the tuples will still be stored in the page
> memory, but there won't be any opportunity to fall back to disk if memory
> usage exceeds some threshold.
>
> —
> Denis
>
> > On Dec 13, 2017, at 11:21 PM, Vladimir Ozerov <vozerov@gridgain.com>
> > wrote:
> >
> > Denis,
> >
> > Sorry, maybe I was not clear enough - the "tuple approach" and the
> > "persistent approach" are the same thing. By "tuple" I mean a row stored
> > inside a data block. Currently we store lock information in the Java heap,
> > and the proposal is to move it to data blocks. The main driver is memory -
> > if there are many rows to be locked, we will either run out of memory or
> > produce serious memory pressure. For example, currently an update of 1M
> > entries consumes ~500Mb of heap. With the proposed approach it will consume
> > almost nothing. The drawback is an increased number of dirty data pages,
> > but this should not be a problem, because in the final implementation we
> > will update data rows before the prepare phase anyway, so I do not expect
> > any write amplification in the usual case.
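> >
> > (For scale: ~500Mb across 1M entries works out to roughly 500 bytes of
> > heap bookkeeping per locked key, whereas a lock marker written into a data
> > block that is already held in page memory adds almost nothing on heap.)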
> >
> > This approach is only applicable for Ignite persistence.
> >
> > On Thu, Dec 14, 2017 at 1:53 AM, Denis Magda <dmagda@apache.org> wrote:
> >
> >> Vladimir,
> >>
> >> Thanks for the thorough overview and proposal.
> >>
> >>> Also we could try employing a tiered approach:
> >>> 1) Try to keep everything in-memory to minimize writes to blocks.
> >>> 2) Fall back to persistent lock data if a certain threshold is reached.
> >>
> >> What are the benefits of the backed-by-persistence approach compared to
> >> the one based on tuples? Specifically:
> >> - will the persistence approach work for both 3rd party and Ignite
> >> persistence?
> >> - any performance impact depending on the chosen method?
> >> - what's faster to implement?
> >>
> >> —
> >> Denis
> >>
> >>> On Dec 13, 2017, at 2:10 AM, Vladimir Ozerov <vozerov@gridgain.com>
> >>> wrote:
> >>>
> >>> Igniters,
> >>>
> >>> As you probably know, we are actively working on the MVCC [1] and
> >>> transactional SQL [2] features, which could be treated as a single huge
> >>> improvement. We face a number of challenges, and one of them is locking.
> >>>
> >>> At the moment, information about all locks is kept in memory on a
> >>> per-entry basis (see GridCacheMvccManager). For every locked key we
> >>> maintain the current lock owner (XID) and the list of would-be-owner
> >>> transactions. When a transaction is about to lock an entry, two scenarios
> >>> are possible:
> >>> 1) If the entry is not locked, we obtain the lock immediately.
> >>> 2) If the entry is locked, we add the current transaction to the wait
> >>> list and jump to the next entry to be locked. Once the first entry is
> >>> released by the conflicting transaction, the current transaction becomes
> >>> the owner of the first entry and tries to promote itself for the
> >>> subsequent entries.
> >>>
> >>> Once all required locks are obtained, a response is sent to the caller.
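> >>>
> >>> To make the memory overhead concrete, the on-heap bookkeeping today is
> >>> roughly equivalent to keeping something like the following object for
> >>> every locked key (an illustrative sketch only - names and fields are
> >>> made up, this is not the actual GridCacheMvccManager code):
> >>>
> >>>     import java.util.ArrayDeque;
> >>>     import java.util.Deque;
> >>>     import java.util.UUID;
> >>>
> >>>     // Simplified per-entry lock state held on the Java heap.
> >>>     class EntryLockState {
> >>>         UUID ownerXid;                                     // current lock owner
> >>>         final Deque<UUID> candidates = new ArrayDeque<>(); // would-be owners, in order
> >>>
> >>>         // When the owner releases the lock, the next candidate is promoted.
> >>>         UUID promoteNext() {
> >>>             ownerXid = candidates.pollFirst();
> >>>             return ownerXid;
> >>>         }
> >>>     }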
> >>>
> >>> This approach doesn't work well for transactional SQL - if we update
> >>> millions of rows in a single transaction, we will simply run out of
> >>> memory. To mitigate the problem, other database vendors keep information
> >>> about locks inside the tuples. I propose to apply a similar design, as
> >>> follows (a rough sketch of the resulting lock-acquisition path is given
> >>> after the list):
> >>>
> >>> 1) No per-entry lock information is stored in memory anymore.
> >>> 2) The list of active transactions is still maintained in memory.
> >>> 3) When a TX locks an entry, it sets a special marker in the tuple [3].
> >>> 4) When a TX meets an already locked entry, it enlists itself in the wait
> >>> queue of the conflicting transaction and suspends.
> >>> 5) When the first transaction releases the conflicting lock, it notifies
> >>> and wakes up the suspended transactions, so they resume locking.
> >>> 6) Entry lock data is cleared on transaction commit.
> >>> 7) Entry lock data is not cleared on rollback or node restart; instead,
> >>> we can use the active transactions list to identify invalid locks and
> >>> overwrite them as needed.
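> >>>
> >>> As mentioned above, a rough sketch of the lock-acquisition path with a
> >>> per-tuple marker could look like this (illustrative only - the actual
> >>> marker layout depends on the final MVCC design [3], and all names here
> >>> are made up):
> >>>
> >>>     import java.util.Set;
> >>>     import java.util.UUID;
> >>>
> >>>     /** Hypothetical accessor over the lock marker in a row header of a data block. */
> >>>     interface RowHeader {
> >>>         UUID lockXid();
> >>>         void writeLockXid(UUID xid);
> >>>     }
> >>>
> >>>     class TupleLockSketch {
> >>>         /** XIDs of transactions currently known to be active (kept in memory, step 2). */
> >>>         private final Set<UUID> activeTxs;
> >>>
> >>>         TupleLockSketch(Set<UUID> activeTxs) {
> >>>             this.activeTxs = activeTxs;
> >>>         }
> >>>
> >>>         /**
> >>>          * Tries to lock a row for the given transaction by writing its XID
> >>>          * into the row header stored in the data block (step 3).
> >>>          *
> >>>          * @return null if the lock was acquired, otherwise the XID of the
> >>>          *         conflicting transaction to wait for (step 4).
> >>>          */
> >>>         UUID tryLock(RowHeader row, UUID myXid) {
> >>>             UUID lockXid = row.lockXid();
> >>>
> >>>             // A marker left by a rolled-back or restarted transaction is not in
> >>>             // the active list, so it can simply be overwritten (step 7).
> >>>             if (lockXid == null || lockXid.equals(myXid) || !activeTxs.contains(lockXid)) {
> >>>                 row.writeLockXid(myXid); // marks the data page dirty
> >>>                 return null;
> >>>             }
> >>>
> >>>             return lockXid; // caller enlists in this TX's wait queue and suspends
> >>>         }
> >>>     }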
> >>>
> >>> Also, we could try employing a tiered approach (a rough sketch follows
> >>> the list):
> >>> 1) Try to keep everything in-memory to minimize writes to blocks.
> >>> 2) Fall back to persistent lock data if a certain threshold is reached.
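> >>>
> >>> For the tiered variant, the fallback itself could be as simple as a size
> >>> check (again just a sketch, with a made-up threshold and names):
> >>>
> >>>     import java.util.Map;
> >>>     import java.util.UUID;
> >>>     import java.util.concurrent.ConcurrentHashMap;
> >>>
> >>>     class TieredLockTable {
> >>>         private static final int MAX_HEAP_LOCKS = 100_000; // arbitrary threshold
> >>>
> >>>         private final Map<Long, UUID> heapLocks = new ConcurrentHashMap<>(); // rowId -> owner XID
> >>>
> >>>         void lock(long rowId, UUID xid) {
> >>>             if (heapLocks.size() < MAX_HEAP_LOCKS)
> >>>                 heapLocks.put(rowId, xid);          // tier 1: on-heap, no block writes
> >>>             else
> >>>                 writeLockIntoDataBlock(rowId, xid); // tier 2: per-tuple marker in the block
> >>>         }
> >>>
> >>>         private void writeLockIntoDataBlock(long rowId, UUID xid) {
> >>>             // Placeholder for the page-memory write sketched above.
> >>>         }
> >>>     }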
> >>>
> >>> Thoughts?
> >>>
> >>> [1] https://issues.apache.org/jira/browse/IGNITE-3478
> >>> [2] https://issues.apache.org/jira/browse/IGNITE-4191
> >>> [3] Depends on the final MVCC design - it could be a per-tuple XID, undo
> >>> vectors, per-block transaction lists, etc.
> >>>
> >>> Vladimir.
> >>
> >>
>
>
