directory-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
Subject svn commit: r976876 - in /websites/staging/directory/trunk/content: ./ mavibot/ mavibot/user-guide/ mavibot/user-guide/images/
Date Sat, 02 Jan 2016 08:38:23 GMT
Author: buildbot
Date: Sat Jan  2 08:38:23 2016
New Revision: 976876

Staging update by buildbot for directory

    websites/staging/directory/trunk/content/mavibot/user-guide/images/R4-node.png   (with
    websites/staging/directory/trunk/content/mavibot/user-guide/images/r3tor4.gif   (with
    websites/staging/directory/trunk/content/   (props changed)

Propchange: websites/staging/directory/trunk/content/
--- cms:source-revision (original)
+++ cms:source-revision Sat Jan  2 08:38:23 2016
@@ -1 +1 @@

Modified: websites/staging/directory/trunk/content/mavibot/user-guide.html
--- websites/staging/directory/trunk/content/mavibot/user-guide.html (original)
+++ websites/staging/directory/trunk/content/mavibot/user-guide.html Sat Jan  2 08:38:23 2016
@@ -183,9 +183,7 @@ h2:hover > .headerlink, h3:hover > .head
 <li><a href="user-guide/2-btree-types.html">2 - B-tree Flavors</a><ul>
+<li><a href="user-guide/2.1-mvcc-btree.html">2.1 - MVCC B-tree</a></li>
 <li><a href="user-guide/3-btree-management.html">3 - Mavibot B-tree management</a><ul>

Modified: websites/staging/directory/trunk/content/mavibot/user-guide/1-introduction.html
--- websites/staging/directory/trunk/content/mavibot/user-guide/1-introduction.html (original)
+++ websites/staging/directory/trunk/content/mavibot/user-guide/1-introduction.html Sat Jan
 2 08:38:23 2016
@@ -185,6 +185,7 @@ h2:hover > .headerlink, h3:hover > .head
 <h1 id="1-introduction">1 - Introduction<a class="headerlink" href="#1-introduction"
title="Permanent link">&para;</a></h1>
 <p>The goal of this guide is to explain the usage and internals of Mavibot to developers.
 <p>We hope it will be enough for you to quickly get started, but in any case, if you
feel like improving this document, feel free to post your suggestion(s) to the Apache Directory
mailing list : any contribution is welcomed !</p>
+<p>The first two chapters is all about theory, you can skip thema nd jump to chapter
3 if you want to get started quickly.</p>
     <div class="nav">

Modified: websites/staging/directory/trunk/content/mavibot/user-guide/1.1-btree-basics.html
--- websites/staging/directory/trunk/content/mavibot/user-guide/1.1-btree-basics.html (original)
+++ websites/staging/directory/trunk/content/mavibot/user-guide/1.1-btree-basics.html Sat
Jan  2 08:38:23 2016
@@ -185,17 +185,20 @@ h2:hover > .headerlink, h3:hover > .head
 <h1 id="11-b-tree-basics">1.1 - B-tree Basics<a class="headerlink" href="#11-b-tree-basics"
title="Permanent link">&para;</a></h1>
 <p><strong>B-tree</strong> was invented back in 1971 by <strong>Bayer</strong>
and <strong>McCreight</strong>  (the <strong>B</strong> does not mean
anything known, speculations are that it comes either form the <strong>B</strong>
of <strong>Bayer</strong> or from <strong>Boing</strong> they were
working for back then). It was an extension to binary search tree, which was just able to
store 2 elements per page.</p>
 <p>A <strong>B-tree</strong> is a "tree data structure that keeps data
sorted and allows searches, sequential access, insertions, and deletions in logarithmic time."
(see <a href="">Wikipedia</a>)</p>
-<p>The important point here is the last one : it guarantees <strong>O(logn)</strong>
operations, compared to any other data structures (a hashed data structure offers <strong>O(n)</strong>
average operations, but can degenerate to <strong>O(n2)</strong>, and ordering
is not kept. But there is more : by gathering N elements in a page, which leads to  more comparisons
than necessary, but still save a lot of disk accesses, as those comparison will be done in
memory when a page is fully loaded. That's the reason for <strong>B-trees</strong>
to have been invented. </p>
+<p>The important point here is the last one : it guarantees <strong>O(logn)</strong>
operations, compared to any other data structures (a hashed data structure offers <strong>O(n)</strong>
average operations, but can degenerate to <strong>O(n2)</strong>, and ordering
is not kept). Although it would be optimal to keep only 2 elements in each node, it's way
better from a performance point of view to keep more than 2 elements in each node, even if
it leads to more comparison than strictly necessary. The reason is that reading a fixed size
from disk is better than reading only what we need (considering that a disk access is 3 to
4 orders of magnitude slower than RAM access, it's always a good idea to cache pages in memory,
rather than read data from the disk over and over. And these 3 to 4 orders of magnitude is
for SSD... For spinning disks, we are more in the 4 to 5 orders of magnitude !)</p>
+<p>Add to that the fact that when <em>B-tree</em> were invented, data were
stored on magnetic tapes, with a pure sequential access. Reading data sequentially made sense,
and reading a block of data was the way to go - thus the page was meant to store more than
2 elements -.</p>
 <p><strong>B-trees</strong> are everywhere : databases, OS, etc. It's a
critical data structure when you are to deal with a large set of data.</p>
 <p>1.1.1 - Inside a B-tree</p>
-<p>A <strong>B-Tree</strong> contains <strong>Nodes</strong>
and <strong>Leaves</strong>. A <em>Node</em> points to other <strong>Nodes</strong>
or <strong>Leaves</strong>. Both <strong>Nodes</strong> and <strong>Leaves</strong>
have <strong>Keys</strong> that are associated with <em>Values</em>.
<strong>Keys</strong> are not copied in many pages, they just appear once in the
whole <strong>B-tree</strong>.</p>
+<p>A <strong>B-Tree</strong> contains <strong>Nodes</strong>
and <strong>Leaves</strong>. A <em>Node</em> points to other <strong>Nodes</strong>
or <strong>Leaves</strong>. <strong>Nodes</strong> contain only <strong>keys</strong>
and references to child <strong>nNodes</strong>, when <strong>Leaves</strong>
have <strong>Keys</strong> that are associated with <strong>Values</strong>.
The <strong>Nodes</strong>' <strong>Keys* are used to find the right child
that will ultimately contain the </strong>Value** we are looking for (if it's present)</p>
 <p>Pretty simple !</p>
 <p>One last thing : <strong>Keys</strong> are ordered, and this is the
condition for the easy and fast retrieval of <strong>Values</strong>.</p>
 <p>A few more rules are enforced :
 <em> A <strong>Node</strong> and a <strong>Leaf</strong> contains
up to N values (N being generally a power of 2, so 2, 4, 8, 16...).
 </em> You can't have less than N/2 <strong>Values</strong> or <strong>keys</strong>
in a <strong>Leaf</strong> or a <strong>Node</strong>, except for
the root <strong>Node</strong>.</p>
+<p>The second rule allows the <strong>B-tree</strong> to have only one
<strong>Leaf</strong>, when up to <strong>N</strong> elements are
to be stored. If we want to store one mor element, then the root <strong>Leaf</strong>
will be split and the elements will be spread equally in two <strong>Leaves</strong>,
with a root <strong>Node$* containing references on those two </strong>Leaves**.</p>
 <p>1.1.2 - Concurrent access</p>
 <p>The real problem with <strong>B-tree</strong>s is that we need to use
some locks to allow concurrent access to the data, especially when some updates occur. The
rationale is that when browsing a <strong>B-tree</strong>, changing it will just
break the browse operation. There are some technics to avoid having such an issue, up to a
point no read locks are needed, assuming some extra information is added to the pages : a
pointer to the next page at the same level. Sadly, it comes with drawbacks, like it's difficult
to remove empty pages from the <strong>B-tree</strong>.</p>
+<p>What we want is a way to guarantee that many users can read data from the <strong>B-tree</strong>,
even while the <strong>B-tree</strong> is being updated. Of course, that will
heavily favor reads over writes, as writes will have to be serialized, but in many cases,
this is a comfortable trade we are ready to make.</p>
     <div class="nav">

Modified: websites/staging/directory/trunk/content/mavibot/user-guide/2-btree-types.html
--- websites/staging/directory/trunk/content/mavibot/user-guide/2-btree-types.html (original)
+++ websites/staging/directory/trunk/content/mavibot/user-guide/2-btree-types.html Sat Jan
 2 08:38:23 2016
@@ -190,14 +190,14 @@ h2:hover > .headerlink, h3:hover > .head
 <li>Counted B-tree</li>
 <li>MVCC B-tree</li>
-<h2 id="21-btree">2.1 - B+tree<a class="headerlink" href="#21-btree" title="Permanent
+<h2 id="btree">B+tree<a class="headerlink" href="#btree" title="Permanent link">&para;</a></h2>
 <p>This is a <strong>B-tree</strong> which does not store values in the
<strong>Nodes</strong>, and a link between <strong>Leaves</strong>
is added, to speed up the browsing : no need to go up to the parent's node to get the next
value when reaching the end of a leaf. Also the <strong>nodes</strong> don't contain
-<h2 id="22-btree">2.2 - B*tree<a class="headerlink" href="#22-btree" title="Permanent
+<h2 id="btree_1">B*tree<a class="headerlink" href="#btree_1" title="Permanent link">&para;</a></h2>
 <p>A slightly different form of <strong>B+tree</strong>, where the number
of elements per page is enforced to be at leat 2/3rd of the maximum numbers of elements the
page can hold. It speeds up the retrieval of elements a bit, as we have a denser tree.</p>
-<h2 id="23-counted-b-tree">2.3 - Counted B-tree<a class="headerlink" href="#23-counted-b-tree"
title="Permanent link">&para;</a></h2>
+<h2 id="counted-b-tree">Counted B-tree<a class="headerlink" href="#counted-b-tree"
title="Permanent link">&para;</a></h2>
 <p>Another slightly different implementation of a <strong>B+tree</strong>,
where the number of elements stored in the descendants is stored within each key. This allows
an easy count of elements on the left and right to any element, at the price of additional
piece of information being added on each page.</p>
-<h2 id="24-mvcc-b-tree">2.4 - MVCC B-tree<a class="headerlink" href="#24-mvcc-b-tree"
title="Permanent link">&para;</a></h2>
-<p>This is a new 'style' of <strong>B+tree</strong>, where the structure
is exactly the same than a simple <strong>B+tree</strong>, except that we keep
old versions alive, until nobody use them. The idea is that a new revision of the <strong>B+tree</strong>
is added only when the updates are fully done. This has the extremely intersting characteristic
that there is no need of locks for read and writes, assuming that writes are not concurrent
(they are serialized).</p>
+<h2 id="mvcc-b-tree">MVCC B-tree<a class="headerlink" href="#mvcc-b-tree" title="Permanent
+<p>This is a new 'style' of <strong>B+tree</strong>, where the structure
is exactly the same than a simple <strong>B+tree</strong>, except that we keep
old versions alive, until nobody use them. The idea is that a new revision of the <strong>B+tree</strong>
is added only when an update is fully done. This has the extremely intersting characteristic
that there is no need of locks for read and writes, assuming that writes are not concurrent
(they are serialized).</p>
 <p>In other words, you may have many reads at the same time, still being able to update
the data.</p>
 <p>It comes with a price though : a lot of pages are copied during every update, as
we create a new copy of every modified page, up to the root page. </p>
 <p>We also have to manage old pages that can be removed as they belong to unused versions,
which requires some extra work.</p>

Added: websites/staging/directory/trunk/content/mavibot/user-guide/2.1-mvcc-btree.html
--- websites/staging/directory/trunk/content/mavibot/user-guide/2.1-mvcc-btree.html (added)
+++ websites/staging/directory/trunk/content/mavibot/user-guide/2.1-mvcc-btree.html Sat Jan
 2 08:38:23 2016
@@ -0,0 +1,295 @@
+<!DOCTYPE html>
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+	<head>
+		<title>2 - MVCC B-tree &mdash; Apache Directory</title>
+	    <link href="./../../css/common.css" rel="stylesheet" type="text/css">
+	    <link href="./../../css/turquoise.css" rel="stylesheet" type="text/css">
+        <link rel="shortcut icon" href="./../../images/mavibot-icon_16x16.png">
+        <!-- Google Analytics -->
+        <script src="" type="text/javascript"></script>
+        <script type="text/javascript">
+            _uacct = "UA-1358462-1";
+            urchinTracker();
+        </script>
+	</head>
+	<body>
+	    <div id="container">
+            <div id="header">
+                <div id="subProjectsNavBar">
+                    <a href="./../../">
+                        Main
+                    </a>
+                    &nbsp;|&nbsp;
+                    <a href="./../../apacheds">
+                        ApacheDS
+                    </a>
+                    &nbsp;|&nbsp;
+                    <a href="./../../studio">
+                        Studio
+                    </a>
+                    &nbsp;|&nbsp;
+                    <a href="./../../api">
+                        LDAP API
+                    </a>
+                    &nbsp;|&nbsp;
+                    <a href="./../../mavibot">
+                        <STRONG>Mavibot</STRONG>
+                    </a>
+                    &nbsp;|&nbsp;
+                    <a href="./../../escimo">
+                        eSCIMo
+                    </a>
+                    &nbsp;|&nbsp;
+                    <a href="./../../fortress">
+                        Fortress
+                    </a>
+                    &nbsp;|&nbsp;
+                    <a href="./../../kerby">
+                        Kerby
+                    </a>
+                </div><!-- subProjectsNavBar -->
+            </div><!-- header -->
+            <div id="content">
+                <div id="leftColumn">
+<div id="navigation">
+    <!--ul>
+      <li>
+        <a href="" target="_blank">
+          <img src="./../../images/ApacheConBudapest.png" width="125" height="125" alt="I'm
Speaking at ApacheCon Europe 2014! Join me!" title="I'm Speaking at ApacheCon Europe 2014!
Join me!" border="0" style="margin-bottom:-3px;"/>
+        </a>
+      </li>
+    </ul-->
+    <h5>Mavibot 1.0</h5>
+    <ul>
+        <li><a href="./../../mavibot/">Home</a></li>
+        <li><a href="./../../mavibot/news.html">News</a></li>
+    </ul>
+    <h5>Downloads</h5>
+    <ul>
+	    <li><a href="./../../mavibot/downloads.html">Version 1.0.0-M8</a>&nbsp;&nbsp;<IMG
src="./../../images/new_badge.gif" alt="" style="margin-bottom:-3px;" border="0"></li>
+        <li><a href="./../../mavibot/download-old-versions.html">Older versions</a></li>
+    </ul>
+    <h5>Getting Started</h5>
+    <ul>
+        <li><a href="./../../mavibot/vision.html">Vision</a></li>
+    </ul>
+    <h5>Documentation</h5>
+    <ul>
+        <li><a href="./../../mavibot/five-minutes-tutorial.html">Five minutes
+	<li><a href="./../../mavibot/user-guide.html">User Guide</a></li>
+        <li><a href="./../../mavibot/gen-docs/latest/apidocs/">JavaDocs</a></li>
+        <li><a href="./../../mavibot/gen-docs/latest/xref/">Cross-Reference</a></li>
+        <!--li><a href="./../../mavibot/gen-docs/latest/">Generated Reports</a></li-->
+        <li><a href="./../../mavibot/developer-guide.html">Developer Guide</a></li>
+    </ul>
+    <h5>Support</h5>
+    <ul>
+        <li><a href="./../../mailing-lists-and-irc.html">Mailing Lists &amp;
+        <li><a href="./../../sources.html">Sources</a></li>
+        <li><a href="./../../issue-tracking.html">Issue Tracking</a></li>
+        <li><a href="./../../commercial-support.html">Commercial Support</a></li>
+    </ul>
+    <h5>Community</h5>
+    <ul>
+        <li><a href="./../../contribute.html">How to Contribute</a></li>
+        <li><a href="./../../team.html">Team</a></li>
+        <li><a href="./../../original-project-proposal.html">Original Project
+        <li><a href="./../../special-thanks.html" class="external-link" rel="nofollow">Special
+    </ul>
+    <h5>About Apache</h5>
+    <ul>
+        <li><a href="">Apache</a></li>
+        <li><a href="">License</a></li>
+        <li><a href="">Sponsorship</a></li>
+        <li><a href="">Thanks</a></li>
+        <li><a href="">Security</a></li>
+    </ul>
+</div><!-- navigation -->
+                </div><!-- leftColumn -->
+                <div id="rightColumn">
+    <div class="nav">
+        <div class="nav_prev">
+            <a href="2-btree-types.html">2 - B-tree flavors</a>
+        </div>
+        <div class="nav_up">
+            <a href="../user-guide.html">User Guide</a>
+        </div>
+        <div class="nav_next">
+            <a href="3-btree-management.html">3 - Mavibot B-tree management</a>
+        </div>
+        <div class="clearfix"></div>
+    </div>
+<style type="text/css">
+/* The following code is added by
+   It was originally lifted from */
+ * Hide class="elementid-permalink", except when an enclosing heading
+ * has the :hover property.
+ */
+.headerlink, .elementid-permalink {
+  visibility: hidden;
+h2:hover > .headerlink, h3:hover > .headerlink, h1:hover > .headerlink, h6:hover
> .headerlink, h4:hover > .headerlink, h5:hover > .headerlink, dt:hover > .elementid-permalink
{ visibility: visible }</style>
+<h1 id="21-mvcc-b-tree">2.1 - MVCC B-tree<a class="headerlink" href="#21-mvcc-b-tree"
title="Permanent link">&para;</a></h1>
+<p><strong>MVCC</strong> stands for <strong>M</strong>ulti
<strong>V</strong>ersion <strong>C</strong>oncurrency <strong>C</strong>control.
It all boils down to allow readers to <strong>ALWAYS</strong> have a consistent
version of the data they reached at the time they started to work with them. In other words,
when a reader access the database, it will work on a specific version of it, that will never
+<p>The drawback is that the data may be accurate at the time the reader started to
work with them, but this reader will never see the updated data if a writer has modified them.</p>
+<p>Most of the time, this is a price the users are ready to pay, for at least two reasons
+- udpates are way less frequent that reads, and the users 'accept' the fact that what they
get is not necessarily up to date
+- there are some mechanisms that workaround this problem, checking afterward that the data
represent the latest state of the database, otherwise the operation is retried.</p>
+<p>You can read more about <strong>MVCC</strong> <strong>B-tree</strong>
on <a href="">Wikipedia,
+<h2 id="how-does-it-work">How does it work ?<a class="headerlink" href="#how-does-it-work"
title="Permanent link">&para;</a></h2>
+<p>A <strong>MVCC</strong> <strong>B-Tree</strong> is just
a plain standard <strong>B+Tree</strong>, except that we keep a track of more
than one version of the <strong>B+Tree</strong>. Actually, we have multiple roots,
one per version. This can bee seen in this picture :</p>
+<p><img alt="MVCC sample" src="images/R4-node.png" /></p>
+<p>Here, we have 4 different versions, from r1 to r4. Each one of them might be used
at the same time by one or many readers. In any case, the writer update only the latest one,
creating a new revision doing so. In this exemple, the writer has just created r4 updating
r3, adding the element 'I'. As we can see, some pages are shared across the revisions : the
leaf contaning the element 'A' (version r3) is also referenced by the revision r4. This is
a key : that means a page (<strong>Node</strong> or <strong>Leaf</strong>
may be used by many revisions).</p>
+<p>We don't delete old revision until they are not anymore in use. And even so, we
don't delete all the associated pages blindly : they may be referenced by newer versions that
are still active.</p>
+<h3 id="browsing-the-database">Browsing the Database<a class="headerlink" href="#browsing-the-database"
title="Permanent link">&para;</a></h3>
+<p>So we want to fetch some information from a B-Tree. That's just a matter of creating
a <em>ReadTransaction</em>, which will grab the latest existing revision for thi
<strong>B-Tree</strong> and allow you to browse data from this <strong>B-Tree</strong>.
One you are done with the <strong>B-Tree</strong>, you just have to close the
<em>ReadTransaction</em> (this is critical, because keeping it opened will pin
the revision, forbidding the removal of old pages. Although if you forget to do it, the revision
will be released after a period of time). This is a description of the typical code :</p>
+<div class="codehilite"><pre><span class="n">open</span> <span
class="n">a</span> <span class="n">ReadTransaction</span>
+<span class="n">get</span> <span class="n">the</span> <span class="n">B</span><span
class="o">-</span><span class="n">Tree</span> <span class="n">you</span>
<span class="n">want</span> <span class="n">to</span> <span class="n">read</span>
+<span class="n">read</span> <span class="n">the</span> <span class="n">B</span><span
class="o">-</span><span class="n">tree</span>
+<span class="p">...</span>
+<span class="n">close</span> <span class="n">the</span> <span
+<p>It's important to note that the transaction covers all the access on all the managed
<strong>B-Tree</strong>. There is no need to open a new <em>ReadTransaction</em>
if you want to fetch some other information from another <strong>B-Tree</strong>.
That also mean you can be sure that you have a consistent view of all the database for a given
+<p>You may have many threads reading data from many <strong>B-Trees</strong>,
and we will have as many <em>ReadTransaction</em> created for that purpose. As
we may keep a <em>readTransaction</em> opened for quite a long period of time,
it's likely that at some point, the various Threads will use different versions of the <strong>B-Trees</strong>.</p>
+<h3 id="updating-the-database">Updating the Database<a class="headerlink" href="#updating-the-database"
title="Permanent link">&para;</a></h3>
+<p>This is a bit more complex. Here, you need a <em>WriteTransaction</em>,
and there can be only one running across all the application. That means the access to the
Database is serialized, when it comes to update it. The direct consequence is that writes
are <strong>slow</strong>. </p>
+<p>Again, the typical call is like :</p>
+<div class="codehilite"><pre><span class="n">open</span> <span
class="n">a</span> <span class="n">WriteTransaction</span>
+<span class="n">get</span> <span class="n">the</span> <span class="n">B</span><span
class="o">-</span><span class="n">Tree</span> <span class="n">you</span>
<span class="n">want</span> <span class="n">to</span> <span class="n">update</span>
+<span class="n">update</span> <span class="n">the</span> <span
class="n">B</span><span class="o">-</span><span class="n">tree</span>
+<span class="p">...</span>
+<span class="n">close</span> <span class="n">the</span> <span
+<p>and again, many <strong>B-Trees</strong> might be updated during this
<em>writeTransaction</em>. For a system like a <strong>LDAP</strong>
server, this is critical, because updating an entry is impacting many <strong>B-trees</strong>,
and we want all of them to be consistent globally : eitehr all the <strong>B-trees</strong>
are updated, or none of them.</p>
+<p>It's even more critical to close the <em>writeTransaction</em> because
all the other writers are waiting for this transaction to be completed to be able to proceed.
+<p>Let's now describe how an update works, because it's quite convoluted.</p>
+<h4 id="updates-from-the-tranchees">Updates, from the tranchees...<a class="headerlink"
href="#updates-from-the-tranchees" title="Permanent link">&para;</a></h4>
+<p>Updating a <strong>B-Tree</strong> is not a simple operation. Actually,
we want to guarantee that the operation succeed or fail, but in any case, leave the database
in a consistant state. We also want to guarantee that even if a crash occurs in the middle
of a write will leave the database in a consistant state. Let's see how it's possible.</p>
+<p>A <strong>B-Tree</strong> is composed of 3 elements :</p>
+<div class="codehilite"><pre><span class="o">*</span> <span class="n">The</span>
<span class="o">*</span><span class="n">BTreeHeader</span><span
class="o">*</span><span class="p">,</span> <span class="n">which</span>
<span class="n">contains</span> <span class="n">the</span> <span
class="n">reference</span> <span class="n">on</span> <span class="n">the</span>
<span class="n">two</span> <span class="n">other</span> <span class="n">elements</span>
+<span class="o">*</span> <span class="n">The</span> <span class="o">*</span><span
class="n">BTreeInfo</span><span class="o">*</span><span class="p">,</span>
<span class="n">which</span> <span class="n">contains</span> <span
class="n">informations</span> <span class="n">relative</span> <span
class="n">to</span> <span class="n">this</span> <span class="o">**</span><span
class="n">B</span><span class="o">-</span><span class="n">Tree</span><span
class="o">**</span> <span class="p">(</span><span class="n">this</span>
<span class="n">element</span> <span class="n">is</span> <span
class="n">never</span> <span class="n">updated</span><span class="p">)</span>
+<span class="o">*</span> <span class="n">The</span> <span class="o">*</span><span
class="n">RootPage</span><span class="o">*</span><span class="p">,</span>
<span class="n">which</span> <span class="n">is</span> <span class="n">the</span>
<span class="n">root</span> <span class="n">of</span> <span class="n">the</span>
<span class="o">**</span><span class="n">B</span><span class="o">-</span><span
class="n">Tree</span><span class="o">**</span>
+<p>Each of these elemenst are stored into <em>Pages</em>, and referenced
through the offset of those <em>Pages</em> on disk (the offset is a long, and
it's just the position of a page in the database file).</p>
+<p>The very first step is to retrieve the latest <strong>B-Tree</strong>
revision. We have a management <strong>B-Tree</strong> that keeps a track of all
the active revisions for every <strong>B-Tree</strong> : the <strong>BtreeOfBtrees</strong>,
called <strong>BOB</strong>. It stores a reference on each <strong>B-tree</strong>'s
header for every active revision. The key is a composition of the <strong>B-Tree</strong>
name and its revision : <em><name, revision> and it reference the </em>BTreeHeader<em>'s
offset (the position of the </em>BtreeHeader<em>'s page). Fo that, we browse the
</em>BOB* looking for the newest <Name, Revision> couple (actually, we are looking
for the first tuple that is lower that <name, infinite>).</p>
+<p>Once we have found the tuple, we know where to fetch the <em>BtreeHeader</em>
from. So do we. We can then fetch the <em>RootPage</em>, which is the starting
point for any update. </p>
+<p>At this point, the update is either an addition or a removal. That will potentially
change the whole structure of the <strong>B-Tree</strong>, or a single page. In
any case, we will have a new <em>RootPage</em> pointing to old pages and to newly
created pages. Let's see again the previous picture :</p>
+<p><img alt="MVCC sample" src="images/R4-node.png" /></p>
+<p>Let's say we are at revision 3, and we want to add an entry 'I' : as we can see,
the <em>rootPage</em> is updated, the right <strong>Leaf</strong>
is split in two, and the left <strong>Leaf</strong> is unchanged. All in all,
we have created 3 new pages, and 2 old pages are now unused and can be reclaimed later. The
internal modification of pages is not in the scope of this explaination, so we will leave
it for the moment.</p>
+<p>Here is an animated image that shows how it's done :</p>
+<p><img alt="R3 to R4" src="images/r3tor4.gif" /></p>
+<p>So now, we have a new <em>BtreeHeader</em>, revision 4. We have to keep
a track of it, which means we have to inject it into the <strong>BOB</strong>.
As it's a <strong>B-Tree</strong>, we proceed with it the exact same way, except
that we always have a reference on the <strong>BOB</strong> <em>BtreeHeader</em>
in the transaction context.</p>
+<DIV class="note" markdown="1">
+We will see later on what is the Transaction Context
+<p>We are almost done. We also have to keep a track of the 'old' pages that were copied.
In order to be able to reclaim them, we have to store them somewhere convenient. An 'old'
page is not necessarily 'old' for everyone : another thread might have a reader on revision
3, thus might need those 'old' pages. Actually, reclaiming 'old' pages is a bit complex and
will be explained later on. Enough said that we want to keep a track of every of those pages
in a <strong>B-Tree</strong>, to be able to reclaimn them later.</p>
+<p>We use a specific <strong>B-Tree</strong> for taht : the <strong>CopiedPageBtree</strong>,
or <strong>CPB</strong>. The key is different than for the <strong>BOB</strong>,
it's a composition of the revision we are creating, and the name of the <strong>B-Tree</strong>
being updated : <revision, name>. The value is the offset of the copied page. This way,
when the revision is not anymore in use by any reader, we can reclaim the associated pages.
+<p>The <strong>CPB</strong> is a <strong>B-Tree</strong>, so
the update process is the one we already have described. There is one big difference though
: once the <strong>CPB</strong> has been updated, we can immediately reclaim the
copied pages, we won't use them at all (this is because there is only one writer thread, and
this is the only one allowed to manipulate this <strong>B-Tree</strong>).</p>
+<p>We are almost done now. At this point, the targetted <strong>B-Tree</strong>
has been updated, and the <strong>BOB</strong> and <strong>CPB</strong>
have also been updated. But no reader can see those updated versions yet. We need to update
the <strong>RecordManagerHeader</strong>, a structure that contain the reference
to the current <strong>BOB</strong>. Once we have updated this structure, and
wrote it on disk, we can safely make the new <strong>BOB</strong> available for
any new readers.</p>
+<h4 id="reclaiming-unused-pages">Reclaiming unused pages<a class="headerlink" href="#reclaiming-unused-pages"
title="Permanent link">&para;</a></h4>
+<p>As we have seen in the previous paragraph, we always create new pages, we never
remove any. This is obviously going to eat a lot of disk space if we don't reclaim unused
pages !</p>
+<p>Hopefully, there is a <em>reclaimer</em> thread that is executed when
no writerer is running, which role is to check if some old pages can be removed. There are
strict conditions for a page to be candidate for removal :</p>
+<div class="codehilite"><pre><span class="o">*</span> <span class="n">It</span>
<span class="n">must</span> <span class="n">be</span> <span class="n">presnet</span>
<span class="n">in</span> <span class="n">the</span> <span class="o">**</span><span
class="n">CPB</span><span class="o">**</span>
+<span class="o">*</span> <span class="n">The</span> <span class="n">revision</span>
<span class="n">that</span> <span class="n">has</span> <span class="n">copied</span>
<span class="n">those</span> <span class="n">pages</span> <span
class="n">must</span> <span class="n">not</span> <span class="n">be</span>
<span class="n">in</span> <span class="n">use</span> <span class="n">by</span>
<span class="n">any</span> <span class="n">reader</span>
+<span class="o">*</span> <span class="n">The</span> <span class="n">revision</span>
<span class="n">must</span> <span class="n">be</span> <span class="n">the</span>
<span class="n">oldest</span> <span class="n">one</span>
+<p>The latest condition is the critical one : it means that if a revision is used for
a long period of time, we will not be able to reclaim any pages from this revision nor from
any younger revision, even if they are not in use. That may bloat the Database...</p>
+<p>There are ways to deal with this problem. Typically, if a page is copied twice,
then we can reclaim it if it's not referenced. The algorithm is a bit complex though...</p>
+<p>The easiest solution is simply to forbid long read transactions.</p>
+<h3 id="dealing-with-alive-revisions">Dealing with alive revisions<a class="headerlink"
href="#dealing-with-alive-revisions" title="Permanent link">&para;</a></h3>
+<p>Every time we have a reader starting a <em>readTransaction</em>, we
have to keep a track of the current revision, and the number of readers using it. This is
done using a specific List of Used Revisions. Every new reader will add a new revision to
this list, or increment a counter associated with the existing revision. Everytime a reader
close the <em>readTransaction</em>, the counter is decremented. When this counter
go down to 0, the revision can be discarded, unless it's the latest one.</p>
+<p>Discarding an old revision is possible only when it's the oldest. That means the
list may contain many inactive revisions, with a counter equal to 0. Those inactive revisions
will be removed when they will become the oldest revision.</p>
+    <div class="nav">
+        <div class="nav_prev">
+            <a href="2-btree-types.html">2 - B-tree flavors</a>
+        </div>
+        <div class="nav_up">
+            <a href="../user-guide.html">User Guide</a>
+        </div>
+        <div class="nav_next">
+            <a href="3-btree-management.html">3 - Mavibot B-tree management</a>
+        </div>
+        <div class="clearfix"></div>
+    </div>
+                </div><!-- rightColumn -->
+                <div id="endContent"></div>
+            </div><!-- content -->
+            <div id="footer">&copy; 2003-2015, <a href="">The
Apache Software Foundation</a> - <a href="./../../privacy-policy.html">Privacy
Policy</a><br />
+                Apache Directory, ApacheDS, Apache Directory Server, Apache Directory Studio,
Apache LDAP API, Apache Triplesec, Triplesec, Apache Mavibot, Mavibot, Apache eSCIMo, eSCIMo,
Fortress, Apache Fortress, EnMasse, Apache EnMasse, Apache Kerby, Kerby
+                Apache, the Apache feather logo, and the Apache Directory project logos are
trademarks of The Apache Software Foundation.
+            </div>
+        </div><!-- container -->
+    </body>

Added: websites/staging/directory/trunk/content/mavibot/user-guide/images/R4-node.png
Binary file - no diff available.

Propchange: websites/staging/directory/trunk/content/mavibot/user-guide/images/R4-node.png
    svn:mime-type = image/png

Added: websites/staging/directory/trunk/content/mavibot/user-guide/images/r3tor4.gif
Binary file - no diff available.

Propchange: websites/staging/directory/trunk/content/mavibot/user-guide/images/r3tor4.gif
    svn:mime-type = image/gif

View raw message