jackrabbit-commits mailing list archives

From mreut...@apache.org
Subject svn commit: r1835390 [11/23] - in /jackrabbit/site/live/oak/docs: ./ architecture/ coldstandby/ features/ nodestore/ nodestore/document/ nodestore/segment/ oak-mongo-js/ oak_api/ plugins/ query/ security/ security/accesscontrol/ security/authentication...
Date Mon, 09 Jul 2018 08:53:19 GMT
Modified: jackrabbit/site/live/oak/docs/query/indexing.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/query/indexing.html?rev=1835390&r1=1835389&r2=1835390&view=diff
==============================================================================
--- jackrabbit/site/live/oak/docs/query/indexing.html (original)
+++ jackrabbit/site/live/oak/docs/query/indexing.html Mon Jul  9 08:53:17 2018
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia Site Renderer 1.7.4 at 2018-05-24 
+ | Generated by Apache Maven Doxia Site Renderer 1.8.1 at 2018-07-09 
  | Rendered using Apache Maven Fluido Skin 1.6
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20180524" />
+    <meta name="Date-Revision-yyyymmdd" content="20180709" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Jackrabbit Oak &#x2013; Indexing</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.6.min.css" />
@@ -136,7 +136,7 @@
 
       <div id="breadcrumbs">
         <ul class="breadcrumb">
-        <li id="publishDate">Last Published: 2018-05-24<span class="divider">|</span>
+        <li id="publishDate">Last Published: 2018-07-09<span class="divider">|</span>
 </li>
           <li id="projectVersion">Version: 1.10-SNAPSHOT</li>
         </ul>
@@ -241,120 +241,95 @@
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
-  --><h1>Indexing</h1>
-
+  -->
+<h1>Indexing</h1>
 <ul>
-  
+
 <li><a href="#indexing">Indexing</a>
-  
 <ul>
-    
+
 <li><a href="#overview">Overview</a>
-    
 <ul>
-      
+
 <li><a href="#new-1.6">New in 1.6</a></li>
-    </ul></li>
-    
+</ul>
+</li>
 <li><a href="#indexing-flow">Indexing Flow</a>
-    
 <ul>
-      
+
 <li><a href="#index-defnitions">Index Definitions</a>
-      
 <ul>
-        
+
 <li><a href="#oak-index-nodes">Index Definition Location</a></li>
-      </ul></li>
-      
+</ul>
+</li>
 <li><a href="#sync-indexing">Synchronous Indexing</a></li>
-      
 <li><a href="#async-indexing">Asynchronous Indexing</a>
-      
 <ul>
-        
+
 <li><a href="#checkpoint">Checkpoint</a></li>
-        
 <li><a href="#indexing-lane">Indexing Lane</a></li>
-        
 <li><a href="#cluster">Clustered Setup</a>
-        
 <ul>
-          
+
 <li><a href="#async-index-lease">Indexing Lease</a></li>
-        </ul></li>
-        
+</ul>
+</li>
 <li><a href="#async-index-lag">Indexing Lag</a></li>
-        
 <li><a href="#async-index-setup">Setup</a></li>
-        
 <li><a href="#async-index-mbean">Async Indexing MBean</a></li>
-        
 <li><a href="#corrupt-index-handling">Isolating Corrupt Indexes</a></li>
-      </ul></li>
-      
+</ul>
+</li>
 <li><a href="#nrt-indexing">Near Real Time Indexing</a>
-      
 <ul>
-        
+
 <li><a href="#nrt-indexing-usage">Usage</a>
-        
 <ul>
-          
+
 <li><a href="#nrt-indexing-mode-nrt">NRT Indexing Mode - nrt</a></li>
-          
 <li><a href="#nrt-indexing-mode-sync">NRT Indexing Mode - sync</a></li>
-        </ul></li>
-        
+</ul>
+</li>
 <li><a href="#nrt-indexing-cluster-setup">Cluster Setup</a></li>
-        
 <li><a href="#nrt-indexing-config">Configuration</a></li>
-      </ul></li>
-    </ul></li>
-    
+</ul>
+</li>
+</ul>
+</li>
 <li><a href="#reindexing">Reindexing</a>
-    
 <ul>
-      
+
 <li><a href="#reduce-reindexing-times">Reducing reindexing times</a></li>
-      
 <li><a href="#abort-reindex">How to Abort Reindexing</a></li>
-    </ul></li>
-  </ul></li>
+</ul>
+</li>
+</ul>
+</li>
 </ul>
 <div class="section">
 <h2><a name="Overview"></a><a name="overview"></a> Overview</h2>
 <p>For queries to perform well, Oak supports indexing of content that is stored in the repository. Indexing works by comparing different versions of the node data (technically, &#x201c;diff&#x201d; between the base <tt>NodeState</tt> and the modified <tt>NodeState</tt>). The indexing mode defines how comparing is performed, and when the index content gets updated:</p>
-
 <ol style="list-style-type: decimal">
-  
+
 <li>Synchronous Indexing</li>
-  
 <li>Asynchronous Indexing</li>
-  
 <li>Near Real Time (NRT) Indexing</li>
 </ol>
 <p>Indexing uses <a href="../architecture/nodestate.html#commit-editors">Commit Editors</a>. Some of the editors are of type <tt>IndexEditor</tt>, which are responsible for updating index content based on changes in main content. Currently, Oak has the following built-in editors:</p>
-
 <ol style="list-style-type: decimal">
-  
+
 <li>PropertyIndexEditor</li>
-  
 <li>ReferenceEditor</li>
-  
 <li>LuceneIndexEditor</li>
-  
 <li>SolrIndexEditor</li>
 </ol>
 <div class="section">
 <h3><a name="New_in_1.6"></a><a name="new-1.6"></a> New in 1.6</h3>
-
 <ul>
-  
+
 <li><a href="#nrt-indexing">Near Real Time (NRT) Indexing</a></li>
-  
 <li><a href="#async-index-setup">Multiple Async indexers setup via OSGi config</a></li>
-  
 <li><a href="#corrupt-index-handling">Isolating Corrupt Indexes</a></li>
 </ul></div></div>
 <div class="section">
@@ -364,45 +339,37 @@
 <h3><a name="Index_Definitions"></a><a name="index-defnitions"></a> Index Definitions</h3>
 <p>Index definitions are nodes of type <tt>oak:QueryIndexDefinition</tt>, which are stored under a special node named <tt>oak:index</tt>. As part of diff traversal, at each level, <tt>IndexUpdate</tt> looks for <tt>oak:index</tt> nodes. Below is the canonical index definition structure:</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">/oak:index/indexName
+<div>
+<div>
+<pre class="source">/oak:index/indexName
   - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
   - type (string) mandatory
   - async (string) multiple
   - reindex (boolean)
 </pre></div></div>
-<p>The index definitions nodes have the following properties:</p>
 
+<p>The index definitions nodes have the following properties:</p>
 <ol style="list-style-type: decimal">
-  
+
 <li><tt>type</tt> - It determines the <i>type</i> of index. <tt>IndexUpdate</tt> looks for an <tt>IndexEditor</tt> of the given type from the registered <tt>IndexEditorProvider</tt>. For an out-of-the-box Oak setup, it can have one of the following values:
-  
 <ul>
-    
-<li><tt>reference</tt> - Configured with the out-of-the-box setup</li>
-    
+
+<li><tt>reference</tt> -  Configured with the out-of-the-box setup</li>
 <li><tt>counter</tt> - Configured with the out-of-the-box setup</li>
-    
 <li><tt>property</tt></li>
-    
 <li><tt>lucene</tt></li>
-    
 <li><tt>solr</tt></li>
-  </ul></li>
-  
+</ul>
+</li>
 <li><tt>async</tt> - This determines if the index is to be updated synchronously or asynchronously. It can have the following values:
-  
 <ul>
-    
+
 <li><tt>sync</tt> - The default value. It indicates that index is meant to be updated as part of each commit.</li>
-    
-<li><tt>nrt</tt> - Indicates that index is a <a href="#nrt-indexing">near real time</a> index.</li>
-    
-<li><tt>async</tt> - Indicates that index is to be updated asynchronously.  In such a case, this value is used to determine  the <a href="#indexing-lane">indexing lane</a></li>
-    
+<li><tt>nrt</tt>  - Indicates that index is a <a href="#nrt-indexing">near real time</a> index.</li>
+<li><tt>async</tt> - Indicates that index is to be updated asynchronously. In such a case, this value is used to determine the <a href="#indexing-lane">indexing lane</a></li>
 <li>Any other value which ends in <tt>async</tt>.</li>
-  </ul></li>
-  
+</ul>
+</li>
 <li><tt>reindex</tt> - If set to <tt>true</tt>, reindexing is performed for that index. After reindexing is done, the property value is set to <tt>false</tt>. See <a href="#reindexing">reindexing</a> for more details.</li>
 </ol>
 <p>Based on the above two properties, <tt>IndexUpdate</tt> creates <tt>IndexEditor</tt> instances as it traverses the &#x201c;diff&#x201d;, and registers them with itself, passing on the callbacks for changes.</p>
@@ -412,57 +379,52 @@
 <p>Depending on the type of the index, one can create these index definitions under the root path (&#x2018;/&#x2019;), or non-root paths. Currently only <tt>lucene</tt> indexes support creating index definitions at non-root paths. <tt>property</tt> indexes can only be created under the root path, that is, under &#x2018;/&#x2019;.</p></div></div>
 <div class="section">
 <h3><a name="Synchronous_Indexing"></a><a name="sync-indexing"></a> Synchronous Indexing</h3>
-<p>Under synchronous indexing, the index content gets updates as part of the commit itself. Changes to both the main content, as well as the index content, are done atomically in a single commit. </p>
+<p>Under synchronous indexing, the index content gets updated as part of the commit itself. Changes to both the main content and the index content are done atomically in a single commit.</p>
 <p>This mode is currently supported by <tt>property</tt> and <tt>reference</tt> indexes.</p></div>
 <div class="section">
 <h3><a name="Asynchronous_Indexing"></a><a name="async-indexing"></a> Asynchronous Indexing</h3>
-<p>Asynchronous indexing (also called async indexing) is performed using periodic scheduled jobs. As part of the setup, Oak schedules certain periodic jobs which perform diff of the repository content, and update the index content based on that. </p>
+<p>Asynchronous indexing (also called async indexing) is performed using periodic scheduled jobs. As part of the setup, Oak schedules certain periodic jobs which perform a diff of the repository content, and update the index content based on that.</p>
 <p>Each periodic <tt>AsyncIndexUpdate</tt> job is assigned to an <a href="#indexing-lane">indexing lane</a>, and is scheduled to run at a certain interval. At time of execution, the job performs its work:</p>
-
 <ol style="list-style-type: decimal">
-  
-<li>Look for the last indexed state via stored checkpoint data.  If such a checkpoint exists, then read the <tt>NodeState</tt> for that checkpoint.  If no such state exists, or no such checkpoint is present,  then it treats it as initial indexing, in which case the base state is empty.  This state is considered the <tt>before</tt> state.</li>
-  
-<li>Check if there has been any change in repository from the <tt>before</tt> state.  If no change is detected then current indexing cycle is considered completed and  <tt>IndexStatsMBean#done</tt> time is set to current time. <tt>LastIndexedTime</tt> is not updated</li>
-  
+
+<li>Look for the last indexed state via stored checkpoint data. If such a checkpoint exists, then read the <tt>NodeState</tt> for that checkpoint. If no such state exists, or no such checkpoint is present, then it treats it as initial indexing, in which case the base state is empty. This state is considered the <tt>before</tt> state.</li>
+<li>Check if there has been any change in the repository from the <tt>before</tt> state. If no change is detected, then the current indexing cycle is considered completed and the <tt>IndexStatsMBean#done</tt> time is set to the current time. <tt>LastIndexedTime</tt> is not updated.</li>
 <li>Create a checkpoint for <i>current</i> state and refer to this as <tt>after</tt> state.</li>
-  
-<li>Create an <tt>IndexUpdate</tt> instance bound to the current <i>indexing lane</i>,  and trigger a diff between the <tt>before</tt> and the <tt>after</tt> state.</li>
-  
-<li><tt>IndexUpdate</tt> will then pick up index definitions that are bound to the current indexing lane,  will create <tt>IndexEditor</tt> instances for them,  and pass them the diff callbacks.</li>
-  
-<li>The diff traverses in a depth-first manner,  and at the end of diff, the <tt>IndexEditor</tt> will do final changes for the current indexing run.  Depending on the index implementation, the index data can be either stored in the NodeStore itself  (for indexes of type <tt>lucene</tt>, <tt>property</tt>, and so on), or in any remote store (for type <tt>solr</tt>).</li>
-  
-<li><tt>AsyncIndexUpdate</tt> will then update the last indexed checkpoint to the current checkpoint  and do a commit.</li>
+<li>Create an <tt>IndexUpdate</tt> instance bound to the current <i>indexing lane</i>, and trigger a diff between the <tt>before</tt> and the <tt>after</tt> state.</li>
+<li><tt>IndexUpdate</tt> will then pick up index definitions that are bound to the current indexing lane, will create <tt>IndexEditor</tt> instances for them, and pass them the diff callbacks.</li>
+<li>The diff traverses in a depth-first manner, and at the end of diff, the <tt>IndexEditor</tt> will do final changes for the current indexing run. Depending on the index implementation, the index data can be either stored in the NodeStore itself (for indexes of type <tt>lucene</tt>, <tt>property</tt>, and so on), or in any remote store (for type <tt>solr</tt>).</li>
+<li><tt>AsyncIndexUpdate</tt> will then update the last indexed checkpoint to the current checkpoint and do a commit.</li>
 </ol>
 <p>Such async indexes are <i>eventually consistent</i> with the repository state: they lag behind the latest repository state by some time, but the index content never ends up in a wrong state with respect to the repository state.</p>
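The indexing cycle described in the steps above can be sketched in plain Java. This is a simplified model, not Oak code: the class `AsyncCycleSketch`, its methods, and the use of a `Map` of path to value as a stand-in for a `NodeState` are all assumptions made for illustration. The retained copy of the last indexed state plays the role of the checkpoint.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of one async indexing lane; not part of Oak. */
final class AsyncCycleSketch {
    // Stand-in for the state referenced by the last checkpoint.
    // Empty means "no checkpoint yet", i.e. initial indexing.
    private Map<String, String> lastIndexed = new HashMap<>();
    // Stand-in for the index content maintained by this lane.
    private final Map<String, String> index = new HashMap<>();

    /** Runs one cycle and returns the changed paths (empty if nothing changed). */
    Set<String> runCycle(Map<String, String> currentRepositoryState) {
        Map<String, String> before = lastIndexed;
        // "Create a checkpoint" for the current state: take a snapshot copy.
        Map<String, String> after = new HashMap<>(currentRepositoryState);

        // Diff between the before and the after state.
        Set<String> allPaths = new TreeSet<>(before.keySet());
        allPaths.addAll(after.keySet());
        Set<String> changed = new TreeSet<>();
        for (String path : allPaths) {
            if (!Objects.equals(before.get(path), after.get(path))) {
                changed.add(path);
            }
        }
        if (changed.isEmpty()) {
            return changed; // no change detected: the cycle is done
        }

        // Apply the diff callbacks to the index content.
        for (String path : changed) {
            String value = after.get(path);
            if (value == null) {
                index.remove(path);
            } else {
                index.put(path, value);
            }
        }
        // Move the "last indexed checkpoint" forward to the new snapshot.
        lastIndexed = after;
        return changed;
    }

    Map<String, String> indexContent() {
        return index;
    }
}
```

Calling `runCycle` twice with an unchanged repository map returns an empty set the second time, which mirrors the "no change detected" branch of the cycle.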
 <div class="section">
 <h4><a name="Checkpoint"></a><a name="checkpoint"></a> Checkpoint</h4>
-<p>A checkpoint is a mechanism, whereby a client of the <tt>NodeStore</tt> can request Oak to ensure that the repository state (snapshot) at that time can be preserved, and not removed by the revision garbage collection process. Later, that state can be retrieved from the NodeStore by passing the checkpoint. You can think of a checkpoint as a tag in a git repository, or as a named revision. </p>
-<p>Async indexing makes use of checkpoint support to access older repository state. </p></div>
+<p>A checkpoint is a mechanism, whereby a client of the <tt>NodeStore</tt> can request Oak to ensure that the repository state (snapshot) at that time can be preserved, and not removed by the revision garbage collection process. Later, that state can be retrieved from the NodeStore by passing the checkpoint. You can think of a checkpoint as a tag in a git repository, or as a named revision.</p>
+<p>Async indexing makes use of checkpoint support to access older repository state.</p></div>
 <div class="section">
 <h4><a name="Indexing_Lane"></a><a name="indexing-lane"></a> Indexing Lane</h4>
 <p>The term &#x201c;indexing lane&#x201d; refers to a set of indexes which are to be updated by a given async indexer. Each index definition meant for async indexing defines an <tt>async</tt> property, whose value is the name of the indexing lane. For example, consider following two index definitions:</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">/oak:index/userIndex
+<div>
+<div>
+<pre class="source">/oak:index/userIndex
   - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
   - async = &quot;async&quot;
-
+  
 /oak:index/assetIndex
   - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
   - async = &quot;fulltext-async&quot;
 </pre></div></div>
-<p>Here, <i>userIndex</i> is bound to the &#x201c;async&#x201d; indexing lane, while <i>assetIndex</i> is bound to the &#x201c;fulltext-async&#x201d; lane. Oak <a href="#async-index-setup">setup</a> configures two <tt>AsyncIndexUpdate</tt> jobs: one for &#x201c;async&#x201d;, and one for &#x201c;fulltext-async&#x201d;. When the job for &#x201c;async&#x201d; is run, it only processes index definition where the <tt>async</tt> value is <tt>async</tt>, while when the job for &#x201c;fulltext-async&#x201d; is run, it only pick up index definitions where the <tt>async</tt> value is <tt>fulltext-async</tt>.</p>
+
+<p>Here, <i>userIndex</i> is bound to the &#x201c;async&#x201d; indexing lane, while <i>assetIndex</i> is bound to the &#x201c;fulltext-async&#x201d; lane. Oak <a href="#async-index-setup">setup</a> configures two <tt>AsyncIndexUpdate</tt> jobs: one for &#x201c;async&#x201d;, and one for &#x201c;fulltext-async&#x201d;. When the job for &#x201c;async&#x201d; is run, it only processes index definitions where the <tt>async</tt> value is <tt>async</tt>, while when the job for &#x201c;fulltext-async&#x201d; is run, it only picks up index definitions where the <tt>async</tt> value is <tt>fulltext-async</tt>.</p>
 <p>These jobs can be scheduled to run at different intervals, and also on different cluster nodes. Each job keeps its own bookkeeping of checkpoint state, and can be <a href="#async-index-mbean">paused and resumed</a> separately.</p>
-<p>Prior to Oak 1.4, there was only one indexing lane: <tt>async</tt>. In Oak 1.4, support was added to create two lanes: <tt>async</tt> and <tt>fulltext-async</tt>. With 1.6, it is possible to <a href="#async-index-setup">create multiple lanes</a>. </p></div>
+<p>Prior to Oak 1.4, there was only one indexing lane: <tt>async</tt>. In Oak 1.4, support was added to create two lanes: <tt>async</tt> and <tt>fulltext-async</tt>. With 1.6, it is possible to <a href="#async-index-setup">create multiple lanes</a>.</p></div>
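The lane selection described above can be illustrated with a small, self-contained Java sketch. `LaneFilter` and its methods are hypothetical helpers, not Oak APIs; index definitions are modeled as a map from index path to the values of the `async` property.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Oak: selects the indexes bound to a lane. */
final class LaneFilter {
    private LaneFilter() {
    }

    /**
     * Returns the index paths whose async property contains the given lane
     * name, in the iteration order of the map.
     */
    static List<String> indexesForLane(Map<String, List<String>> asyncByIndex, String lane) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : asyncByIndex.entrySet()) {
            if (entry.getValue().contains(lane)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    /** The two example definitions from the text above. */
    static Map<String, List<String>> exampleDefinitions() {
        Map<String, List<String>> defs = new LinkedHashMap<>();
        defs.put("/oak:index/userIndex", Arrays.asList("async"));
        defs.put("/oak:index/assetIndex", Arrays.asList("fulltext-async"));
        return defs;
    }
}
```

With the example definitions, the job for the `async` lane would only see `/oak:index/userIndex`, and the job for `fulltext-async` would only see `/oak:index/assetIndex`.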
 <div class="section">
 <h4><a name="Clustered_Setup"></a><a name="cluster"></a> Clustered Setup</h4>
 <p>In a clustered setup, one needs to ensure in the host application that the async indexing jobs for all lanes are run as singleton in the cluster. If <tt>AsyncIndexUpdate</tt> for the same lane is executed concurrently on different cluster nodes, it leads to race conditions, where an old checkpoint gets lost, leading to reindexing.</p>
 <p>See also <a href="../clustering.html#scheduled-jobs">clustering</a> for more details on how the host application should schedule such indexing jobs.</p>
 <div class="section">
 <h5><a name="Indexing_Lease"></a><a name="async-index-lease"></a> Indexing Lease</h5>
-<p><tt>AsyncIndexUpdate</tt> has an in-built &#x201c;lease&#x201d; logic to ensure that even if the jobs gets scheduled to run on different cluster nodes, only one of them runs. This is done by keeping a lease property, which gets periodically updated as indexing progresses. </p>
+<p><tt>AsyncIndexUpdate</tt> has built-in &#x201c;lease&#x201d; logic to ensure that even if the jobs get scheduled to run on different cluster nodes, only one of them runs. This is done by keeping a lease property, which gets periodically updated as indexing progresses.</p>
 <p>An <tt>AsyncIndexUpdate</tt> run skips indexing if the current lease has not expired. If the last update of the lease was done too long ago (default: more than 15 minutes), it is assumed that the cluster node that is supposed to index is not available, and some other node will take over.</p>
 <p>The lease logic can delay the start of indexing if the system is not stopped cleanly. As of Oak 1.6, this does not affect non-clustered setups like those based on SegmentNodeStore, but only <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-5159">affects DocumentNodeStore</a> based setups.</p></div></div>
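The lease check, as seen from a cluster node that does not currently hold the lease, can be sketched as follows. `LeaseCheck` is a hypothetical class and not the logic inside `AsyncIndexUpdate`; it assumes the lease is represented by the time of its last update, and it uses the 15 minute default mentioned above.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical lease check, not part of Oak. */
final class LeaseCheck {
    /** Default from the text above: a lease older than 15 minutes is expired. */
    static final Duration DEFAULT_LEASE_TIMEOUT = Duration.ofMinutes(15);

    private LeaseCheck() {
    }

    /**
     * Returns true if this node should skip the indexing run because another
     * node still holds a lease that has not expired.
     *
     * @param lastLeaseUpdate time the lease was last updated, or null if no lease exists
     */
    static boolean shouldSkipRun(Instant lastLeaseUpdate, Instant now, Duration timeout) {
        if (lastLeaseUpdate == null) {
            return false; // nobody holds a lease, so this node can index
        }
        Duration age = Duration.between(lastLeaseUpdate, now);
        // Expired only if the last update is MORE than the timeout ago.
        return age.compareTo(timeout) <= 0;
    }
}
```

This also shows why an unclean shutdown can delay indexing: a stale lease keeps `shouldSkipRun` returning true until the timeout has passed.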
 <div class="section">
@@ -480,61 +442,61 @@
 <h4><a name="Async_Indexing_MBean"></a><a name="async-index-mbean"></a> Async Indexing MBean</h4>
 <p>For each configured async indexer in the setup, the indexer exposes an <tt>IndexStatsMBean</tt>, which provides various stats around the current indexing state:</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">org.apache.jackrabbit.oak: async (IndexStats)
+<div>
+<div>
+<pre class="source">org.apache.jackrabbit.oak: async (IndexStats)
 org.apache.jackrabbit.oak: fulltext-async (IndexStats)
 </pre></div></div>
-<p>It provide the following details:</p>
 
+<p>It provides the following details:</p>
 <ul>
-  
+
 <li>FailingIndexStats - Stats around indexes which are <a href="#corrupt-index-handling">failing and marked as corrupt</a>.</li>
-  
 <li>LastIndexedTime - Time up to which the repository state has been indexed.</li>
-  
 <li>Status - running, done, failing etc.</li>
-  
-<li>Failing - boolean flag indicating that indexing has been failing due to some issue.  This can be monitored for detecting if indexer is healthy or not.</li>
-  
+<li>Failing - boolean flag indicating that indexing has been failing due to some issue. This can be monitored for detecting if indexer is healthy or not.</li>
 <li>ExecutionCount - Time series data around the number of runs for various time intervals.</li>
 </ul>
 <p>Further it provides the following operations:</p>
-
 <ul>
-  
+
 <li>pause - Pauses the indexer.</li>
-  
-<li>abortAndPause - Aborts any running indexing cycle and pauses the indexer.  Invoke &#x2018;resume&#x2019; once you are ready to resume indexing again.</li>
-  
+<li>abortAndPause - Aborts any running indexing cycle and pauses the indexer. Invoke &#x2018;resume&#x2019; once you are ready to resume indexing again.</li>
 <li>resume - Resume indexing.</li>
 </ul></div>
 <div class="section">
 <h4><a name="Isolating_Corrupt_Indexes"></a><a name="corrupt-index-handling"></a> Isolating Corrupt Indexes</h4>
 <p><tt>Since 1.6</tt></p>
-<p>The <tt>AsyncIndexerService</tt> marks any index which fails to update for 30 minutes (configurable) as <tt>corrupt</tt>, and ignore such indexes from further indexing. </p>
+<p>The <tt>AsyncIndexerService</tt> marks any index which fails to update for 30 minutes (configurable) as <tt>corrupt</tt>, and ignores such indexes from further indexing.</p>
 <p>When any index is marked as corrupt, the following log entry is made:</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">2016-11-22 12:52:35,484 INFO  NA [async-index-update-fulltext-async] o.a.j.o.p.i.AsyncIndexUpdate - 
+<div>
+<div>
+<pre class="source">2016-11-22 12:52:35,484 INFO  NA [async-index-update-fulltext-async] o.a.j.o.p.i.AsyncIndexUpdate - 
 Marking [/oak:index/lucene] as corrupt. The index is failing since Tue Nov 22 12:51:25 IST 2016, 
 1 indexing cycles, failed 7 times, skipped 0 time 
 </pre></div></div>
+
 <p>After this, when any new content gets indexed and any such corrupt index is skipped, the following warning is logged:</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">2016-11-22 12:52:35,485 WARN  NA [async-index-update-fulltext-async] o.a.j.o.p.index.IndexUpdate - 
+<div>
+<div>
+<pre class="source">2016-11-22 12:52:35,485 WARN  NA [async-index-update-fulltext-async] o.a.j.o.p.index.IndexUpdate - 
 Ignoring corrupt index [/oak:index/lucene] which has been marked as corrupt since 
 [2016-11-22T12:51:25.492+05:30]. This index MUST be reindexed for indexing to work properly 
 </pre></div></div>
+
 <p>This info is also seen in the MBean:</p>
 <p><img src="corrupt-index-mbean.png" alt="Corrupt Index stats in IndexStatsMBean" /></p>
 <p>Later, once the index is reindexed, the following log entry is made:</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">2016-11-22 12:56:25,486 INFO  NA [async-index-update-fulltext-async] o.a.j.o.p.index.IndexUpdate - 
+<div>
+<div>
+<pre class="source">2016-11-22 12:56:25,486 INFO  NA [async-index-update-fulltext-async] o.a.j.o.p.index.IndexUpdate - 
 Removing corrupt flag from index [/oak:index/lucene] which has been marked as corrupt since 
 [corrupt = 2016-11-22T12:51:25.492+05:30] 
 </pre></div></div>
+
 <p>This feature can be disabled by setting <tt>failingIndexTimeoutSeconds</tt> to 0 in the <tt>AsyncIndexerService</tt> config. See also <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-4939">OAK-4939</a> for more details.</p></div></div>
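The timeout rule for marking an index as corrupt can be sketched as a small predicate. `CorruptIndexPolicy` is a hypothetical class, not Oak code; only the parameter name `failingIndexTimeoutSeconds` and the "0 disables the feature" behavior come from the text above. Treating an index as corrupt once the failing period reaches the timeout (rather than strictly exceeds it) is an assumption.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical policy check, not part of Oak. */
final class CorruptIndexPolicy {
    private CorruptIndexPolicy() {
    }

    /**
     * Returns true if an index that has been failing since failingSince
     * should be marked as corrupt at time now.
     *
     * @param failingSince time of the first failure, or null if the index is not failing
     * @param failingIndexTimeoutSeconds timeout in seconds; 0 disables the feature
     */
    static boolean shouldMarkCorrupt(Instant failingSince, Instant now, long failingIndexTimeoutSeconds) {
        if (failingIndexTimeoutSeconds <= 0 || failingSince == null) {
            return false; // feature disabled, or index is healthy
        }
        long failingForSeconds = Duration.between(failingSince, now).getSeconds();
        return failingForSeconds >= failingIndexTimeoutSeconds;
    }
}
```

With the default of 30 minutes, the timeout argument would be 1800 seconds.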
 <div class="section">
 <h3><a name="Near_Real_Time_Indexing"></a><a name="nrt-indexing"></a> Near Real Time Indexing</h3>
@@ -543,77 +505,73 @@ Removing corrupt flag from index [/oak:i
 <p>Lucene indexes perform well for evaluating complex queries, and have the benefit of being evaluated locally with copy-on-read support. However, they are <tt>async</tt>, and depending on system load can lag behind the repository state. For cases where such lag (which can be in the order of minutes) is not acceptable, one must use <tt>property</tt> indexes. To avoid that, Oak 1.6 has <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-4412">added support for near real time indexing</a></p>
 <p><img src="index-nrt.png" alt="NRT Index Flow" /></p>
 <p>In this mode, the indexing happens in two modes, and a query will consult multiple indexes. The diagram above shows the indexing flow with time. In the above flow:</p>
-
 <ul>
-  
+
 <li>T1, T3 and T5 - Time instances at which checkpoints are created.</li>
-  
 <li>T2 and T4 - Time instances when async indexer runs completed and indexes were updated.</li>
-  
 <li>Persisted Index:
-  
 <ul>
-    
+
 <li>v2 - Index version v2, which has repository state indexed up to T1.</li>
-    
 <li>v3 - Index version v3, which has repository state indexed up to T3.</li>
-  </ul></li>
-  
+</ul>
+</li>
 <li>Local Index:
-  
 <ul>
-    
+
 <li>NRT1 - Local index, which has repository state indexed between T2 and T4.</li>
-    
 <li>NRT2 - Local index, which has repository state indexed between T4 and T6.</li>
-  </ul></li>
+</ul>
+</li>
 </ul>
 <p>As the repository state changes with time, the Async indexer will run and index the changes between the last known checkpoint and current state when that run started. So when async run 1 completed, the persisted index has the repository state indexed up to T3.</p>
+<p>Now without NRT index support, if any query is performed between T2 and T4, it can only see index results for the repository state at T1, as that is the state for which the persisted indexes have data. Any change after that cannot be seen until the next async indexing cycle is complete (at T4).</p>
+<p>Now without NRT index support, if any query is performed between T2 and T4, it can only see index results for the repository state at T1, as that is the state where the persisted indexes have data for. Any change after that cannot be seen until the next async indexing cycle is complete (at T4).</p>
 <p>With NRT indexing support, indexing will happen at two places:</p>
-
 <ul>
-  
-<li>Persisted Index - This is the index which is updated via the async indexer run.  This flow remains the same, it will be periodically updated by the indexer run.</li>
-  
-<li>Local Index - In addition to persisted index, each cluster node will also maintain a local index.  This index only keeps data between two async indexer runs.  Post each run, the previous index is discarded, and a new index is built  (actually, the previous index is retained for one cycle).</li>
+
+<li>Persisted Index - This is the index which is updated via the async indexer run. This flow remains the same: it will be periodically updated by the indexer run.</li>
+<li>Local Index - In addition to the persisted index, each cluster node will also maintain a local index. This index only keeps data between two async indexer runs. After each run, the previous index is discarded, and a new index is built (actually, the previous index is retained for one cycle).</li>
 </ul>
 <p>Any query making use of such an index will automatically make use of both the persisted and the local indexes. With this, new content added in the repository after the last async index run will also show up quickly.</p>
 <div class="section">
 <h4><a name="Usage"></a><a name="nrt-indexing-usage"></a> Usage</h4>
 <p>NRT (Near real time) indexing can be enabled for an index by configuring the <tt>async</tt> property:</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">/oak:index/assetIndex
+<div>
+<div>
+<pre class="source">/oak:index/assetIndex
   - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
   - async = ['fulltext-async', 'nrt']
 </pre></div></div>
-<p>Here, <tt>async</tt> has been set to a multi-valued property, with the</p>
 
+<p>Here, <tt>async</tt> has been set to a multi-valued property, with the</p>
 <ul>
-  
+
 <li>Indexing lane - For example <tt>async</tt> or <tt>fulltext-async</tt>,</li>
-  
 <li>NRT Indexing Mode - <tt>nrt</tt> or <tt>sync</tt>.</li>
 </ul>
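How the multi-valued `async` property splits into an indexing lane and an NRT indexing mode can be sketched as follows. `AsyncProperty` is a hypothetical helper, not an Oak class. It relies on two statements from this page: lane names end in `async`, and the NRT mode is either `nrt` or `sync`.

```java
import java.util.List;

/** Hypothetical parser for the values of the async property; not part of Oak. */
final class AsyncProperty {
    private AsyncProperty() {
    }

    /** Returns the first value that names an indexing lane, or null if there is none. */
    static String lane(List<String> asyncValues) {
        for (String value : asyncValues) {
            // Lane names end in "async", e.g. "async" or "fulltext-async".
            if (value.endsWith("async")) {
                return value;
            }
        }
        return null;
    }

    /** Returns "nrt" or "sync" if an NRT indexing mode is configured, else null. */
    static String nrtMode(List<String> asyncValues) {
        for (String value : asyncValues) {
            if (value.equals("nrt") || value.equals("sync")) {
                return value;
            }
        }
        return null;
    }
}
```

For `['fulltext-async', 'nrt']`, `lane` yields `fulltext-async` and `nrtMode` yields `nrt`. A definition with only `['async']` has a lane but no NRT mode.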
 <div class="section">
 <h5><a name="NRT_Indexing_Mode_-_nrt"></a><a name="nrt-indexing-mode-nrt"></a> NRT Indexing Mode - nrt</h5>
 <p>In this mode, the local index is updated asynchronously on that cluster node after each commit, and the index reader is refreshed each second. So, any change done should show up on that cluster node within 1 to 2 seconds.</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">/oak:index/userIndex
+<div>
+<div>
+<pre class="source">/oak:index/userIndex
   - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
   - async = ['async', 'nrt']
-</pre></div></div></div>
+</pre></div></div>
+</div>
 <div class="section">
 <h5><a name="NRT_Indexing_Mode_-_sync"></a><a name="nrt-indexing-mode-sync"></a> NRT Indexing Mode - sync</h5>
 <p>In this mode, the local index is updated synchronously on that cluster node after each commit, and the index reader is refreshed immediately. This mode indexes more slowly compared to the &#x201c;nrt&#x201d; mode.</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">/oak:index/userIndex
+<div>
+<div>
+<pre class="source">/oak:index/userIndex
   - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
   - async = ['async', 'sync']
 </pre></div></div>
+
 <p>For a single node setup (for example with the <tt>SegmentNodeStore</tt>), this mode effectively makes an async lucene index perform the same as synchronous property indexes. However, the &#x2018;nrt&#x2019; mode performs better, so using that is preferable.</p></div></div>
 <div class="section">
 <h4><a name="Cluster_Setup"></a><a name="nrt-indexing-cluster-setup"></a> Cluster Setup</h4>
@@ -621,52 +579,45 @@ Removing corrupt flag from index [/oak:i
 <div class="section">
 <h4><a name="Configuration"></a><a name="nrt-indexing-config"></a> Configuration</h4>
 <p>NRT indexing exposes a few configuration options as part of the <a href="lucene.html#osgi-config">LuceneIndexProviderService</a>:</p>
-
 <ul>
-  
-<li><tt>enableHybridIndexing</tt> - Boolean property, defaults to <tt>true</tt>.  Can be set to <tt>false</tt> to disable the NRT indexing feature completely.</li>
-  
-<li><tt>hybridQueueSize</tt> - The size of the in-memory queue used  to hold Lucene documents for indexing in the <tt>nrt</tt> mode.  The default size is 10000.</li>
+
+<li><tt>enableHybridIndexing</tt> - Boolean property, defaults to <tt>true</tt>. Can be set to <tt>false</tt> to disable the NRT indexing feature completely.</li>
+<li><tt>hybridQueueSize</tt> - The size of the in-memory queue used to hold Lucene documents for indexing in the <tt>nrt</tt> mode. The default size is 10000.</li>
 </ul></div></div></div>
 <div class="section">
 <h2><a name="Reindexing"></a><a name="reindexing"></a> Reindexing</h2>
-<p>Reindexing rarely solves problems. Specially, it does not typically make queries return the expected result. For such cases, it is <i>not</i> recommended to reindex, also because reindex can be very slow (sometimes multiple days), and use a lot of temporary disk space. Note that removing checkpoints, and removing the hidden <tt>:async</tt> node will cause a full reindex, so doing this is not recommended either. If queries don&#x2019;t return the right data, then possibly the index is <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-5159">not yet up-to-date</a>, or the query is incorrect, or included/excluded path settings are wrong (for Lucene indexes). Instead of reindexing, it is suggested to first check the log file, modify the query so it uses a different index or traversal, and run the query again.</p>
+<p>Reindexing rarely solves problems. In particular, it typically does not make queries return the expected result. For such cases, it is <i>not</i> recommended to reindex, also because reindexing can be very slow (sometimes multiple days), and can use a lot of temporary disk space. Note that removing checkpoints, and removing the hidden <tt>:async</tt> node will  cause a full reindex, so doing this is not recommended either. If queries don&#x2019;t return the right data, then possibly the index is <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-5159">not yet up-to-date</a>, or the query is incorrect, or included/excluded path settings are wrong (for Lucene indexes). Instead of reindexing, it is suggested to first check the log file, modify the query so it uses a different index or traversal, and run the query again.</p>
 <p>Reindexing of existing indexes is required in the following scenarios:</p>
-
 <ul>
-  
-<li>A: In case a <i>property</i> index configuration was changed,  such that the index is used for queries, but doesn&#x2019;t contain some of the nodes.  Nodes that existed <i>before</i> the index configuration was changed, are not indexed.  A workaround is to change (&#x2018;touch&#x2019;) the affected nodes.</li>
-  
-<li>B: Prior to Oak 1.6, in case a <i>Lucene</i> index definition was changed (same as A).  In Oak 1.6 and newer, queries will use the old index definition  until the index is <a href="lucene.html#stored-index-definition">reindexed</a>.</li>
-  
-<li>C: Prior to Oak 1.2.15 / 1.4.2, in case the query engine picks a very slow index  for some queries because the counter index (<tt>/oak:index/counter</tt>)  <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-4065">got out of sync after adding and removing lots of nodes many times</a>.  For this case, it is recommended to verify the contents of the counter index first,  and upgrade Oak before reindexing.  To view the content, use the NodeCounter JMX bean and  run <tt>getEstimatedChildNodeCounts</tt> with p1 = <tt>/</tt> and p2 = <tt>2</tt>.  If there is a problem, then the estimated node count of root node typically is very low,  more than 10 times lower than the sum of its children.  Only the <tt>counter</tt> index needs to be reindexed in this case.  The workaround (to avoid reindexing) is to manually tweak index configurations  using manually set <tt>entryCount</tt> of the index that should be used to a low value  (as high as possible so that the index is still needed), for example to 100 or 1000.</li>
-  
-<li>D: In case a binary of a Lucene index (a Lucene index file) is missing,  for example because the binary is not available in the datastore.  This can happen in case the datastore is misconfigured  such that garbage collection removed a binary that is still required.  In such cases, other binaries might be missing as well;  it is best to traverse all nodes of the repository to ensure this is not the case.</li>
-  
-<li>E: In case a binary of a Lucene index (a Lucene index file) is corrupt.  If the index is corrupt, an <tt>AsyncIndexUpdate</tt> run will fail  with an exception saying a Lucene index file is corrupt.  In such a case, first verify that the following procedure doesn&#x2019;t resolve  the issue: stop Oak, remove the local copy of the Lucene index (directory <tt>index</tt>),  and restart. If the index is still corrupt after this, then reindexing is needed.  In such cases, please file an Oak issue.</li>
-  
-<li>F: Prior to Oak 1.2.24 / 1.4.13 / 1.6.1,  when using the document store (MongoDB or RDBMK)  in combination with a large transaction (a commit that changed or added many thousand nodes),  and if one of the parent nodes had more than 100 child nodes,  then indexes (all types) <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-5557">did not see those changes in some cases</a>.</li>
-  
-<li>G: Prior to Oak 1.4.7, when repository sidegrade was used to do <i>partial</i> migrations,  that is migrating data without migrating related indexes.  In this case, the property indexes need to be either fully rebuilt,  or (as an alternative) copy or migrate the content again using a newer version of Oak.  See also <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-4684">OAK-4684</a>.</li>
-  
-<li>H: If a binary is missing after reindexing.  This can happen in the following case:  When reindexing or creating a new index takes multiple days,  and during that time, after one day or later, datastore garbage collection was run concurrently.  Some binaries created during by reindexing can get missing because  datastore garbage collection removes unreferenced binaries older than one day.  Indexing or reindexing using oak-run is not affected by this.</li>
-  
-<li>I: Prior to Oak 1.0.27 / 1.2.11,  if an index file gets larger than 2 GB, then possibly the index can not be opened  (exception &#x201c;Invalid seek request&#x201d;), and subsequently the index might get corrupt.  See also <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-3911">OAK-3911</a>.</li>
+
+<li>A: In case a <i>property</i> index configuration was changed, such that the index is used for queries, but doesn&#x2019;t contain some of the nodes. Nodes that existed <i>before</i> the index configuration was changed, are not indexed. A workaround is to change (&#x2018;touch&#x2019;) the affected nodes.</li>
+<li>B: Prior to Oak 1.6, in case a <i>Lucene</i> index definition was changed (same as A). In Oak 1.6 and newer, queries will use the old index definition until the index is <a href="lucene.html#stored-index-definition">reindexed</a>.</li>
+<li>C: Prior to Oak 1.2.15 / 1.4.2, in case the query engine picks a very slow index for some queries because the counter index (<tt>/oak:index/counter</tt>) <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-4065">got out of sync after adding and removing lots of nodes many times</a>. For this case, it is recommended to verify the contents of the counter index first, and upgrade Oak before reindexing. To view the content, use the NodeCounter JMX bean and run <tt>getEstimatedChildNodeCounts</tt> with p1 = <tt>/</tt> and p2 = <tt>2</tt>. If there is a problem, then the estimated node count of root node typically is very low, more than 10 times lower than the sum of its children. Only the <tt>counter</tt> index needs to be reindexed in this case. The workaround (to avoid reindexing) is to manually tweak index configurations using manually set <tt>entryCount</tt> of the index that should be used to a low value (as high as possible so that the index is still needed), for example to 100 or 1000.</li>
+<li>D: In case a binary of a Lucene index (a Lucene index file) is missing, for example because the binary is not available in the datastore. This can happen in case the datastore is misconfigured such that garbage collection removed a binary that is still required. In such cases, other binaries might be missing as well; it is best to traverse all nodes of the repository to ensure this is not the case.</li>
+<li>E: In case a binary of a Lucene index (a Lucene index file) is corrupt. If the index is corrupt, an <tt>AsyncIndexUpdate</tt> run will fail with an exception saying a Lucene index file is corrupt. In such a case, first verify that the following procedure doesn&#x2019;t resolve the issue: stop Oak, remove the local copy of the Lucene index (directory <tt>index</tt>), and restart. If the index is still corrupt after this, then reindexing is needed. In such cases, please file an Oak issue.</li>
+<li>F: Prior to Oak 1.2.24 / 1.4.13 / 1.6.1, when using the document store (MongoDB or RDBMK) in combination with a large transaction (a commit that changed or added many thousand nodes), and if one of the parent nodes had more than 100 child nodes, then indexes (all types) <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-5557">did not see those changes in some cases</a>.</li>
+<li>G: Prior to Oak 1.4.7, when repository sidegrade was used to do <i>partial</i> migrations, that is migrating data without migrating related indexes. In this case, the property indexes need to be either fully rebuilt, or (as an alternative) copy or migrate the content again using a newer version of Oak. See also <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-4684">OAK-4684</a>.</li>
+<li>H: If a binary is missing after reindexing. This can happen in the following case: When reindexing or creating a new index takes multiple days, and during that time, after one day or later, datastore garbage collection was run concurrently. Some binaries created during reindexing can go missing because datastore garbage collection removes unreferenced binaries older than one day. Indexing or reindexing using oak-run is not affected by this.</li>
+<li>I: Prior to Oak 1.0.27 / 1.2.11, if an index file gets larger than 2 GB, then possibly the index can not be opened (exception &#x201c;Invalid seek request&#x201d;), and subsequently the index might get corrupt. See also <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-3911">OAK-3911</a>.</li>
 </ul>
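<p>The rule of thumb from scenario C (the estimated node count of the root node being more than 10 times lower than the sum of its children) can be sketched as a small check. This is only an illustration: the class and method names are hypothetical, and the numbers are assumed to have been read manually from the output of <tt>getEstimatedChildNodeCounts</tt>; the sketch does not call the JMX bean itself.</p>

```java
public class CounterIndexCheck {

    /**
     * Returns true if the root estimate is more than 10 times lower
     * than the sum of the estimates of its direct children.
     * Both inputs are hypothetical values copied from the JMX output.
     */
    public static boolean looksOutOfSync(long rootEstimate, long[] childEstimates) {
        long sum = 0;
        for (long child : childEstimates) {
            sum += child;
        }
        // "more than 10 times lower" means: root * 10 < sum of children
        return rootEstimate * 10 < sum;
    }

    public static void main(String[] args) {
        long root = 1_000;                       // made-up estimate for "/"
        long[] children = { 20_000, 5_000 };     // made-up estimates for child nodes
        System.out.println("counter index suspicious: " + looksOutOfSync(root, children));
    }
}
```

<p>A positive result is only a hint that the <tt>counter</tt> index may be out of sync; as described above, upgrade Oak first and reindex only the <tt>counter</tt> index if needed.</p>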
 <p>New indexes are built automatically once the index definition is stored. To reindex an <i>existing</i> index (when needed), set the <tt>reindex</tt> property to <tt>true</tt> in the respective index definition:</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">/oak:index/userIndex
+<div>
+<div>
+<pre class="source">/oak:index/userIndex
   - reindex = true
 </pre></div></div>
+
 <p>Once changes are saved, the index is reindexed. For asynchronous indexes, reindex starts with the next async indexing cycle. For synchronous indexes, the reindexing is done as part of save (or commit) itself. For a (synchronous) property index, as an alternative you can use the <tt>PropertyIndexAsyncReindexMBean</tt>; see the <a href="property-index.html#reindexing">reindexing property indexes</a> section for more details on that.</p>
 <p>Once reindexing starts, the following log entries can be seen in the log:</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">[async-index-update-async] o.a.j.o.p.i.IndexUpdate Reindexing will be performed for following indexes: [/oak:index/userIndex]
+<div>
+<div>
+<pre class="source">[async-index-update-async] o.a.j.o.p.i.IndexUpdate Reindexing will be performed for following indexes: [/oak:index/userIndex]
 [async-index-update-async] o.a.j.o.p.i.IndexUpdate Reindexing Traversed #100000 /home/user/admin 
 [async-index-update-async] o.a.j.o.p.i.AsyncIndexUpdate [async] Reindexing completed for indexes: [/oak:index/userIndex*(4407016)] in 30 min 
 </pre></div></div>
+
 <p>Once reindexing is complete, the <tt>reindex</tt> flag is set to <tt>false</tt> automatically.</p>
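<p>To watch for completion in a log file, the &#x201c;Reindexing completed&#x201d; entry shown above can be matched with a regular expression. The following is a minimal sketch, assuming the log line has exactly the shape of the sample above and names a single index; the class name <tt>ReindexLogLine</tt> is hypothetical, and the meaning of the number in parentheses is not interpreted here.</p>

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReindexLogLine {

    // Matches e.g. "Reindexing completed for indexes: [/oak:index/userIndex*(4407016)] in 30 min"
    private static final Pattern COMPLETED = Pattern.compile(
            "Reindexing completed for indexes: \\[([^*\\]]+)\\*?\\((\\d+)\\)\\] in (.+)");

    /**
     * Returns {indexPath, numberInParentheses, duration} if the line is a
     * completion entry for a single index, otherwise an empty Optional.
     */
    public static Optional<String[]> parse(String line) {
        Matcher m = COMPLETED.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new String[] { m.group(1), m.group(2), m.group(3).trim() });
    }

    public static void main(String[] args) {
        String line = "[async-index-update-async] o.a.j.o.p.i.AsyncIndexUpdate [async] "
                + "Reindexing completed for indexes: [/oak:index/userIndex*(4407016)] in 30 min ";
        parse(line).ifPresent(r ->
                System.out.println(r[0] + " finished in " + r[2]));
    }
}
```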
 <div class="section">
 <h3><a name="Reducing_Reindexing_Times"></a><a name="reduce-reindexing-times"></a> Reducing Reindexing Times</h3>

Modified: jackrabbit/site/live/oak/docs/query/lucene-old.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/query/lucene-old.html?rev=1835390&r1=1835389&r2=1835390&view=diff
==============================================================================
--- jackrabbit/site/live/oak/docs/query/lucene-old.html (original)
+++ jackrabbit/site/live/oak/docs/query/lucene-old.html Mon Jul  9 08:53:17 2018
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia Site Renderer 1.7.4 at 2018-05-24 
+ | Generated by Apache Maven Doxia Site Renderer 1.8.1 at 2018-07-09 
  | Rendered using Apache Maven Fluido Skin 1.6
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20180524" />
+    <meta name="Date-Revision-yyyymmdd" content="20180709" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Jackrabbit Oak &#x2013; Lucene Index</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.6.min.css" />
@@ -136,7 +136,7 @@
 
       <div id="breadcrumbs">
         <ul class="breadcrumb">
-        <li id="publishDate">Last Published: 2018-05-24<span class="divider">|</span>
+        <li id="publishDate">Last Published: 2018-07-09<span class="divider">|</span>
 </li>
           <li id="projectVersion">Version: 1.10-SNAPSHOT</li>
         </ul>
@@ -240,46 +240,45 @@
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
-  --><div class="section">
+  -->
+<div class="section">
 <h2><a name="Lucene_Index"></a>Lucene Index</h2>
-<p><b>Following details are applicable for Oak release 1.0.8 and earlier. For current documentation refer to <a href="lucene.html">Current Lucene documentation</a></b></p>
+<p><b>Following details are applicable for Oak release 1.0.8 and earlier. For current documentation  refer to <a href="lucene.html">Current Lucene documentation</a></b></p>
 <p>Oak supports Lucene-based indexes for both property constraints and full-text constraints.</p>
 <div class="section">
 <h3><a name="The_Lucene_Full-Text_Index"></a>The Lucene Full-Text Index</h3>
 <p>The full-text index handles the &#x2018;contains&#x2019; type of queries:</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">//*[jcr:contains(., 'text')]
+<div>
+<div>
+<pre class="source">//*[jcr:contains(., 'text')]
 </pre></div></div>
+
 <p>If a full-text index is configured, then all queries that have a full-text condition use the full-text index, no matter if there are other conditions that are indexed, and no matter if there is a path restriction.</p>
 <p>If no full-text index is configured, then queries with full-text conditions may not work as expected. (The query engine has a basic verification in place for full-text conditions, but it does not support all features that Lucene does, and it traverses all nodes if there are no indexed constraints).</p>
-<p>The full-text index update is asynchronous via a background thread, see <tt>Oak#withAsyncIndexing</tt>. This means that some full-text searches will not work for a small window of time: the background thread runs every 5 seconds, plus the time is takes to run the diff and to run the text-extraction process. </p>
+<p>The full-text index update is asynchronous via a background thread, see <tt>Oak#withAsyncIndexing</tt>. This means that some full-text searches will not work for a small window of time: the background thread runs every 5 seconds, plus the time it takes to run the diff and to run the text-extraction process.</p>
 <p>The async update status is now reflected on the <tt>oak:index</tt> node with the help of a few properties, see <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-980">OAK-980</a></p>
 <p>TODO Node aggregation <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-828">OAK-828</a></p>
 <p>The index definition node for a lucene-based full-text index:</p>
-
 <ul>
-  
+
 <li>must be of type <tt>oak:QueryIndexDefinition</tt></li>
-  
 <li>must have the <tt>type</tt> property set to <b><tt>lucene</tt></b></li>
-  
 <li>must contain the <tt>async</tt> property set to the value <tt>async</tt>, this is what sends the index update process to a background thread</li>
 </ul>
 <p><i>Optionally</i> you can add</p>
-
 <ul>
-  
-<li>what subset of property types to be included in the index via the<br /> <tt>includePropertyTypes</tt> property</li>
-  
-<li>a blacklist of property names: what property to be excluded from the index  via the <tt>excludePropertyNames</tt> property</li>
-  
+
+<li>what subset of property types to be included in the index via the<br />
+<tt>includePropertyTypes</tt> property</li>
+<li>a blacklist of property names: what property to be excluded from the index via the <tt>excludePropertyNames</tt> property</li>
 <li>the <tt>reindex</tt> flag which when set to <tt>true</tt>, triggers a full content re-index.</li>
 </ul>
 <p>Example:</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">{
+<div>
+<div>
+<pre class="source">{
   NodeBuilder index = root.child(&quot;oak:index&quot;);
   index.child(&quot;lucene&quot;)
     .setProperty(&quot;jcr:primaryType&quot;, &quot;oak:QueryIndexDefinition&quot;, Type.NAME)
@@ -292,18 +291,23 @@
     .setProperty(&quot;reindex&quot;, true);
 }
 </pre></div></div>
-<p><b>Note</b> The Oak Lucene index will only index <i>Strings</i> and <i>Binaries</i> by default. If you need to add another data type, you need to add it to the<br /><i>includePropertyTypes</i> setting, and don&#x2019;t forget to set the <i>reindex</i> flag to true.</p></div>
+
+<p><b>Note</b> The Oak Lucene index will only index <i>Strings</i> and <i>Binaries</i> by default. If you need to add another data type, you need to add it to the<br />
+<i>includePropertyTypes</i> setting, and don&#x2019;t forget to set the <i>reindex</i> flag to true.</p></div>
 <div class="section">
 <h3><a name="Lucene_Property_Index_Since_1.0.8"></a>Lucene Property Index (Since 1.0.8)</h3>
 <p>Oak uses Lucene to create indexes that support queries involving property constraints that are not full-text:</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">select * from [nt:base] where [alias] = '/admin'
+<div>
+<div>
+<pre class="source">select * from [nt:base] where [alias] = '/admin'
 </pre></div></div>
-<p>To define a property index on a subtree for above query you have to add an index definition </p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">&quot;uuid&quot; : {
+<p>To define a property index on a subtree for above query you have to add an index definition</p>
+
+<div>
+<div>
+<pre class="source">&quot;uuid&quot; : {
         &quot;jcr:primaryType&quot;: &quot;oak:QueryIndexDefinition&quot;,
         &quot;type&quot;: &quot;lucene&quot;,
         &quot;async&quot;: &quot;async&quot;,
@@ -311,25 +315,22 @@
         &quot;includePropertyNames&quot;: [&quot;alias&quot;]
     }
 </pre></div></div>
-<p>The index definition node for a lucene-based full-text index:</p>
 
+<p>The index definition node for a lucene-based property index:</p>
 <ul>
-  
+
 <li>must be of type <tt>oak:QueryIndexDefinition</tt></li>
-  
 <li>must have the <tt>type</tt> property set to <b><tt>lucene</tt></b></li>
-  
 <li>must contain the <tt>async</tt> property set to the value <tt>async</tt>, this is what sends the index update process to a background thread</li>
-  
 <li>must have <tt>fulltextEnabled</tt> set to <tt>false</tt></li>
-  
 <li>must provide a whitelist of property names which should be indexed via <tt>includePropertyNames</tt></li>
 </ul>
 <p><i>Note that compared to <a href="query.html#property-index">Property Index</a> Lucene Property Index is always configured in Async mode hence it might lag behind in reflecting the current repository state while performing the query</i></p>
-<p>Taking another example. </p>
+<p>Taking another example.</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">select
+<div>
+<div>
+<pre class="source">select
     *
 from
     [app:Asset] as a
@@ -339,10 +340,12 @@ where
 order by
     jcr:content/jcr:lastModified
 </pre></div></div>
-<p>To enable faster execution for above query you can create following Lucene property index </p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">&quot;assetIndex&quot;:
+<p>To enable faster execution for above query you can create following Lucene property index</p>
+
+<div>
+<div>
+<pre class="source">&quot;assetIndex&quot;:
 {
   &quot;jcr:primaryType&quot;:&quot;oak:QueryIndexDefinition&quot;,
   &quot;declaringNodeTypes&quot;:&quot;app:Asset&quot;,
@@ -365,22 +368,20 @@ order by
   }	
 }
 </pre></div></div>
-<p>Above index definition makes use of various features supported by property index</p>
 
+<p>Above index definition makes use of various features supported by property index</p>
 <ul>
-  
+
 <li><tt>declaringNodeTypes</tt> - As the query involves nodes of type <tt>app:Asset</tt> index is restricted to only index nodes of type <tt>app:Asset</tt></li>
-  
 <li><tt>orderedProps</tt> - As the query performs sorting via <tt>order by</tt> clause index is configured with property names which are used in sorting</li>
-  
 <li><tt>properties</tt> - For ordering to work properly we need to tell the type of property</li>
 </ul>
 <p>For implementation details refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2005">OAK-2005</a>. Following sections would provide more details about supported features</p></div>
 <div class="section">
 <h3><a name="Index_Definition"></a>Index Definition</h3>
 <p>Lucene index definition is managed via <tt>NodeStore</tt> and supports following attributes</p>
-
 <dl>
+
 <dt>type</dt>
 <dd>Required and should always be <tt>lucene</tt></dd>
 <dt>async</dt>
@@ -388,15 +389,15 @@ order by
 <dt>fulltextEnabled</dt>
 <dd>For Lucene based property index this should <i>always</i> be set to <tt>false</tt></dd>
 <dt>declaringNodeTypes</dt>
-<dd>Node type names whose properties should be indexed. If not specified then all  nodes would indexed if they have properties defined in <tt>includePropertyNames</tt>.  For smaller and efficient indexes its recommended that <tt>declaringNodeTypes</tt>  should be specified according to your query needs</dd>
+<dd>Node type names whose properties should be indexed. If not specified then all nodes would be indexed if they have properties defined in <tt>includePropertyNames</tt>. For smaller and more efficient indexes it is recommended that <tt>declaringNodeTypes</tt> be specified according to your query needs</dd>
 <dt>includePropertyNames</dt>
-<dd>List of property name which should be indexed. Property name can be  relative e.g. <tt>jcr:content/jcr:lastModified</tt></dd>
+<dd>List of property names which should be indexed. A property name can be relative, e.g. <tt>jcr:content/jcr:lastModified</tt></dd>
 <dt>orderedProps</dt>
-<dd>List of property names which would be used in the <tt>order by</tt> clause of the  query</dd>
+<dd>List of property names which would be used in the <tt>order by</tt> clause of the query</dd>
 <dt>includePropertyTypes</dt>
 <dd>Used in Lucene Fulltext Index</dd>
 <dd>For full text index defaults to <tt>String, Binary</tt></dd>
-<dd>List of property types which should be indexed. The values can be one  specified in <a class="externalLink" href="http://www.day.com/specs/jsr170/javadocs/jcr-2.0/constant-values.html#javax.jcr.PropertyType.TYPENAME_STRING">PropertyType Names</a></dd>
+<dd>List of property types which should be indexed. The values can be one specified in <a class="externalLink" href="http://www.day.com/specs/jsr170/javadocs/jcr-2.0/constant-values.html#javax.jcr.PropertyType.TYPENAME_STRING">PropertyType Names</a></dd>
 <dt><a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2201">blobSize</a></dt>
 <dd>Default value 32768 (32kb)</dd>
 <dd>Size in bytes used for splitting the index files when storing them in NodeStore</dd>
@@ -408,8 +409,9 @@ order by
 <p>In some cases property specific configurations are required. For example typically while performing order by in query user does not specify the property type. In such cases you need to specify the property type explicitly.</p>
 <p>Property definition nodes are created as per their property name under the <tt>properties</tt> node of the index definition node. For relative properties you need to create the required path structure under the <tt>properties</tt> node. For example, for property <tt>jcr:content/jcr:lastModified</tt> you need to create the property node at path <tt>&lt;index definition node&gt;/properties/jcr:content/jcr:lastModified</tt></p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">&quot;properties&quot;:
+<div>
+<div>
+<pre class="source">&quot;properties&quot;:
   {
     &quot;jcr:primaryType&quot;:&quot;oak:Unstructured&quot;,
     &quot;jcr:content&quot;:
@@ -425,6 +427,7 @@ order by
 </pre></div></div>
 
 <dl>
+
 <dt>type</dt>
 <dd>JCR Property type. Can be one of <tt>Date</tt>, <tt>Boolean</tt>, <tt>Double</tt> or <tt>Long</tt></dd>
 <dt>boost</dt>
@@ -435,15 +438,14 @@ order by
 <h3><a name="Ordering"></a>Ordering</h3>
 <p>Lucene property index provides efficient sorting support based on Lucene DocValue fields. To configure specify the list of property names which can be used in the <tt>order by</tt> clause as part of <tt>orderedProps</tt> property.</p>
 <p>If the property is of type other than string then you must specify the property definition with <tt>type</tt> details</p>
-<p>Refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2196">Lucene based Sorting</a> for more details. </p>
+<p>Refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2196">Lucene based Sorting</a> for more details.</p>
 <p><a name="osgi-config"></a></p></div>
 <div class="section">
 <h3><a name="LuceneIndexProvider_Configuration"></a>LuceneIndexProvider Configuration</h3>
-<p>Some of the runtime aspects of the Oak Lucene support can be configured via OSGi configuration. The configuration needs to be done for PID <tt>org.apache
-.jackrabbit.oak.plugins.index.lucene.LuceneIndexProviderService</tt></p>
+<p>Some of the runtime aspects of the Oak Lucene support can be configured via OSGi configuration. The configuration needs to be done for PID <tt>org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexProviderService</tt></p>
 <p><img src="lucene-osgi-config.png" alt="OSGi Configuration" /></p>
-
 <dl>
+
 <dt>enableCopyOnReadSupport</dt>
 <dd>Enable copying of Lucene index to local file system to improve query performance. See <a href="#copy-on-read">Copy Indexes On Read</a></dd>
 <dt>localIndexDir</dt>
@@ -457,40 +459,49 @@ order by
 <h3><a name="Non_Root_Index_Definitions"></a>Non Root Index Definitions</h3>
 <p>Lucene index definition can be defined at any location in repository and need not always be defined at root. For example if your query involves path restrictions like</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">select * from [app:Asset] as a where ISDESCENDANTNODE(a, '/content/companya') and [format] = 'image'
+<div>
+<div>
+<pre class="source">select * from [app:Asset] as a where ISDESCENDANTNODE(a, '/content/companya') and [format] = 'image'
 </pre></div></div>
+
 <p>Then you can create the required index definition say <tt>assetIndex</tt> at <tt>/content/companya/oak:index/assetIndex</tt>. In such a case that index would contain data for the subtree under <tt>/content/companya</tt></p>
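<p>As an illustration, the node stored at <tt>/content/companya/oak:index/assetIndex</tt> could look like the fragment below. This is a sketch only: the attribute values are assumptions modelled on the property index definitions shown earlier on this page, with <tt>format</tt> taken from the query above.</p>

```json
{
  "jcr:primaryType": "oak:QueryIndexDefinition",
  "type": "lucene",
  "async": "async",
  "fulltextEnabled": false,
  "declaringNodeTypes": "app:Asset",
  "includePropertyNames": ["format"]
}
```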
 <p><a name="native-query"></a></p></div>
 <div class="section">
 <h3><a name="Native_Query_and_Index_Selection"></a>Native Query and Index Selection</h3>
 <p>Oak query engine supports native queries like</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">//*[rep:native('lucene', 'name:(Hello OR World)')]
+<div>
+<div>
+<pre class="source">//*[rep:native('lucene', 'name:(Hello OR World)')]
 </pre></div></div>
-<p>If multiple Lucene based indexes are enabled on the system and you need to make use of specific Lucene index like <tt>/oak:index/assetIndex</tt> then you can specify the index name via <tt>functionName</tt> attribute on index definition. </p>
-<p>For example for assetIndex definition like </p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">{
+<p>If multiple Lucene based indexes are enabled on the system and you need to make use of specific Lucene index like <tt>/oak:index/assetIndex</tt> then you can specify the index name via <tt>functionName</tt> attribute on index definition.</p>
+<p>For example for assetIndex definition like</p>
+
+<div>
+<div>
+<pre class="source">{
   &quot;jcr:primaryType&quot;:&quot;oak:QueryIndexDefinition&quot;,
   &quot;type&quot;:&quot;lucene&quot;,
   ...
   &quot;functionName&quot; : &quot;lucene-assetIndex&quot;,
 }
 </pre></div></div>
+
 <p>Executing the following query ensures that the Lucene index from <tt>assetIndex</tt> is used:</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">//*[rep:native('lucene-assetIndex', 'name:(Hello OR World)')]
-</pre></div></div></div>
+<div>
+<div>
+<pre class="source">//*[rep:native('lucene-assetIndex', 'name:(Hello OR World)')]
+</pre></div></div>
+</div>
 <div class="section">
 <h3><a name="Persisting_indexes_to_FileSystem"></a>Persisting indexes to FileSystem</h3>
 <p>By default Lucene indexes are stored in the <tt>NodeStore</tt>. If required they can be stored on the file system directly</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">{
+<div>
+<div>
+<pre class="source">{
   &quot;jcr:primaryType&quot;:&quot;oak:QueryIndexDefinition&quot;,
   &quot;type&quot;:&quot;lucene&quot;,
   ...
@@ -498,6 +509,7 @@ order by
   &quot;path&quot; : &quot;/path/to/store/index&quot;
 }
 </pre></div></div>
+
 <p>To store the Lucene index in the file system, in the Lucene index definition node, set the property <tt>persistence</tt> to <tt>file</tt>, and set the property <tt>path</tt> to the directory where the index should be stored. Then start reindexing by setting <tt>reindex</tt> to <tt>true</tt>.</p>
 <p>Note that this setup works only for a non-clustered <tt>NodeStore</tt>. If the backend <tt>NodeStore</tt> supports clustering, the index data is not accessible on other cluster nodes.</p>
 <p><a name="copy-on-read"></a></p></div>
@@ -514,22 +526,25 @@ order by
 <p><a name="luke"></a></p></div>
 <div class="section">
 <h3><a name="Analyzing_created_Lucene_Index"></a>Analyzing created Lucene Index</h3>
-<p><a class="externalLink" href="https://code.google.com/p/luke/">Luke</a> is a handy development and diagnostic tool, which accesses already existing Lucene indexes and allows you to display index details. In Oak Lucene index files are stored in <tt>NodeStore</tt> and hence not directly accessible. To enable analyzing the index files via Luke follow below mentioned steps</p>
-
+<p><a class="externalLink" href="https://code.google.com/p/luke/">Luke</a>  is a handy development and diagnostic tool, which accesses already existing Lucene indexes and allows you to display index details. In Oak Lucene index files are stored in <tt>NodeStore</tt> and hence not directly accessible. To enable analyzing the index files via Luke follow below mentioned steps</p>
 <ol style="list-style-type: decimal">
-  
+
 <li>
-<p>Download the Luke version which includes the matching Lucene jars used by  Oak. As of Oak 1.0.8 release the Lucene version used is 4.7.1. So download the jar from <a class="externalLink" href="https://github.com/DmitryKey/luke/releases">here</a></p>
-  
-<div class="source">
-<div class="source"><pre class="prettyprint">$wget https://github.com/DmitryKey/luke/releases/download/4.7.0/luke-with-deps.jar
-</pre></div></div></li>
-  
+
+<p>Download the Luke version which includes the matching Lucene jars used by Oak. As of Oak 1.0.8 release the Lucene version used is 4.7.1. So download the jar from <a class="externalLink" href="https://github.com/DmitryKey/luke/releases">here</a></p>
+
+<div>
+<div>
+<pre class="source">$ wget https://github.com/DmitryKey/luke/releases/download/4.7.0/luke-with-deps.jar
+</pre></div></div>
+</li>
 <li>
-<p>Use the <a class="externalLink" href="https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#console">Oak Console</a> to dump the Lucene index from <tt>NodeStore</tt>  to filesystem directory. Use the <tt>lc dump</tt> command</p>
-  
-<div class="source">
-<div class="source"><pre class="prettyprint">$ java -jar oak-run-*.jar console /path/to/oak/repository
+
+<p>Use the <a class="externalLink" href="https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#console">Oak Console</a> to dump the Lucene index from <tt>NodeStore</tt> to filesystem directory. Use the <tt>lc dump</tt> command</p>
+
+<div>
+<div>
+<pre class="source">$ java -jar oak-run-*.jar console /path/to/oak/repository
 Apache Jackrabbit Oak 1.1-SNAPSHOT
 Jackrabbit Oak Shell (Apache Jackrabbit Oak 1.1-SNAPSHOT, JVM: 1.7.0_55)
 Type ':help' or ':h' for help.
@@ -547,27 +562,32 @@ Copied 74.1 MB in 1.209 s
 Copying Lucene indexes to [/path/to/dump/index/lucene-index/slingAlias]
 Copied 8.5 MB in 218.7 ms
 /&gt;
-</pre></div></div></li>
-  
+</pre></div></div>
+</li>
 <li>
-<p>Post dump open the index via Luke. Oak Lucene uses a <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-1737">custom  Codec</a>. So oak-lucene jar needs to be included in Luke classpath  for it to display the index details</p>
-  
-<div class="source">
-<div class="source"><pre class="prettyprint">$ java -XX:MaxPermSize=512m luke-with-deps.jar:oak-lucene-1.0.8.jar org.getopt.luke.Luke
-</pre></div></div></li>
+
+<p>Post dump open the index via Luke. Oak Lucene uses a <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-1737">custom Codec</a>. So oak-lucene jar needs to be included in Luke classpath for it to display the index details</p>
+
+<div>
+<div>
+<pre class="source">$ java -XX:MaxPermSize=512m luke-with-deps.jar:oak-lucene-1.0.8.jar org.getopt.luke.Luke
+</pre></div></div>
+</li>
 </ol>
 <p>From the Luke UI shown you can access various details.</p></div>
 <div class="section">
 <h3><a name="Index_performance"></a>Index performance</h3>
 <p>Following are some best practices to get good performance from Lucene based indexes</p>
-
 <ol style="list-style-type: decimal">
-  
+
 <li>
-<p>Make use on <a href="#non-root-index">non root indexes</a>. If you query always  perform search under certain paths then create index definition under those  paths only. This might be helpful in multi tenant deployment where each tenant  data is stored under specific repository path and all queries are made under  those path.</p></li>
-  
+
+<p>Make use on <a href="#non-root-index">non root indexes</a>. If you query always perform search under certain paths then create index definition under those paths only. This might be helpful in multi tenant deployment where each tenant data is stored under specific repository path and all queries are made under those path.</p>
+</li>
 <li>
-<p>Index only required data. Depending on your requirement you can create  multiple Lucene indexes. For example if in majority of cases you are  querying on various properties specified under <tt>&lt;node&gt;/jcr:content/metadata</tt>  where node belong to certain specific nodeType then create single index  definition listing all such properties and restrict it that nodeType. You  can the size of index via mbean</p></li>
+
+<p>Index only required data. Depending on your requirement you can create multiple Lucene indexes. For example if in majority of cases you are querying on various properties specified under <tt>&lt;node&gt;/jcr:content/metadata</tt> where node belong to certain specific nodeType then create single index definition listing all such properties and restrict it that nodeType. You can the size of index via mbean</p>
+</li>
 </ol></div></div>
         </div>
       </div>


