jackrabbit-commits mailing list archives

From mreut...@apache.org
Subject svn commit: r1835390 [12/23] - in /jackrabbit/site/live/oak/docs: ./ architecture/ coldstandby/ features/ nodestore/ nodestore/document/ nodestore/segment/ oak-mongo-js/ oak_api/ plugins/ query/ security/ security/accesscontrol/ security/authentication...
Date Mon, 09 Jul 2018 08:53:19 GMT
Modified: jackrabbit/site/live/oak/docs/query/lucene.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/query/lucene.html?rev=1835390&r1=1835389&r2=1835390&view=diff
==============================================================================
--- jackrabbit/site/live/oak/docs/query/lucene.html (original)
+++ jackrabbit/site/live/oak/docs/query/lucene.html Mon Jul  9 08:53:17 2018
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia Site Renderer 1.7.4 at 2018-05-24 
+ | Generated by Apache Maven Doxia Site Renderer 1.8.1 at 2018-07-09 
  | Rendered using Apache Maven Fluido Skin 1.6
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20180524" />
+    <meta name="Date-Revision-yyyymmdd" content="20180709" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Jackrabbit Oak &#x2013; Lucene Index</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.6.min.css" />
@@ -136,7 +136,7 @@
 
       <div id="breadcrumbs">
         <ul class="breadcrumb">
-        <li id="publishDate">Last Published: 2018-05-24<span class="divider">|</span>
+        <li id="publishDate">Last Published: 2018-07-09<span class="divider">|</span>
 </li>
           <li id="projectVersion">Version: 1.10-SNAPSHOT</li>
         </ul>
@@ -241,131 +241,97 @@
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
-  --><div class="section">
+  -->
+<div class="section">
 <h2><a name="Lucene_Index"></a>Lucene Index</h2>
-
 <ul>
-  
+
 <li><a href="#new-1.6">New in 1.6</a></li>
-  
 <li><a href="#index-definition">Index Definition</a>
-  
 <ul>
-    
+
 <li><a href="#indexing-rules">Indexing Rules</a>
-    
 <ul>
-      
+
 <li><a href="#cost-overrides">Cost Overrides</a></li>
-      
 <li><a href="#indexing-rule-inheritence">Indexing Rule inheritance</a></li>
-      
 <li><a href="#property-definitions">Property Definitions</a></li>
-      
 <li><a href="#path-restrictions">Evaluate Path Restrictions</a></li>
-      
 <li><a href="#include-exclude">Include and Exclude paths from indexing</a></li>
-    </ul></li>
-    
+</ul>
+</li>
 <li><a href="#aggregation">Aggregation</a></li>
-    
 <li><a href="#analyzers">Analyzers</a>
-    
 <ul>
-      
+
 <li><a href="#analyzer-classes">Specify analyzer class directly</a></li>
-      
 <li><a href="#analyzer-composition">Create analyzer via composition</a></li>
-    </ul></li>
-    
+</ul>
+</li>
 <li><a href="#codec">Codec</a></li>
-    
 <li><a href="#boost">Boost and Search Relevancy</a></li>
-    
 <li><a href="#stored-index-definition">Effective Index Definition</a></li>
-    
 <li><a href="#generate-index-definition">Generating Index Definition</a></li>
-  </ul></li>
-  
+</ul>
+</li>
 <li><a href="#nrt-indexing">Near Real Time Indexing</a></li>
-  
 <li><a href="#osgi-config">LuceneIndexProvider Configuration</a></li>
-  
 <li><a href="#tika-config">Tika Config</a>
-  
 <ul>
-    
+
 <li><a href="#mime-type-usage">Mime type usage</a></li>
-    
 <li><a href="#mime-type-mapping">Mime type mapping</a></li>
-  </ul></li>
-  
+</ul>
+</li>
 <li><a href="#non-root-index">Non Root Index Definitions</a></li>
-  
 <li><a href="#native-query">Native Query and Index Selection</a></li>
-  
 <li><a href="#copy-on-read">CopyOnRead</a></li>
-  
 <li><a href="#copy-on-write">CopyOnWrite</a></li>
-  
 <li><a href="#mbeans">Lucene Index MBeans</a></li>
-  
 <li><a href="#active-blob-collection">Active Index Files Collection</a></li>
-  
 <li><a href="#luke">Analyzing created Lucene Index</a></li>
-  
 <li><a href="#text-extraction">Pre-Extracting Text from Binaries</a></li>
-  
 <li><a href="#advanced-search-features">Advanced search features</a>
-  
 <ul>
-    
+
 <li><a href="#suggestions">Suggestions</a></li>
-    
 <li><a href="#spellchecking">Spellchecking</a></li>
-    
 <li><a href="#facets">Facets</a></li>
-    
 <li><a href="#score-explanation">Score Explanation</a></li>
-    
 <li><a href="#custom-hooks">Custom hooks</a></li>
-  </ul></li>
-  
+</ul>
+</li>
 <li><a href="#design-considerations">Design Considerations</a></li>
-  
 <li><a href="#limits">Limits</a></li>
-  
 <li><a href="#lucene-vs-property">Lucene Index vs Property Index</a></li>
-  
 <li><a href="#examples">Examples</a>
-  
 <ul>
-    
+
 <li><a href="#simple-queries">A - Simple queries</a></li>
-    
 <li><a href="#queries-structured-content">B - Queries for structured content</a>
-    
 <ul>
-      
+
 <li><a href="#uc1">UC1 - Find all assets which are having <tt>status</tt> as <tt>published</tt></a></li>
-      
 <li><a href="#uc2">UC2 - Find all assets which are having <tt>status</tt> as <tt>published</tt> sorted by last modified date</a></li>
-      
 <li><a href="#uc3">UC3 - Find all assets where comment contains <i>december</i></a></li>
-      
 <li><a href="#uc4">UC4 - Find all assets which are created by David and refer to december</a></li>
-    </ul></li>
-  </ul></li>
+</ul>
+</li>
+</ul>
+</li>
 </ul>
 <p>Oak supports Lucene-based indexes for both property constraints and full-text constraints. Depending on the configuration, a Lucene index can be used to evaluate property constraints, full-text constraints, path restrictions and sorting.</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">SELECT * FROM [nt:base] WHERE [assetType] = 'image'
+<div>
+<div>
+<pre class="source">SELECT * FROM [nt:base] WHERE [assetType] = 'image'
 </pre></div></div>
+
 <p>The following index definition would allow the Lucene index to be used for the above query</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">/oak:index/assetType
+<div>
+<div>
+<pre class="source">/oak:index/assetType
   - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
   - compatVersion = 2
   - type = &quot;lucene&quot;
@@ -379,26 +345,27 @@
           - propertyIndex = true
           - name = &quot;assetType&quot;
 </pre></div></div>
-<p>The index definition node for a lucene-based index</p>
 
+<p>The index definition node for a lucene-based index</p>
 <ul>
-  
+
 <li>must be of type <tt>oak:QueryIndexDefinition</tt></li>
-  
 <li>must have the <tt>type</tt> property set to <b><tt>lucene</tt></b></li>
-  
-<li>must contain the <tt>async</tt> property set to the value <tt>async</tt>; this is what  sends the index update process to a background thread</li>
+<li>must contain the <tt>async</tt> property set to the value <tt>async</tt>; this is what sends the index update process to a background thread</li>
 </ul>
 <p><i>Note that, unlike a <a href="query.html#property-index">Property Index</a>, a Lucene Property Index is always configured in async mode, so it might lag behind the current repository state when a query is performed.</i></p>
 <p>As another example, to support the following query</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">//*[jcr:contains(., 'text')]
+<div>
+<div>
+<pre class="source">//*[jcr:contains(., 'text')]
 </pre></div></div>
+
 <p>The Lucene index needs to be configured to index all properties</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">/oak:index/assetType
+<div>
+<div>
+<pre class="source">/oak:index/assetType
   - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
   - compatVersion = 2
   - type = &quot;lucene&quot;
@@ -413,14 +380,13 @@
           - isRegexp = true
           - nodeScopeIndex = true
 </pre></div></div>
+
 <div class="section">
 <h3><a name="New_in_1.6"></a><a name="new-1.6"></a> New in 1.6</h3>
 <p>The following features are new in the 1.6 release</p>
-
 <ul>
-  
+
 <li><a href="#nrt-indexing">Near Real Time Indexing</a></li>
-  
 <li><a href="#stored-index-definition">Effective Index Definition</a></li>
 </ul></div>
 <div class="section">
@@ -428,8 +394,9 @@
 <p>A Lucene index definition consists of <tt>indexingRules</tt>, <tt>analyzers</tt>, <tt>aggregates</tt> etc., which determine which nodes and properties are to be indexed and how they are indexed.</p>
 <p>Below is the canonical index definition structure</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">luceneIndex (oak:QueryIndexDefinition)
+<div>
+<div>
+<pre class="source">luceneIndex (oak:QueryIndexDefinition)
   - type (string) = 'lucene' mandatory
   - async (string) = 'async' mandatory
   - blobSize (long) = 32768
@@ -448,9 +415,10 @@
   + analyzers (nt:unstructured)
   + tika (nt:unstructured)
 </pre></div></div>
-<p>Following are the config options which can be defined at the index definition level</p>
 
+<p>Following are the config options which can be defined at the index definition level</p>
 <dl>
+
 <dt>type</dt>
 <dd>Required and should always be <tt>lucene</tt></dd>
 <dt>async</dt>
@@ -474,7 +442,7 @@
 <dd>List of paths for which the index can be used to perform queries. Refer to <a href="#include-exclude">Path Includes/Excludes</a> for more details</dd>
 <dt>indexPath</dt>
 <dd>Optional string property to specify <a href="#copy-on-write">index path</a></dd>
-<dd>Path of the index definition in the repository. For e.g. if the index  definition is specified at <tt>/oak:index/lucene</tt> then set this path in <tt>indexPath</tt></dd>
+<dd>Path of the index definition in the repository. For e.g. if the index definition is specified at <tt>/oak:index/lucene</tt> then set this path in <tt>indexPath</tt></dd>
 <dt>codec</dt>
 <dd>Optional string property</dd>
 <dd>Name of the <a href="#codec">Lucene codec</a> to use</dd>
@@ -483,7 +451,7 @@
 <dd>Captures the name of the index which is used while logging</dd>
 <dt>compatVersion</dt>
 <dd>Required integer property and should be set to 2</dd>
-<dd>By default Oak uses older Lucene index implementation which does not  supports property restrictions, index time aggregation etc.  To make use of this feature set it to 2.  Please note for full text indexing with compatVersion 2,  at query time, only the access right of the parent (aggregate) node is checked,  and the access right of the child nodes is not checked.  If this is a security concern, then compatVersion should not be set,  so that query time aggregation is used, in which case the access right  of the relevant child is also checked.  A compatVersion 2 full text index is usually faster to run queries.</dd>
+<dd>By default Oak uses older Lucene index implementation which does not supports property restrictions, index time aggregation etc. To make use of this feature set it to 2. Please note for full text indexing with compatVersion 2, at query time, only the access right of the parent (aggregate) node is checked, and the access right of the child nodes is not checked. If this is a security concern, then compatVersion should not be set, so that query time aggregation is used, in which case the access right of the relevant child is also checked. A compatVersion 2 full text index is usually faster to run queries.</dd>
 <dt><a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2469">maxFieldLength</a></dt>
 <dd>Numbers of terms indexed per field. Defaults to 10000</dd>
 <dt>refresh</dt>
@@ -494,8 +462,9 @@
 <h4><a name="Indexing_Rules"></a><a name="indexing-rules"></a> Indexing Rules</h4>
 <p>Indexing rules define which types of nodes and properties are indexed. An index configuration can define one or more <tt>indexingRules</tt> for different nodeTypes.</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">fulltextIndex
+<div>
+<div>
+<pre class="source">fulltextIndex
   - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
   - compatVersion = 2
   - type = &quot;lucene&quot;
@@ -515,52 +484,55 @@
           - propertyIndex = true
           - name = &quot;jcr:content/metadata/imageType&quot;
 </pre></div></div>
+
 <p>Rules are defined per nodeType, and each rule has one or more property definitions which determine which properties are indexed. Below is the canonical indexing rule structure</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">ruleName (nt:unstructured)
+<div>
+<div>
+<pre class="source">ruleName (nt:unstructured)
   - inherited (boolean) = true
   - indexNodeName (boolean) = false
   - includePropertyTypes (string) multiple
   + properties (nt:unstructured)
 </pre></div></div>
-<p>Following are the config options which can be defined at the index rule level</p>
 
+<p>Following are the config options which can be defined at the index rule level</p>
 <dl>
+
 <dt>inherited</dt>
 <dd>Optional boolean property defaults to true</dd>
-<dd>Determines if the rule is applicable on exact match or can be applied if  match is done on basis of nodeType inheritance</dd>
+<dd>Determines if the rule is applicable on exact match or can be applied if match is done on basis of nodeType inheritance</dd>
 <dt>includePropertyTypes</dt>
 <dd>Applicable when index is enabled for fulltext indexing</dd>
 <dd>For full text index defaults to include all types</dd>
-<dd>String array of property types which should be indexed. The values can be one  specified in <a class="externalLink" href="http://www.day.com/specs/jsr170/javadocs/jcr-2.0/constant-values.html#javax.jcr.PropertyType.TYPENAME_STRING">PropertyType Names</a></dd>
-<dt><a name="index-node-name"></a><br />indexNodeName</dt>
+<dd>String array of property types which should be indexed. The values can be one specified in <a class="externalLink" href="http://www.day.com/specs/jsr170/javadocs/jcr-2.0/constant-values.html#javax.jcr.PropertyType.TYPENAME_STRING">PropertyType Names</a></dd>
+<dt><a name="index-node-name"></a></dt>
+<dt>indexNodeName</dt>
 <dd><tt>@since Oak 1.0.20, 1.2.5</tt></dd>
-<dd>Default to false. If set to true then index would also be created for node name.  This would enable faster evaluation of queries involving constraints on Node  name. For example
-  
+<dd>Default to false. If set to true then index would also be created for node name. This would enable faster evaluation of queries involving constraints on Node name. For example
 <ul>
-    
+
 <li><i>select [jcr:path] from [nt:base] where NAME() = &#x2018;kite&#x2019;</i></li>
-    
 <li><i>select [jcr:path] from [nt:base] where NAME() LIKE &#x2018;kite%&#x2019;</i></li>
-    
 <li>//kite</li>
-    
 <li>//*[jcr:like(fn:name(), &#x2018;kite%&#x2019;)]</li>
-    
 <li>//element(*, app:Asset)[fn:name() = &#x2018;kite&#x2019;]</li>
-    
 <li>//element(kite, app:Asset)</li>
-  </ul></dd>
+</ul>
+</dd>
 </dl>
 <div class="section">
 <h5><a name="Cost_Overrides"></a><a name="cost-overrides"></a> Cost Overrides</h5>
 <p>By default, the cost of using this index is calculated as follows: For each query, the overhead is one operation. For each entry in the index, the cost is one. The following applies to <tt>compatVersion</tt> 2 only: To use a lower or higher cost, you can set the following optional properties in the index definition:</p>
+<ul>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">- costPerExecution (Double) = 1.0
-- costPerEntry (Double) = 1.0
-</pre></div></div>
+<li>costPerExecution (Double) = 1.0
+<ul>
+
+<li>costPerEntry (Double) = 1.0</li>
+</ul>
+</li>
+</ul>
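 <p>As a minimal sketch, assuming the <tt>assetType</tt> index from the earlier example (indexing rules omitted, and the value 0.5 chosen purely for illustration), both settings are plain properties on the index definition node:</p>
 
 <div>
 <div>
 <pre class="source">/oak:index/assetType
   - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
   - compatVersion = 2
   - type = &quot;lucene&quot;
   - async = &quot;async&quot;
   - costPerExecution = 1.0
   - costPerEntry = 0.5
 </pre></div></div>
 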
 <p>Please note that typically, those settings don&#x2019;t need to be explicitly set. Cost per execution is the overhead of one query. Cost per entry is the cost per node in the index. Using 0.5 means the cost is half, which means the index would be used more often (that is, even if there is a different index with similar cost).</p></div>
 <div class="section">
 <h5><a name="Indexing_Rule_inheritance"></a><a name="indexing-rule-inheritence"></a>Indexing Rule inheritance</h5>
@@ -570,8 +542,9 @@
 <h5><a name="Property_Definitions"></a><a name="property-definitions"></a>Property Definitions</h5>
 <p>Each index rule consists of one or more property definitions defined under <tt>properties</tt>. The order of the property definition nodes is important, as some properties are based on regular expressions. Below is the canonical property definition structure</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">propNode (nt:unstructured)
+<div>
+<div>
+<pre class="source">propNode (nt:unstructured)
   - name (string)
   - boost (double) = '1.0'
   - index (boolean) = true
@@ -587,156 +560,158 @@
   - excludeFromAggregation (boolean) = false
   - weight (long) = -1
 </pre></div></div>
-<p>Following are the details about the above mentioned config options which can be defined at the property definition level</p>
 
+<p>Following are the details about the above mentioned config options which can be defined at the property definition level</p>
 <dl>
+
 <dt>name</dt>
-<dd>Property name. If not defined then property name is set to the node name.  If <tt>isRegexp</tt> is true then it defines the regular expression. Can also be set  to a relative property.</dd>
+<dd>Property name. If not defined then property name is set to the node name. If <tt>isRegexp</tt> is true then it defines the regular expression. Can also be set to a relative property.</dd>
 <dt>isRegexp</dt>
-<dd>If set to true then property name would be interpreted as a regular  expression and the given definition would be applicable for matching property  names. Note that expression should be structured such that it does not  match &#x2018;/&#x2019;.
-  
+<dd>If set to true then property name would be interpreted as a regular expression and the given definition would be applicable for matching property names. Note that expression should be structured such that it does not match &#x2018;/&#x2019;.
 <ul>
-    
-<li><tt>.*</tt> - This property definition is applicable for all properties of given  node</li>
-    
-<li><tt>jcr:content/metadata/.*</tt> - This property definition is  applicable for all properties of child node <i>jcr:content/metadata</i></li>
-  </ul></dd>
+
+<li><tt>.*</tt> - This property definition is applicable for all properties of given node</li>
+<li><tt>jcr:content/metadata/.*</tt> - This property definition is applicable for all properties of child node <i>jcr:content/metadata</i></li>
+</ul>
+</dd>
 <dt>boost</dt>
-<dd>If the property is included in <tt>nodeScopeIndex</tt> then it defines the boost  done for the index value against the given property name. See  <a href="#boost">Boost and Search Relevancy</a> for more details</dd>
+<dd>If the property is included in <tt>nodeScopeIndex</tt> then it defines the boost done for the index value against the given property name. See <a href="#boost">Boost and Search Relevancy</a> for more details</dd>
 <dt>index</dt>
-<dd>Determines if this property should be indexed. Mostly useful for fulltext  index where some properties need to be <i>excluded</i> from getting indexed.</dd>
+<dd>Determines if this property should be indexed. Mostly useful for fulltext index where some properties need to be <i>excluded</i> from getting indexed.</dd>
 <dt>useInExcerpt</dt>
-<dd>Controls whether the value of a property should be used to create an excerpt.  The value of the property is still full-text indexed when set to false, but it  will never show up in an excerpt for its parent node. If set to true then  property value would be stored separately within index causing the index  size to increase. So set it to true only if you make use of excerpt feature</dd>
+<dd>Controls whether the value of a property should be used to create an excerpt. The value of the property is still full-text indexed when set to false, but it will never show up in an excerpt for its parent node. If set to true then property value would be stored separately within index causing the index size to increase. So set it to true only if you make use of excerpt feature</dd>
 <dt>nodeScopeIndex</dt>
-<dd>Control whether the value of a property should be part of fulltext index. That  is, you can do a <i>jcr:contains(., &#x2018;foo&#x2019;)</i> and it will return nodes that have a  string property that contains the word foo. Example
-  
+<dd>Control whether the value of a property should be part of fulltext index. That is, you can do a <i>jcr:contains(., &#x2018;foo&#x2019;)</i> and it will return nodes that have a string property that contains the word foo. Example
 <ul>
-    
+
 <li><i>//element(*, app:Asset)[jcr:contains(., &#x2018;image&#x2019;)]</i></li>
-  </ul></dd>
+</ul>
+</dd>
 </dl>
-<p>In case of aggregation all properties would be indexed at node level by default  if the property type is part of <tt>includePropertyTypes</tt>. However if there is an  explicit property definition provided then it would only be included if  <tt>nodeScopeIndex</tt> is set to true.</p>
-
+<p>In case of aggregation all properties would be indexed at node level by default if the property type is part of <tt>includePropertyTypes</tt>. However if there is an explicit property definition provided then it would only be included if <tt>nodeScopeIndex</tt> is set to true.</p>
 <dl>
+
 <dt>analyzed</dt>
 <dd>Set this to true if the property is used as part of <tt>contains</tt>. Example
-  
 <ul>
-    
+
 <li><i>//element(*, app:Asset)[jcr:contains(type, &#x2018;image&#x2019;)]</i></li>
-    
 <li><i>//element(*, app:Asset)[jcr:contains(jcr:content/metadata/@format, &#x2018;image&#x2019;)]</i></li>
-  </ul></dd>
+</ul>
+</dd>
 <dt><a name="ordered"></a></dt>
 <dt>ordered</dt>
-<dd>If the property is to be used in <i>order by</i> clause to perform sorting then  this should be set to true. This should be set to true only if the property  is to be used to perform sorting as it increases the index size. Example
-  
+<dd>If the property is to be used in <i>order by</i> clause to perform sorting then this should be set to true. This should be set to true only if the property is to be used to perform sorting as it increases the index size. Example
 <ul>
-    
+
 <li><i>//element(*, app:Asset)[jcr:contains(type, &#x2018;image&#x2019;)] order by @size</i></li>
-    
 <li><i>//element(*, app:Asset)[jcr:contains(type, &#x2018;image&#x2019;)] order by jcr:content/@jcr:lastModified</i></li>
-  </ul></dd>
+</ul>
+</dd>
 </dl>
-<p>Refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2196">Lucene based Sorting</a> for more details. Note that this is  only supported for single value property. Enabling this on multi value property  would cause indexing to fail.</p>
-
+<p>Refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2196">Lucene based Sorting</a> for more details. Note that this is only supported for single value property. Enabling this on multi value property would cause indexing to fail.</p>
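 <p>A sketch of a property definition that would support the second sorting example above. The node name <tt>lastModified</tt> is hypothetical, and the explicit <tt>type</tt> is an assumption that the property always holds a single date value:</p>
 
 <div>
 <div>
 <pre class="source">+ properties (nt:unstructured)
   + lastModified
     - name = &quot;jcr:content/jcr:lastModified&quot;
     - propertyIndex = true
     - ordered = true
     - type = &quot;Date&quot;
 </pre></div></div>
 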
 <dl>
+
 <dt>type</dt>
-<dd>JCR Property type. Can be one of <tt>Date</tt>, <tt>Boolean</tt>, <tt>Double</tt> , <tt>String</tt> or <tt>Long</tt>. Mostly  inferred from the indexed value. However, in cases where the same property  type is not used consistently across various nodes, it is recommended  to specify the type explicitly.</dd>
+<dd>JCR Property type. Can be one of <tt>Date</tt>, <tt>Boolean</tt>, <tt>Double</tt> , <tt>String</tt> or <tt>Long</tt>. Mostly inferred from the indexed value. However, in cases where the same property type is not used consistently across various nodes, it is recommended to specify the type explicitly.</dd>
 <dt>propertyIndex</dt>
-<dd>Whether the index for this property is used for equality conditions, ordering,  and is not null conditions.</dd>
+<dd>Whether the index for this property is used for equality conditions, ordering, and is not null conditions.</dd>
 <dt>notNullCheckEnabled</dt>
 <dd>Since 1.1.8</dd>
-<dd>If the property is checked for <i>is not null</i> then this should be set to true.  To reduce the index size,  this should only be enabled for nodeTypes that are not generic.
-  
+<dd>If the property is checked for <i>is not null</i> then this should be set to true. To reduce the index size, this should only be enabled for nodeTypes that are not generic.
 <ul>
-    
+
 <li><i>//element(*, app:Asset)[jcr:content/@excludeFromSearch]</i></li>
-  </ul></dd>
+</ul>
+</dd>
 </dl>
 <p>For details, see <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2234">IS NOT NULL support</a>.</p>
-
 <dl>
+
 <dt>nullCheckEnabled</dt>
 <dd>Since 1.0.12</dd>
-<dd>If the property is checked for <i>is null</i> then this should be set to true. This  should only be enabled for nodeTypes that are not generic as it leads to index  entry for all nodes of that type where this property is not set.
-  
+<dd>If the property is checked for <i>is null</i> then this should be set to true. This should only be enabled for nodeTypes that are not generic as it leads to index entry for all nodes of that type where this property is not set.
 <ul>
-    
+
 <li><i>//element(*, app:Asset)[not(jcr:content/@excludeFromSearch)]</i></li>
-  </ul></dd>
+</ul>
+</dd>
 </dl>
-<p>It would be better to use a query which checks for property existence or property  being set to specific values as such queries can make use of index without any  extra storage cost.</p>
+<p>It would be better to use a query which checks for property existence or property being set to specific values as such queries can make use of index without any extra storage cost.</p>
 <p>For details, see <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2517">IS NULL support</a>.</p>
-
 <dl>
+
 <dt>excludeFromAggregation</dt>
 <dd>Since 1.0.27, 1.2.11</dd>
 <dd>if set to true the property would be excluded from aggregation <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-3981">OAK-3981</a></dd>
 <dt><a name="weight"></a></dt>
 <dt>weight</dt>
-<dd>Allows to override the estimated number of entries per value,  which affects the cost of the index.</dd>
-<dd>Since 1.6.3: if <tt>weight</tt> is set to <tt>0</tt>, then this property is assumed not to reduce the cost.  Queries that contain <i>only</i> this condition should not use that index.  See <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-5899">OAK-5899</a> for details.</dd>
-<dd>Since 1.7.11: if <tt>weight</tt> is set to <tt>10</tt>, then the estimated number of unique entries is 10.  This means, the cost is reduced by a factor of about 10, for queries that contain this condition.  See <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-6735">OAK-6735</a> for details.</dd>
+<dd>Allows to override the estimated number of entries per value, which affects the cost of the index.</dd>
+<dd>Since 1.6.3: if <tt>weight</tt> is set to <tt>0</tt>, then this property is assumed not to reduce the cost. Queries that contain <i>only</i> this condition should not use that index. See <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-5899">OAK-5899</a> for details.</dd>
+<dd>Since 1.7.11: if <tt>weight</tt> is set to <tt>10</tt>, then the estimated number of unique entries is 10. This means, the cost is reduced by a factor of about 10, for queries that contain this condition. See <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-6735">OAK-6735</a> for details.</dd>
 </dl>
 <p><a name="property-names"></a><b>Property Names</b></p>
 <p>Property name can be one of following</p>
-
 <ol style="list-style-type: decimal">
-  
-<li>Simple name - Like <i>assetType</i> etc. These are used for properties which are  defined directly on the indexed node</li>
-  
-<li>Relative name - Like <i>jcr:content/metadata/title</i>. These are used for  properties which are defined relative to the node being indexed.</li>
-  
-<li>Regular Expression - Like <i>.*</i>. Used when only property whose name  match given pattern are to be indexed.  They can also be used for relative properties like  <i>jcr:content/metadata/dc:.*$</i>  which indexes all property names starting with <i>dc</i> from node with  relative path <i>jcr:content/metadata</i></li>
-  
-<li>The string <tt>:nodeName</tt> - this special case indexes node name as if it&#x2019;s a  virtual property of the node being indexed. Setting this along with  <tt>nodeScopeIndex=true</tt> is akin to setting <tt>indexNodeName=true</tt> on indexing  rule. (<tt>@since Oak 1.3.15, 1.2.14</tt>)</li>
+
+<li>Simple name - Like <i>assetType</i> etc. These are used for properties which are defined directly on the indexed node</li>
+<li>Relative name - Like <i>jcr:content/metadata/title</i>. These are used for properties which are defined relative to the node being indexed.</li>
+<li>Regular Expression - Like <i>.*</i>. Used when only property whose name match given pattern are to be indexed. They can also be used for relative properties like <i>jcr:content/metadata/dc:.*$</i> which indexes all property names starting with <i>dc</i> from node with relative path <i>jcr:content/metadata</i></li>
+<li>The string <tt>:nodeName</tt> - this special case indexes node name as if it&#x2019;s a virtual property of the node being indexed. Setting this along with <tt>nodeScopeIndex=true</tt> is akin to setting <tt>indexNodeName=true</tt> on indexing rule. (<tt>@since Oak 1.3.15, 1.2.14</tt>)</li>
 </ol></div>
 <div class="section">
 <h5><a name="Evaluate_Path_Restrictions"></a><a name="path-restrictions"></a> Evaluate Path Restrictions</h5>
 <p>Lucene index provides support for evaluating path restrictions natively. Consider a query like</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">select * from [app:Asset] as a where isdescendantnode(a, [/content/app/old]) AND contains(*, 'white')
+<div>
+<div>
+<pre class="source">select * from [app:Asset] as a where isdescendantnode(a, [/content/app/old]) AND contains(*, 'white')
 </pre></div></div>
+
 <p>By default the index would return all nodes which <i>contain white</i>, and the query engine would filter out nodes which are not under <i>/content/app/old</i>. This can be slow if many of the returned nodes are not under that path. To speed up such queries one can enable <tt>evaluatePathRestrictions</tt> in the Lucene index, and the index would then only return nodes which are under <i>/content/app/old</i>.</p>
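 <p>As a rough sketch, assuming an index named <tt>assetIndex</tt> (a hypothetical name, with the indexing rules omitted), the flag is set as a boolean property on the index definition node:</p>
 
 <div>
 <div>
 <pre class="source">/oak:index/assetIndex
   - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
   - compatVersion = 2
   - type = &quot;lucene&quot;
   - async = &quot;async&quot;
   - evaluatePathRestrictions = true
 </pre></div></div>
 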
 <p>Enabling this feature comes at the cost of a slight increase in index size. Refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2306">OAK-2306</a> for more details.</p></div>
 <div class="section">
 <h5><a name="Include_and_Exclude_paths_from_indexing"></a><a name="include-exclude"></a> Include and Exclude paths from indexing</h5>
 <p><tt>@since Oak 1.0.14, 1.2.3</tt></p>
-<p>By default the indexer would index all the nodes under the subtree where the index definition is defined as per the indexingRule. In some cases its required to index nodes under certain path. For e.g. if index is defined for global fulltext index which include the complete repository you might want to exclude certain path which contains transient system data. </p>
+<p>By default the indexer would index all the nodes under the subtree where the index  definition is defined as per the indexingRule. In some cases its required to index nodes under certain path. For e.g. if index is defined for global fulltext index which include the complete repository you might want to exclude certain path which contains transient system data.</p>
 <p>For example, if your application stores certain logs under <tt>/var/log</tt> and they are not supposed to be indexed as part of the fulltext index, then that path can be excluded</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">/oak:index/assetType
+<div>
+<div>
+<pre class="source">/oak:index/assetType
   - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
   - compatVersion = 2
   - type = &quot;lucene&quot;
   - excludedPaths = [&quot;/var/log&quot;]
 </pre></div></div>
+
 <p>The above index definition would cause nodes under <tt>/var/log</tt> not to be indexed. In the majority of cases only <tt>excludedPaths</tt> makes sense. However, in some cases it might be required to also specify an explicit set of paths which should be indexed. In that case make use of <tt>includedPaths</tt>.</p>
 <p>Note that <tt>excludedPaths</tt> and <tt>includedPaths</tt> <i>do not</i> affect the index selection logic for a query, i.e. if a query has any path restriction specified then it would not be checked against <tt>excludedPaths</tt> and <tt>includedPaths</tt>.</p>
 <p><a name="query-paths"></a> <b>queryPaths</b></p>
-<p>If you need to ensure that a given index only gets used for query with specific path restrictions then you need to specify those paths in <tt>queryPaths</tt>. </p>
+<p>If you need to ensure that a given index only gets used for queries with specific path restrictions then you need to specify those paths in <tt>queryPaths</tt>.</p>
 <p>For example, if <tt>includedPaths</tt> and <tt>queryPaths</tt> are both set to <i>[ &#x201c;/content/a&#x201d;, &#x201c;/content/b&#x201d; ]</i>, the index would be used for queries below &#x201c;/content/a&#x201d; as well as for queries below &#x201c;/content/b&#x201d;, but not for queries without a path restriction or for queries below &#x201c;/content/c&#x201d;.</p>
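 <p>The setup described above could be sketched as follows. The index name <tt>contentIndex</tt> is hypothetical, and both properties are assumed to be multi-valued strings written in the same way as <tt>excludedPaths</tt> in the earlier example:</p>
 
 <div>
 <div>
 <pre class="source">/oak:index/contentIndex //hypothetical index name
   - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
   - compatVersion = 2
   - type = &quot;lucene&quot;
   - includedPaths = [&quot;/content/a&quot;, &quot;/content/b&quot;]
   - queryPaths = [&quot;/content/a&quot;, &quot;/content/b&quot;]
 </pre></div></div>
 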
 <p><b>Usage</b></p>
 <p>Key points to consider while using <tt>excludedPaths</tt>, <tt>includedPaths</tt> and <tt>queryPaths</tt>:</p>
-
 <ol style="list-style-type: decimal">
-  
+
 <li>
-<p>Reduce what gets indexed in global fulltext index - For  setups where a global fulltext index is configured say at /oak:index/lucene which  indexes everything then <tt>excludedPaths</tt> can be used to avoid indexing transient  repository state like in &#x2018;/var&#x2019; or &#x2018;/tmp&#x2019;. This would help in improving indexing  rate. By far this is the primary usecase</p></li>
-  
+
+<p>Reduce what gets indexed in a global fulltext index - For setups where a global fulltext index which indexes everything is configured, say at /oak:index/lucene, <tt>excludedPaths</tt> can be used to avoid indexing transient repository state such as that under &#x2018;/var&#x2019; or &#x2018;/tmp&#x2019;. This would help in improving the indexing rate. This is by far the primary use case.</p>
+</li>
 <li>
-<p>Reduce reindexing time - If its known that certain type of data is stored under specific  subtree only but the query is not specifying that path restriction then <tt>includedPaths</tt>  can be used to reduce reindexing time for existing content by ensuring that indexing  logic only traverses that path for building up the index</p></li>
-  
+
+<p>Reduce reindexing time - If it is known that a certain type of data is stored only under a specific subtree, but queries do not specify that path restriction, then <tt>includedPaths</tt> can be used to reduce the reindexing time for existing content by ensuring that the indexing logic only traverses that path when building up the index.</p>
+</li>
 <li>
-<p>Use <tt>excludedPaths</tt>, <tt>includedPaths</tt> with caution - When paths are excluded or included  then query engine is not aware of that. If wrong paths get excluded then its possible  that nodes which should have been part of query result get excluded as they are not indexed.  So only exclude those paths which do not have node matching given nodeType or nodes which  are known to be not part of any query result</p></li>
-  
+
+<p>Use <tt>excludedPaths</tt> and <tt>includedPaths</tt> with caution - When paths are excluded or included the query engine is not aware of that. If the wrong paths get excluded then it is possible that nodes which should have been part of a query result are missing because they are not indexed. So only exclude those paths which do not contain nodes matching the given nodeType, or whose nodes are known not to be part of any query result.</p>
+</li>
 <li>
-<p>Sub-root index definitions (e.g. <tt>/test/oak:index/index-def-node</tt>) -  <tt>excludedPaths</tt> and <tt>includedPaths</tt> need to be relative to the path that index is defined for. e.g. if the condition is supposed to be put for <tt>/test/a</tt> where the index definition is at <tt>/test/oak:index/index-def-node</tt> then <tt>/a</tt> needs to be put as value of <tt>excludedPaths</tt> or <tt>includedPaths</tt>. On the other hand, <tt>queryPaths</tt> remains to be an absolute path. So, for the example above, <tt>queryPaths</tt> would get the value <tt>/test/a</tt>.</p></li>
+
+<p>Sub-root index definitions (e.g. <tt>/test/oak:index/index-def-node</tt>) - <tt>excludedPaths</tt> and <tt>includedPaths</tt> need to be relative to the path that the index is defined for. E.g. if the condition is supposed to apply to <tt>/test/a</tt> where the index definition is at <tt>/test/oak:index/index-def-node</tt> then <tt>/a</tt> needs to be set as the value of <tt>excludedPaths</tt> or <tt>includedPaths</tt>. On the other hand, <tt>queryPaths</tt> remains an absolute path. So, for the example above, <tt>queryPaths</tt> would get the value <tt>/test/a</tt>.</p>
+</li>
 </ol>
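 <p>To illustrate the last point, a sketch of a sub-root index definition that should only cover <tt>/test/a</tt> might look like the following. The combination of properties is illustrative only and is not meant as a recommendation:</p>
 
 <div>
 <div>
 <pre class="source">/test/oak:index/index-def-node
   - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
   - compatVersion = 2
   - type = &quot;lucene&quot;
   - includedPaths = [&quot;/a&quot;] //relative to /test, where the index is defined
   - queryPaths = [&quot;/test/a&quot;] //absolute path
 </pre></div></div>
 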
-<p>In most cases use of <tt>queryPaths</tt> would not be required as index definition should not have any overlap. </p>
+<p>In most cases use of <tt>queryPaths</tt> would not be required, as index definitions should not overlap.</p>
 <p>Refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2599">OAK-2599</a> for more details.</p></div></div>
 <div class="section">
 <h4><a name="Aggregation"></a><a name="aggregation"></a>Aggregation</h4>
@@ -744,8 +719,9 @@
 <p>Oak allows you to define index aggregates based on relative path patterns and primary node types. Changes to aggregated items cause the main item to be reindexed, even if it was not modified.</p>
 <p>Aggregation configuration is defined under the <tt>aggregates</tt> node under index configuration. The following example creates an index aggregate on nt:file that includes the content of the jcr:content node:</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">fulltextIndex
+<div>
+<div>
+<pre class="source">fulltextIndex
   - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
   - compatVersion = 2
   - type = &quot;lucene&quot;
@@ -755,61 +731,85 @@
       + include0
         - path = &quot;jcr:content&quot;
 </pre></div></div>
+
 <p>By default all properties whose type matches <tt>includePropertyTypes</tt> and which are part of child nodes as per the aggregation pattern are included for indexing. To exclude certain properties, define a property definition with the relative path and set <tt>excludeFromAggregation</tt> to <tt>true</tt>. Such properties would then be excluded from the fulltext index.</p>
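 <p>A rough sketch of such an exclusion is shown below. It assumes that the property definition lives under <tt>indexRules</tt> in the same way as the property definitions in the boost example elsewhere on this page; the node name <tt>encoding</tt> and the relative path <tt>jcr:content/jcr:encoding</tt> are hypothetical and chosen only for illustration:</p>
 
 <div>
 <div>
 <pre class="source">  + indexRules
     - jcr:primaryType = &quot;nt:unstructured&quot;
     + nt:file
       + properties
         + encoding //hypothetical node name
           - name = &quot;jcr:content/jcr:encoding&quot; //hypothetical relative path
           - excludeFromAggregation = true
 </pre></div></div>
 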
 <p>For a given nodeType multiple includes can be defined. Below is the aggregate definition structure for any specific include rule</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">aggregateNodeInclude (nt:unstructured)
+<div>
+<div>
+<pre class="source">aggregateNodeInclude (nt:unstructured)
   - path (string) mandatory
   - primaryType (string)
   - relativeNode (boolean) = false
 </pre></div></div>
-<p>Following are the details about the above mentioned config options which can be defined as part of aggregation include. (Refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2268">OAK-2268</a> for implementation details)</p>
 
+<p>Following are the details about the above mentioned config options which can be defined as part of aggregation include. (Refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2268">OAK-2268</a> for implementation details)</p>
 <dl>
+
 <dt>path</dt>
 <dd>Path pattern to include. Example
-  
 <ul>
-    
+
 <li><tt>jcr:content</tt> - Name explicitly specified</li>
-    
 <li><tt>*</tt> - Any child node at depth 1</li>
-    
 <li><tt>*/*</tt> - Any child node at depth 2</li>
-  </ul></dd>
+</ul>
+</dd>
 <dt>primaryType</dt>
-<dd>
-<p>Restrict the included nodes to a certain type. The restriction would be  applied on the last node in given path</p>
-  
-<div class="source">
-<div class="source"><pre class="prettyprint">+ aggregates
-  + nt:file
-    + include0
-      - path = &quot;jcr:content&quot;
-      - primaryType = &quot;nt:resource&quot;
-</pre></div></div></dd>
+<dd>Restrict the included nodes to a certain type. The restriction would be applied on the last node in given path
+<ul>
+
+<li>aggregates
+<ul>
+
+<li>nt:file
+<ul>
+
+<li>include0</li>
+<li>path = &#x201c;jcr:content&#x201d;</li>
+<li>primaryType = &#x201c;nt:resource&#x201d;</li>
+</ul>
+</li>
+</ul>
+</li>
+</ul>
+</dd>
 <dt>relativeNode</dt>
-<dd>
-<p>Boolean property indicates that query can be performed against specific node  For example for following content</p>
-  
-<div class="source">
-<div class="source"><pre class="prettyprint">+ space.txt (app:Asset)
-  + renditions (nt:folder)
-    + original (nt:file)
-      + jcr:content (nt:resource)
-        - jcr:data
-</pre></div></div></dd>
+<dd>Boolean property which indicates that a query can be performed against a specific node. For example, for the following content
+<ul>
+
+<li>space.txt (app:Asset)
+<ul>
+
+<li>renditions (nt:folder)
+<ul>
+
+<li>original (nt:file)</li>
+<li>jcr:content (nt:resource)
+<ul>
+
+<li>jcr:data</li>
+</ul>
+</li>
+</ul>
+</li>
+</ul>
+</li>
+</ul>
+</dd>
 </dl>
 <p>And a query like</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">    select * from [app:Asset] where contains([renditions/original/*], &quot;pluto&quot;)
+<div>
+<div>
+<pre class="source">    select * from [app:Asset] where contains([renditions/original/*], &quot;pluto&quot;)
 </pre></div></div>
+
 <p>Following index configuration would be required</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">    fulltextIndex
+<div>
+<div>
+<pre class="source">    fulltextIndex
       - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
       - compatVersion = 2
       - type = &quot;lucene&quot;
@@ -826,160 +826,148 @@
         - jcr:primaryType = &quot;nt:unstructured&quot;
         + app:Asset
 </pre></div></div>
+
 <p><b>Aggregation and Recursion</b></p>
-<p>While performing aggregation the aggregation rules are again applied on node being aggregated. For example while aggregating for <i>app:Asset</i> above when <i>renditions/original/*</i> is being aggregated then aggregation rule would again be applied. In this case as <i>renditions/original</i> is <i>nt:file</i> then aggregation rule applicable for <i>nt:file</i> would be applied. Such a logic might result in recursion. (See <a class="externalLink" href="https://issues.apache.org/jira/browse/JCR-2989?focusedCommentId=13051101">JCR-2989</a> for details).</p>
+<p>While performing aggregation the aggregation rules are applied again on the node being aggregated. For example, while aggregating for <i>app:Asset</i> above, when <i>renditions/original/*</i> is being aggregated the aggregation rules would be applied again. In this case, as <i>renditions/original</i> is an <i>nt:file</i>, the aggregation rule applicable for <i>nt:file</i> would be applied. Such logic might result in recursion. (See <a class="externalLink" href="https://issues.apache.org/jira/browse/JCR-2989?focusedCommentId=13051101">JCR-2989</a> for details).</p>
 <p>For such cases <tt>reaggregateLimit</tt> can be set on the aggregate definition node; it defaults to 5.</p>
+<ul>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">  + aggregates
-    + app:Asset
-      - reaggregateLimit (long) = 5
-      + include0
-        - path = &quot;renditions/original&quot;
-        - relativeNode = true
-</pre></div></div></div>
+<li>aggregates
+<ul>
+
+<li>app:Asset
+<ul>
+
+<li>reaggregateLimit (long) = 5</li>
+<li>include0
+<ul>
+
+<li>path = &#x201c;renditions/original&#x201d;</li>
+<li>relativeNode = true</li>
+</ul>
+</li>
+</ul>
+</li>
+</ul>
+</li>
+</ul></div>
 <div class="section">
 <h4><a name="Analyzers"></a><a name="analyzers"></a>Analyzers</h4>
 <p><tt>@since Oak 1.5.5, 1.4.7, 1.2.19</tt> Unless a custom analyzer is configured (as documented below), the in-built analyzer can be configured to also index the original term. This is controlled by setting the boolean property <tt>indexOriginalTerm</tt> on the analyzers node.</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">/oak:index/assetType
+<div>
+<div>
+<pre class="source">/oak:index/assetType
   - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
   - compatVersion = 2
   - type = &quot;lucene&quot;
   + analyzers
     - indexOriginalTerm = true
 </pre></div></div>
+
 <p>(See <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-4516">OAK-4516</a> for details)</p>
 <p><tt>@since Oak 1.2.0</tt></p>
 <p>Analyzers can be configured as part of the index definition via the <tt>analyzers</tt> node. The default analyzer can be configured via the <tt>analyzers/default</tt> node.</p>
+<ul>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">+ sampleIndex
-    - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
-    + analyzers
-        + default
-        + pathText
-        ...
-</pre></div></div>
+<li>sampleIndex
+<ul>
+
+<li>jcr:primaryType = &#x201c;oak:QueryIndexDefinition&#x201d;
+<ul>
+
+<li>analyzers
+<ul>
+
+<li>default</li>
+<li>pathText &#x2026;</li>
+</ul>
+</li>
+</ul>
+</li>
+</ul>
+</li>
+</ul>
 <div class="section">
 <h5><a name="Specify_analyzer_class_directly"></a><a name="analyzer-classes"></a>Specify analyzer class directly</h5>
 <p>If any of the out-of-the-box analyzers is to be used then it can be configured directly.</p>
+<ul>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">+ analyzers
-        + default
-            - class = &quot;org.apache.lucene.analysis.standard.StandardAnalyzer&quot;
-            - luceneMatchVersion = &quot;LUCENE_47&quot; (optional)
-</pre></div></div>
+<li>analyzers + default - class = &#x201c;org.apache.lucene.analysis.standard.StandardAnalyzer&#x201d; - luceneMatchVersion = &#x201c;LUCENE_47&#x201d; (optional)</li>
+</ul>
 <p>To conform to a specific version, specify it via <tt>luceneMatchVersion</tt>; otherwise Oak would use a default version depending on the version of Lucene it is shipped with.</p>
 <p>One can also provide a stopword file via a <tt>stopwords</tt> <tt>nt:file</tt> node under the analyzer node.</p>
+<ul>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">+ analyzers
-        + default
-            - class = &quot;org.apache.lucene.analysis.standard.StandardAnalyzer&quot;
-            - luceneMatchVersion = &quot;LUCENE_47&quot; (optional)
-            + stopwords (nt:file)
-</pre></div></div></div>
+<li>analyzers + default - class = &#x201c;org.apache.lucene.analysis.standard.StandardAnalyzer&#x201d; - luceneMatchVersion = &#x201c;LUCENE_47&#x201d; (optional) + stopwords (nt:file)</li>
+</ul></div>
 <div class="section">
 <h5><a name="Create_analyzer_via_composition"></a><a name="analyzer-composition"></a>Create analyzer via composition</h5>
 <p>Analyzers can also be composed based on <tt>Tokenizers</tt>, <tt>TokenFilters</tt> and <tt>CharFilters</tt>. This is similar to the support provided in Solr where you can <a class="externalLink" href="https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Specifying_an_Analyzer_in_the_schema">configure analyzers in xml</a></p>
+<ul>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">+ analyzers
-        + default
-            + charFilters (nt:unstructured) //The filters needs to be ordered
-                + HTMLStrip
-                + Mapping
-            + tokenizer
-                - name = &quot;Standard&quot;
-            + filters (nt:unstructured) //The filters needs to be ordered
-                + LowerCase
-                + Stop
-                    - words = &quot;stop1.txt, stop2.txt&quot;
-                    + stop1.txt (nt:file)
-                    + stop2.txt (nt:file)
-                + PorterStem
-                + Synonym
-                    - synonyms = &quot;synonym.txt&quot;
-                    + synonym.txt (nt:file)
-</pre></div></div>
+<li>analyzers + default + charFilters (nt:unstructured) //The filters needs to be ordered + HTMLStrip + Mapping + tokenizer - name = &#x201c;Standard&#x201d; + filters (nt:unstructured) //The filters needs to be ordered + LowerCase + Stop - words = &#x201c;stop1.txt, stop2.txt&#x201d; + stop1.txt (nt:file) + stop2.txt (nt:file) + PorterStem + Synonym - synonyms = &#x201c;synonym.txt&#x201d; + synonym.txt (nt:file)</li>
+</ul>
 <p>Points to note</p>
-
 <ol style="list-style-type: decimal">
-  
-<li>Name of filters, charFilters and tokenizer are formed by removing the  factory suffixes. So
-  
+
+<li>Name of filters, charFilters and tokenizer are formed by removing the factory suffixes. So
 <ul>
-    
+
 <li>org.apache.lucene.analysis.standard.StandardTokenizerFactory -&gt; <tt>Standard</tt></li>
-    
 <li>org.apache.lucene.analysis.charfilter.MappingCharFilterFactory -&gt; <tt>Mapping</tt></li>
-    
 <li>org.apache.lucene.analysis.core.StopFilterFactory -&gt; <tt>Stop</tt></li>
-  </ul></li>
-  
-<li>Any config parameter required for the factory is specified as property of  that node
-  
-<ul>
-    
-<li>If the factory requires to load a file e.g. stop words from some file then  file content can be provided via creating child <tt>nt:file</tt> node of the  filename</li>
-    
-<li>The property value MUST be of type <tt>String</tt>. No other JCR type should be used  for them like array or integer etc</li>
-  </ul></li>
-  
-<li>The analyzer-chain processes text from nodes as well text passed in query. So,  do take care that any mapping configuration (e.g. synonym mappings) factor in  the chain of analyzers.  E.g a common mistake for synonym mapping would be to have <tt>domain =&gt; Range</tt> while  there&#x2019;s a lower case filter configured as well (see the example above). For such  a setup an indexed value <tt>domain</tt> would actually get indexed as <tt>Range</tt> (mapped  value doesn&#x2019;t have lower case filter below it) but a query for <tt>Range</tt> would actually  query for <tt>range</tt> (due to lower case filter) and won&#x2019;t give the result (as might be  expected). An easy work-around for this example could be to have lower case mappings  i.e. just use <tt>domain =&gt; range</tt>.</li>
-  
-<li>Precedence: Specifying analyzer class directly has precedence over analyzer configuration  by composition. If you want to configure analyzers by composition then analyzer class  MUST NOT not be specified. In-build analyzer has least precedence and comes into play only  if no custom analyzer has been configured. Similarly, setting <tt>indexOriginalTerm</tt> on  analyzers node to modify behavior of in-built analyzer also works only when no custom  analyzer has been configured.</li>
-  
+</ul>
+</li>
+<li>Any config parameter required for the factory is specified as property of that node
+<ul>
+
+<li>If the factory requires to load a file e.g. stop words from some file then file content can be provided via creating child <tt>nt:file</tt> node of the filename</li>
+<li>The property value MUST be of type <tt>String</tt>. No other JCR type should be used for them like array or integer etc</li>
+</ul>
+</li>
+<li>The analyzer chain processes text from nodes as well as text passed in the query. So, do take care that any mapping configuration (e.g. synonym mappings) factors in the chain of analyzers. E.g. a common mistake for synonym mapping would be to have <tt>domain =&gt; Range</tt> while there&#x2019;s a lower case filter configured as well (see the example above). For such a setup an indexed value <tt>domain</tt> would actually get indexed as <tt>Range</tt> (the mapped value doesn&#x2019;t have a lower case filter below it) but a query for <tt>Range</tt> would actually query for <tt>range</tt> (due to the lower case filter) and won&#x2019;t give the result (as might be expected). An easy work-around for this example could be to have lower case mappings, i.e. just use <tt>domain =&gt; range</tt>.</li>
+<li>Precedence: Specifying the analyzer class directly has precedence over analyzer configuration by composition. If you want to configure analyzers by composition then the analyzer class MUST NOT be specified. The in-built analyzer has the least precedence and comes into play only if no custom analyzer has been configured. Similarly, setting <tt>indexOriginalTerm</tt> on the analyzers node to modify the behavior of the in-built analyzer also works only when no custom analyzer has been configured.</li>
 <li>To determine list of supported factories have a look at Lucene javadocs for
-  
 <ul>
-    
+
 <li><a class="externalLink" href="https://lucene.apache.org/core/4_7_1/analyzers-common/org/apache/lucene/analysis/util/TokenizerFactory.html">TokenizerFactory</a></li>
-    
 <li><a class="externalLink" href="https://lucene.apache.org/core/4_7_1/analyzers-common/org/apache/lucene/analysis/util/CharFilterFactory.html">CharFilterFactory</a></li>
-    
 <li><a class="externalLink" href="https://lucene.apache.org/core/4_7_1/analyzers-common/org/apache/lucene/analysis/util/TokenFilterFactory.html">FilterFactory</a></li>
-  </ul></li>
-  
+</ul>
+</li>
 <li>Oak support for composing analyzer is based on Lucene. So some helpful docs around this
-  
 <ul>
-    
+
 <li><a class="externalLink" href="https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers%2C+Tokenizers%2C+and+Filters">https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers%2C+Tokenizers%2C+and+Filters</a></li>
-    
 <li><a class="externalLink" href="https://cwiki.apache.org/confluence/display/solr/CharFilterFactories">https://cwiki.apache.org/confluence/display/solr/CharFilterFactories</a></li>
-    
 <li><a class="externalLink" href="https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Specifying_an_Analyzer_in_the_schema">https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Specifying_an_Analyzer_in_the_schema</a></li>
-  </ul></li>
-  
+</ul>
+</li>
 <li>When defining synonyms:
-  
 <ul>
-    
+
 <li>in the synonym file, lines like <i>plane, airplane, aircraft</i> refer to tokens that are mutual synonyms whereas lines like <i>plane =&gt; airplane</i> refer to <i>one way</i> synonyms, so that plane will be expanded to airplane but not vice versa</li>
-    
 <li>continuing with the point above, since oak would use the same analyzer for indexing as well as querying, using one-way synonyms in any practical way is not supported at the moment.</li>
-    
 <li>special characters have to be escaped</li>
-    
 <li>multi word synonyms need particular attention (see <a class="externalLink" href="https://lucidworks.com/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter">https://lucidworks.com/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter</a>)</li>
-  </ul></li>
+</ul>
+</li>
 </ol>
-<p>Note that currently only one analyzer can be configured per index. Its not possible to specify separate analyzer for query and index time currently. </p></div></div>
+<p>Note that currently only one analyzer can be configured per index. It is not possible to specify separate analyzers for query and index time.</p></div></div>
 <div class="section">
 <h4><a name="Codec"></a><a name="codec"></a>Codec</h4>
 <p>Name of <a class="externalLink" href="https://lucene.apache.org/core/4_7_1/core/org/apache/lucene/codecs/Codec.html">Lucene Codec</a> to use. By default if the index involves fulltext indexing then Oak Lucene uses <tt>OakCodec</tt> which disables compression. Due to this the index size may grow large. To enable compression you can set the codec to <tt>Lucene46</tt></p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">/oak:index/assetType
+<div>
+<div>
+<pre class="source">/oak:index/assetType
   - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
   - compatVersion = 2
   - type = &quot;lucene&quot;
   - codec = &quot;Lucene46&quot;
 </pre></div></div>
+
 <p>Refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2853">OAK-2853</a> for details. Enabling the <tt>Lucene46</tt> codec would lead to smaller, more compact indexes.</p></div>
 <div class="section">
 <h4><a name="Boost_and_Search_Relevancy"></a><a name="boost"></a>Boost and Search Relevancy</h4>
@@ -990,8 +978,9 @@
 <p>For that to work ensure that for each such property (which needs to be preferred) both <tt>nodeScopeIndex</tt> and <tt>analyzed</tt> are set to true. In addition you can specify the <tt>boost</tt> property to give higher weight to values found in a specific property.</p>
 <p>Note that even without setting explicit <tt>boost</tt> and just setting <tt>nodeScopeIndex</tt> and <tt>analyzed</tt> to true would improve the search result due to the way <a class="externalLink" href="https://wiki.apache.org/lucene-java/LuceneFAQ#How_do_I_make_sure_that_a_match_in_a_document_title_has_greater_weight_than_a_match_in_a_document_body.3F">Lucene does scoring</a>. Internally Oak would create separate Lucene fields for those jcr properties and would perform a search across all such fields. For more details refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-3367">OAK-3367</a></p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">  + indexRules
+<div>
+<div>
+<pre class="source">  + indexRules
     - jcr:primaryType = &quot;nt:unstructured&quot;
     + app:Asset
       + properties
@@ -1006,15 +995,18 @@
           - name = &quot;jcr:content/metadata/jcr:title&quot;
           - boost = 2.0
 </pre></div></div>
+
 <p>With above index config a search like</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">SELECT
+<div>
+<div>
+<pre class="source">SELECT
   *
 FROM [app:Asset] 
 WHERE 
   CONTAINS(., 'Batman')
 </pre></div></div>
+
 <p>Would have those nodes (of type app:Asset) come first where <i>Batman</i> is found in <i>jcr:title</i>, while nodes where the search text is found in other fields, like aggregated content, would come later.</p></div>
 <div class="section">
 <h4><a name="Effective_Index_Definition"></a><a name="stored-index-definition"></a>Effective Index Definition</h4>
@@ -1022,20 +1014,19 @@ WHERE
 <p>Prior to Oak 1.6 the index definition as defined in content was directly used for query execution and indexing. It was possible for the index definition to be modified in an incompatible way, which would start affecting query execution and lead to inconsistent results.</p>
 <p>Since Oak 1.6 the index definitions are cloned upon reindexing and stored in a hidden structure. For further incremental indexing and for query plan calculation the stored index definition is used. So any changes made to the index definition after a reindex would not apply until another reindex is done.</p>
 <p>There would be some cases where changes in the index definition do not require a reindex. For example, if a new property is being introduced in the content model and no prior content exists with such a property, then it is safe to index that property without doing a reindex. For such cases the user must follow the steps below:</p>
-
 <ol style="list-style-type: decimal">
-  
+
 <li>Make the required changes</li>
-  
 <li>Set <tt>refresh</tt> property to <tt>true</tt> in index definition node</li>
-  
 <li>Save the changes</li>
 </ol>
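 <p>As a sketch, after step 2 the index definition node would carry the flag as shown below. The index path is borrowed from the log message example on this page, and the other properties merely stand in for whatever the definition already contains:</p>
 
 <div>
 <div>
 <pre class="source">/oak:index/fooLuceneIndex
   - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
   - compatVersion = 2
   - type = &quot;lucene&quot;
   - refresh = true //set together with the definition changes, then save
 </pre></div></div>
 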
 <p>On the next async indexing cycle this flag would be picked up and the stored index definition would be refreshed. <i>After this the flag would be removed automatically</i>, and a log message like the one below would be logged:</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">LuceneIndexEditorContext - Refreshed the index definition for [/oak:index/fooLuceneIndex] 
+<div>
+<div>
+<pre class="source">LuceneIndexEditorContext - Refreshed the index definition for [/oak:index/fooLuceneIndex] 
 </pre></div></div>
+
 <p>To simplify troubleshooting the stored index definition can be accessed from <tt>LuceneIndexMBean</tt> via <tt>getStoredIndexDefinition</tt> operation. It would dump the string representation of stored NodeState</p>
 <p><img src="lucene-index-mbean-dump-index.png" alt="Dump Stored Index Definition" /></p>
 <p>This feature can be disabled by setting the OSGi property <tt>disableStoredIndexDefinition</tt> for <tt>LuceneIndexProviderService</tt> to true. Once disabled, any change in the index definition would start affecting the query plans.</p>
@@ -1050,11 +1041,10 @@ WHERE
 <p>Refer to <a href="indexing.html#nrt-indexing">Near realtime indexing</a> for more details</p></div>
 <div class="section">
 <h3><a name="LuceneIndexProvider_Configuration"></a><a name="osgi-config"></a>LuceneIndexProvider Configuration</h3>
-<p>Some of the runtime aspects of the Oak Lucene support can be configured via OSGi configuration. The configuration needs to be done for PID <tt>org.apache
-.jackrabbit.oak.plugins.index.lucene.LuceneIndexProviderService</tt></p>
+<p>Some of the runtime aspects of the Oak Lucene support can be configured via OSGi configuration. The configuration needs to be done for PID <tt>org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexProviderService</tt></p>
 <p><img src="lucene-osgi-config.png" alt="OSGi Configuration" /></p>
-
 <dl>
+
 <dt>enableCopyOnReadSupport</dt>
 <dd>Enable copying of Lucene index to local file system to improve query performance. See <a href="#copy-on-read">Copy Indexes On Read</a></dd>
 <dt>enableCopyOnWriteSupport</dt>
@@ -1071,83 +1061,118 @@ WHERE
 <h3><a name="Tika_Config"></a><a name="tika-config"></a>Tika Config</h3>
 <p><tt>@since Oak 1.0.12, 1.2.3</tt></p>
 <p>Oak Lucene uses <a class="externalLink" href="http://tika.apache.org/">Apache Tika</a> to extract the text from binary content</p>
+<ul>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">+ tika
-    - maxExtractLength (long) = -10
-    + config.xml  (nt:file)
-      + jcr:content
-        - jcr:data = //config xml binary content
-</pre></div></div>
-<p>Oak uses a <a class="externalLink" href="https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/resources/org/apache/jackrabbit/oak/plugins/index/lucene/tika-config.xml">default config</a>. To use a custom config specify the config file via <tt>tika/config.xml</tt> node in index config. </p>
+<li>tika
+<ul>
+
+<li>maxExtractLength (long) = -10
+<ul>
 
+<li>config.xml  (nt:file)</li>
+<li>jcr:content
+<ul>
+
+<li>jcr:data = //config xml binary content</li>
+</ul>
+</li>
+</ul>
+</li>
+</ul>
+</li>
+</ul>
+<p>Oak uses a <a class="externalLink" href="https://github.com/apache/jackrabbit-oak/blob/trunk/oak-lucene/src/main/resources/org/apache/jackrabbit/oak/plugins/index/lucene/tika-config.xml">default config</a>. To use a custom config specify the config file via <tt>tika/config.xml</tt> node in index config.</p>
 <dl>
+
 <dt><a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2470">maxExtractLength</a></dt>
-<dd>Limits the number of characters that are extracted by the Tika parse. A negative  value indicates a multiple of <tt>maxFieldLength</tt> and a positive value is used as is
-  
+<dd>Limits the number of characters that are extracted by the Tika parser. A negative value indicates a multiple of <tt>maxFieldLength</tt> and a positive value is used as is
 <ul>
-    
+
 <li>maxExtractLength = -10, maxFieldLength = 10000 -&gt; Actual value = 100000</li>
-    
 <li>maxExtractLength = 1000 -&gt; Actual value = 1000</li>
-  </ul></dd>
+</ul>
+</dd>
 </dl>
 <div class="section">
 <h4><a name="Mime_type_usage"></a><a name="mime-type-usage"></a>Mime type usage</h4>
-<p>A binary would only be index if there is an associated property <tt>jcr:mimeType</tt> defined and that is supported by Tika. By default indexer uses <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2895">TypeDetector</a> instead of default <tt>DefaultDetector</tt> which relies on the <tt>jcr:mimeType</tt> to pick up the right parser. </p></div>
+<p>A binary is only indexed if it has an associated <tt>jcr:mimeType</tt> property whose value is supported by Tika. By default the indexer uses <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2895">TypeDetector</a> instead of Tika's <tt>DefaultDetector</tt>; <tt>TypeDetector</tt> relies on the <tt>jcr:mimeType</tt> to pick the right parser.</p></div>
 <div class="section">
 <h4><a name="Mime_type_mapping"></a><a name="mime-type-mapping"></a>Mime type mapping</h4>
 <p><tt>@since Oak 1.7.7</tt></p>
 <p>In certain circumstances, it may be desired to pass a value other than the <tt>jcr:mimeType</tt> property into the Tika parser. For example, this would be necessary if a binary has an application-specific mime type, but is parsable by the standard Tika parser for some generic type. To support these cases, create a node structure under the <tt>tika/mimeTypes</tt> node following the mime type structure, e.g.</p>
+<ul>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">+ tika
-    + mimeTypes (nt:unstructured)
-      + application (nt:unstructured)
-        + vnd.mycompany-document (nt:unstructured)
-          - mappedType = application/pdf
-</pre></div></div>
+<li>tika
+<ul>
+
+<li>mimeTypes (nt:unstructured)
+<ul>
+
+<li>application (nt:unstructured)
+<ul>
+
+<li>vnd.mycompany-document (nt:unstructured)
+<ul>
+
+<li>mappedType = application/pdf</li>
+</ul>
+</li>
+</ul>
+</li>
+</ul>
+</li>
+</ul>
+</li>
+</ul>
 <p>When this index is indexing a binary of type <tt>application/vnd.mycompany-document</tt> it will force Tika to treat it as a binary of type <tt>application/pdf</tt>.</p></div></div>
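 <p>The lookup implied by the <tt>tika/mimeTypes</tt> structure can be sketched as follows. This is a hedged Python illustration, not the Oak implementation: the nested dictionary stands in for the node tree, and <tt>resolve_mime_type</tt> is a hypothetical name. It assumes that a type without a mapping is passed to Tika unchanged.</p>

```python
# Hypothetical in-memory stand-in for the tika/mimeTypes node tree
# (major type -> minor type -> properties of the mapping node).
MIME_TYPES = {
    "application": {
        "vnd.mycompany-document": {"mappedType": "application/pdf"},
    },
}


def resolve_mime_type(declared: str, mime_types: dict = MIME_TYPES) -> str:
    """Return the type to hand to the parser for a declared jcr:mimeType."""
    # Split "application/vnd.mycompany-document" into its two path segments.
    major, _, minor = declared.partition("/")
    node = mime_types.get(major, {}).get(minor, {})
    # Assumption: without a mappedType property the declared type is used.
    return node.get("mappedType", declared)


print(resolve_mime_type("application/vnd.mycompany-document"))  # application/pdf
print(resolve_mime_type("text/plain"))                          # text/plain
```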
 <div class="section">
 <h3><a name="Non_Root_Index_Definitions"></a><a name="non-root-index"></a>Non Root Index Definitions</h3>
 <p>A Lucene index definition can be placed at any location in the repository and need not always be defined at the root. For example, if your query involves path restrictions like</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">select * from [app:Asset] as a where ISDESCENDANTNODE(a, '/content/companya') and [format] = 'image'
+<div>
+<div>
+<pre class="source">select * from [app:Asset] as a where ISDESCENDANTNODE(a, '/content/companya') and [format] = 'image'
 </pre></div></div>
+
 <p>Then you can create the required index definition, say <tt>assetIndex</tt>, at <tt>/content/companya/oak:index/assetIndex</tt>. In that case the index would contain data for the subtree under <tt>/content/companya</tt>.</p></div>
 <div class="section">
 <h3><a name="Native_Query_and_Index_Selection"></a><a name="native-query"></a>Native Query and Index Selection</h3>
 <p>Oak query engine supports native queries like</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">//*[rep:native('lucene', 'name:(Hello OR World)')]
+<div>
+<div>
+<pre class="source">//*[rep:native('lucene', 'name:(Hello OR World)')]
 </pre></div></div>
-<p>If multiple Lucene based indexes are enabled on the system and you need to make use of specific Lucene index like <tt>/oak:index/assetIndex</tt> then you can specify the index name via <tt>functionName</tt> attribute on index definition. </p>
-<p>For example for assetIndex definition like </p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">- jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
-- type = &quot;lucene&quot;
-...
-- functionName = &quot;lucene-assetIndex&quot;
-</pre></div></div>
+<p>If multiple Lucene-based indexes are enabled on the system and you need to use a specific Lucene index such as <tt>/oak:index/assetIndex</tt>, you can specify the index name via the <tt>functionName</tt> attribute on the index definition.</p>
+<p>For example, for an assetIndex definition like</p>
+<ul>
+
+<li>jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;</li>
+<li>type = &quot;lucene&quot;</li>
+<li>&#x2026;</li>
+<li>functionName = &quot;lucene-assetIndex&quot;</li>
+</ul>
 <p>Executing the following query ensures that the Lucene index from <tt>assetIndex</tt> is used</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">//*[rep:native('lucene-assetIndex', 'name:(Hello OR World)')]
-</pre></div></div></div>
+<div>
+<div>
+<pre class="source">//*[rep:native('lucene-assetIndex', 'name:(Hello OR World)')]
+</pre></div></div>
+</div>
 <div class="section">
 <h3><a name="Persisting_indexes_to_FileSystem"></a><a name="native-query"></a>Persisting indexes to FileSystem</h3>
 <p>By default Lucene indexes are stored in the <tt>NodeStore</tt>. If required, they can be stored directly on the file system:</p>
+<ul>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">- jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
-- type = &quot;lucene&quot;
-...
-- persistence = &quot;file&quot;
-- path = &quot;/path/to/store/index&quot;
-</pre></div></div>
+<li>jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;</li>
+<li>type = &quot;lucene&quot;</li>
+<li>&#x2026;</li>
+<li>persistence = &quot;file&quot;</li>
+<li>path = &quot;/path/to/store/index&quot;</li>
+</ul>
 <p>To store the Lucene index in the file system, in the Lucene index definition node, set the property <tt>persistence</tt> to <tt>file</tt>, and set the property <tt>path</tt> to the directory where the index should be stored. Then start reindexing by setting <tt>reindex</tt> to <tt>true</tt>.</p>
 <p>Note that this setup only works for a non-clustered <tt>NodeStore</tt>. If the backend <tt>NodeStore</tt> supports clustering, the index data would not be accessible on other cluster nodes.</p></div>
 <div class="section">
@@ -1162,7 +1187,7 @@ WHERE
 <p><tt>@since Oak 1.0.15, 1.2.3</tt></p>
 <p>Similar to the <i>CopyOnRead</i> feature, Oak Lucene also supports <i>CopyOnWrite</i>, which enables faster indexing by first buffering writes to the local filesystem and transferring them to remote storage asynchronously as the indexing proceeds. This should provide better performance and hence faster indexing times.</p>
 <p><b>indexPath</b></p>
-<p><i>Not required from Oak 1.6 , 1.4.7+</i> </p>
+<p><i>Not required from Oak 1.6, 1.4.7+</i></p>
 <p>To speed up indexing with CopyOnWrite you also need to set <tt>indexPath</tt> in the index definition to the path of the index in the repository. For example, if your index is defined at <tt>/oak:index/lucene</tt> then <tt>indexPath</tt> should be set to <tt>/oak:index/lucene</tt>. This enables the indexer to perform any reads during the indexing process locally and thus avoid costly reads from remote storage.</p>
 <p>For more details refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2247">OAK-2247</a>. This feature can be enabled via <a href="#osgi-config">Lucene Index provider service configuration</a></p></div>
 <div class="section">
@@ -1177,22 +1202,25 @@ WHERE
 <p>The feature only deletes blobs which have been deleted before a certain time. The task that actually purges blobs from the datastore is performed by a JMX operation. The JMX bean for the operation is <tt>org.apache.jackrabbit.oak:name=Active lucene files collection,type=ActiveDeletedBlobCollector</tt> and the operation is <tt>startActiveCollection()</tt>.</p></div>
 <div class="section">
 <h3><a name="Analyzing_created_Lucene_Index"></a><a name="luke"></a>Analyzing created Lucene Index</h3>
-<p><a class="externalLink" href="https://code.google.com/p/luke/">Luke</a> is a handy development and diagnostic tool, which accesses already existing Lucene indexes and allows you to display index details. In Oak Lucene index files are stored in <tt>NodeStore</tt> and hence not directly accessible. To enable analyzing the index files via Luke follow below mentioned steps</p>
-
+<p><a class="externalLink" href="https://code.google.com/p/luke/">Luke</a> is a handy development and diagnostic tool which accesses existing Lucene indexes and allows you to display index details. In Oak, Lucene index files are stored in the <tt>NodeStore</tt> and hence are not directly accessible. To analyze the index files via Luke, follow the steps below</p>
 <ol style="list-style-type: decimal">
-  
+
+<li>
+
+<p>Download the Luke version which includes the matching Lucene jars used by Oak. As of the Oak 1.0.8 release the Lucene version used is 4.7.1, so download the jar from the <a class="externalLink" href="https://github.com/DmitryKey/luke/releases">Luke releases page</a></p>
+
+<div>
+<div>
+<pre class="source">$ wget https://github.com/DmitryKey/luke/releases/download/4.7.0/luke-with-deps.jar
+</pre></div></div>
+</li>
 <li>
-<p>Download the Luke version which includes the matching Lucene jars used by  Oak. As of Oak 1.0.8 release the Lucene version used is 4.7.1. So download the jar from <a class="externalLink" href="https://github.com/DmitryKey/luke/releases">here</a></p>
-  
-<div class="source">
-<div class="source"><pre class="prettyprint">$wget https://github.com/DmitryKey/luke/releases/download/4.7.0/luke-with-deps.jar
-</pre></div></div></li>
-  
-<li>
-<p>Use the <a class="externalLink" href="https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#console">Oak Console</a> to dump the Lucene index from <tt>NodeStore</tt>  to filesystem directory. Use the <tt>lc dump</tt> command</p>
-  
-<div class="source">
-<div class="source"><pre class="prettyprint">$ java -jar oak-run-*.jar console /path/to/oak/repository
+
+<p>Use the <a class="externalLink" href="https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#console">Oak Console</a> to dump the Lucene index from the <tt>NodeStore</tt> to a filesystem directory, using the <tt>lc dump</tt> command</p>
+
+<div>
+<div>
+<pre class="source">$ java -jar oak-run-*.jar console /path/to/oak/repository
 Apache Jackrabbit Oak 1.1-SNAPSHOT
 Jackrabbit Oak Shell (Apache Jackrabbit Oak 1.1-SNAPSHOT, JVM: 1.7.0_55)
 Type ':help' or ':h' for help.
@@ -1210,14 +1238,17 @@ Copied 74.1 MB in 1.209 s
 Copying Lucene indexes to [/path/to/dump/index/lucene-index/slingAlias]
 Copied 8.5 MB in 218.7 ms
 /&gt;
-</pre></div></div></li>
-  
+</pre></div></div>
+</li>
 <li>
-<p>Post dump open the index via Luke. Oak Lucene uses a <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-1737">custom  Codec</a>. So oak-lucene jar needs to be included in Luke classpath  for it to display the index details</p>
-  
-<div class="source">
-<div class="source"><pre class="prettyprint">$ java -XX:MaxPermSize=512m -cp luke-with-deps.jar:oak-lucene-1.0.8.jar org.getopt.luke.Luke
-</pre></div></div></li>
+
+<p>After the dump, open the index in Luke. Oak Lucene uses a <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-1737">custom Codec</a>, so the oak-lucene jar needs to be included in the Luke classpath for it to display the index details</p>
+
+<div>
+<div>
+<pre class="source">$ java -XX:MaxPermSize=512m -cp luke-with-deps.jar:oak-lucene-1.0.8.jar org.getopt.luke.Luke
+</pre></div></div>
+</li>
 </ol>
 <p>From the Luke UI shown you can access various details.</p></div>
 <div class="section">
@@ -1232,8 +1263,9 @@ Copied 8.5 MB in 218.7 ms
 <p>Once the above configuration has been done, the Lucene suggester is by default updated every 10 minutes. This can be changed by setting the property <tt>suggestUpdateFrequencyMinutes</tt> in the <tt>suggestion</tt> node under the index definition node to a different value. <i>Note that up to Oak 1.3.14/1.2.14, <tt>suggestUpdateFrequencyMinutes</tt> had to be set on the index definition node itself. That is still supported for backward compatibility, but having a separate <tt>suggestion</tt> node is preferred.</i></p>
 <p>Sample configuration for suggestions based on terms contained in <tt>jcr:description</tt> property.</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">/oak:index/lucene-suggest
+<div>
+<div>
+<pre class="source">/oak:index/lucene-suggest
   - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
   - compatVersion = 2
   - type = &quot;lucene&quot;
@@ -1250,10 +1282,12 @@ Copied 8.5 MB in 218.7 ms
           - analyzed = true
           - useInSuggest = true
 </pre></div></div>
-<p><tt>@since Oak 1.3.12, 1.2.14</tt> the index Analyzer can be used to perform a have more fine grained suggestions, e.g. single words (whereas default suggest configuration returns entire property values, see [OAK-3407]: <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-3407)">https://issues.apache.org/jira/browse/OAK-3407)</a>. Analyzed suggestions can be enabled by setting &#x201c;suggestAnalyzed&#x201d; property to true, e.g.:</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">/oak:index/lucene-suggest
+<p><tt>@since Oak 1.3.12, 1.2.14</tt> the index Analyzer can be used to provide more fine-grained suggestions, e.g. single words (whereas the default suggest configuration returns entire property values, see <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-3407">OAK-3407</a>). Analyzed suggestions can be enabled by setting the <tt>suggestAnalyzed</tt> property to true, e.g.:</p>
+
+<div>
+<div>
+<pre class="source">/oak:index/lucene-suggest
   - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
   - compatVersion = 2
   - type = &quot;lucene&quot;
@@ -1262,13 +1296,24 @@ Copied 8.5 MB in 218.7 ms
     - suggestUpdateFrequencyMinutes = 20
     - suggestAnalyzed = true
 </pre></div></div>
+
 <p><i>Note that up to Oak 1.3.14/1.2.14, <tt>suggestAnalyzed</tt> had to be set on the index definition node itself. That is still supported for backward compatibility, but having a separate <tt>suggestion</tt> node is preferred.</i></p>
 <p>Setting up <tt>useInSuggest=true</tt> for a property definition having <tt>name=:nodeName</tt> would add node names to suggestion dictionary (See <a href="#property-names">property name</a> for node name indexing)</p>
-<p>Since, Oak 1.3.16/1.2.14, very little support exists for queries with <tt>ISDESCENDANTNODE</tt> constraint to subset suggestions on a sub-tree. It requires <tt>evaluatePathRestrictions=true</tt> on index definition. e.g. <tt>
-SELECT rep:suggest() FROM [nt:base] WHERE SUGGEST('test') AND ISDESCENDANTNODE('/a/b')
-</tt> or <tt>
-/jcr:root/a/b//[rep:suggest('in 201')]/(rep:suggest())
-</tt> Note, the subset is done by filtering top 10 suggestions. So, it&#x2019;s possible to get no suggestions for a subtree query, if top 10 suggestions are not part of that subtree. For details look at <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-3994">OAK-3994</a> and related issues.</p></div>
+<p>Since Oak 1.3.16/1.2.14, limited support exists for queries with an <tt>ISDESCENDANTNODE</tt> constraint to restrict suggestions to a sub-tree. It requires <tt>evaluatePathRestrictions=true</tt> on the index definition, e.g.</p>
+
+<div>
+<div>
+<pre class="source">SELECT rep:suggest() FROM [nt:base] WHERE SUGGEST('test') AND ISDESCENDANTNODE('/a/b')
+</pre></div></div>
+
+<p>or</p>
+
+<div>
+<div>
+<pre class="source">/jcr:root/a/b//[rep:suggest('in 201')]/(rep:suggest())
+</pre></div></div>
+
+<p>Note that the subset is computed by filtering the top 10 suggestions. So it&#x2019;s possible to get no suggestions for a subtree query if none of the top 10 suggestions belongs to that subtree. For details look at <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-3994">OAK-3994</a> and related issues.</p></div>
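 <p>Why filtering the top 10 can yield an empty result is easiest to see in a small sketch. The Python below is an illustration under stated assumptions, not Oak's actual logic: it assumes each suggestion is tied to a single node path, and the names <tt>suggest_in_subtree</tt> and <tt>ranked</tt> are hypothetical.</p>

```python
def suggest_in_subtree(ranked, subtree, top_n=10):
    """Filter the top_n ranked suggestions down to those under subtree.

    ranked: list of (term, path) tuples ordered best-first (assumed shape).
    """
    prefix = subtree.rstrip("/") + "/"
    # Only the global top_n entries are considered before path filtering.
    top = ranked[:top_n]
    return [term for term, path in top if path.startswith(prefix)]


# Ten higher-ranked suggestions live outside /a/b; the only match is ranked 11th.
ranked = [(f"term{i}", f"/x/node{i}") for i in range(10)]
ranked.append(("test", "/a/b/doc"))

print(suggest_in_subtree(ranked, "/a/b"))  # []
```

 <p>In this sketch the match under <tt>/a/b</tt> exists but is never reached, because the path filter is applied after the cut to 10 entries.</p>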
 <div class="section">
 <h4><a name="Spellchecking"></a><a name="spellchecking"></a>Spellchecking</h4>
 <p><tt>@since Oak 1.1.17, 1.0.13</tt></p>
@@ -1276,8 +1321,9 @@ SELECT rep:suggest() FROM [nt:base] WHER
 <p>Sample configuration for spellchecking based on terms contained in <tt>jcr:title</tt> property.</p>
 <p>Since Oak 1.3.11/1.2.14, each suggestion is returned in its own row.</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">/oak:index/lucene-spellcheck
+<div>
+<div>
+<pre class="source">/oak:index/lucene-spellcheck
   - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
   - compatVersion = 2
   - type = &quot;lucene&quot;
@@ -1292,18 +1338,30 @@ SELECT rep:suggest() FROM [nt:base] WHER
           - analyzed = true
           - useInSpellcheck = true
 </pre></div></div>
-<p>Since, Oak 1.3.16/1.2.14, very little support exists for queries with <tt>ISDESCENDANTNODE</tt> constraint to subset suggestions on a sub-tree. It requires <tt>evaluatePathRestrictions=true</tt> on index definition. e.g. <tt>
-SELECT rep:suggest() FROM [nt:base] WHERE SUGGEST('test') AND ISDESCENDANTNODE('/a/b')
-</tt> or <tt>
-/jcr:root/a/b//[rep:suggest('in 201')]/(rep:suggest())
-</tt> Note, the subset is done by filtering top 10 spellchecks. So, it&#x2019;s possible to get no results for a subtree query, if top 10 spellchecks are not part of that subtree. For details look at <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-3994">OAK-3994</a> and related issues.</p></div>
+
+<p>Since Oak 1.3.16/1.2.14, limited support exists for queries with an <tt>ISDESCENDANTNODE</tt> constraint to restrict suggestions to a sub-tree. It requires <tt>evaluatePathRestrictions=true</tt> on the index definition, e.g.</p>
+
+<div>
+<div>
+<pre class="source">SELECT rep:suggest() FROM [nt:base] WHERE SUGGEST('test') AND ISDESCENDANTNODE('/a/b')
+</pre></div></div>
+
+<p>or</p>
+
+<div>
+<div>
+<pre class="source">/jcr:root/a/b//[rep:suggest('in 201')]/(rep:suggest())
+</pre></div></div>
+
+<p>Note that the subset is computed by filtering the top 10 spellchecks. So it&#x2019;s possible to get no results for a subtree query if none of the top 10 spellchecks belongs to that subtree. For details look at <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-3994">OAK-3994</a> and related issues.</p></div>
 <div class="section">
 <h4><a name="Facets"></a><a name="facets"></a>Facets</h4>
 <p><tt>@since Oak 1.3.14</tt></p>
-<p>Lucene property indexes can also be used for retrieving facets, in order to do so the property <i>facets</i> must be set to  <i>true</i> on the property definition.</p>
+<p>Lucene property indexes can also be used for retrieving facets. To do so, the property <i>facets</i> must be set to <i>true</i> on the property definition.</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">/oak:index/lucene-with-facets
+<div>
+<div>
+<pre class="source">/oak:index/lucene-with-facets
   - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
   - compatVersion = 2
   - type = &quot;lucene&quot;
@@ -1317,10 +1375,12 @@ SELECT rep:suggest() FROM [nt:base] WHER
           - facets = true
           - propertyIndex = true
 </pre></div></div>
-<p>Specific facet related features for Lucene property index can be configured in a separate <i>facets</i> node below the  index definition.  By default ACL checks are always performed on facets by the Lucene property index however this can be avoided by setting  the property <i>secure</i> to <i>false</i> in the <i>facets</i> configuration node. <tt>@since Oak 1.5.15</tt> The no. of facets to be retrieved is configurable via the <i>topChildren</i> property, which defaults to 10.</p>
 
-<div class="source">
-<div class="source"><pre class="prettyprint">/oak:index/lucene-with-unsecure-facets
+<p>Specific facet-related features for the Lucene property index can be configured in a separate <i>facets</i> node below the index definition. By default, ACL checks are always performed on facets by the Lucene property index; however, this can be avoided by setting the property <i>secure</i> to <i>false</i> in the <i>facets</i> configuration node. <tt>@since Oak 1.5.15</tt> The number of facets to be retrieved is configurable via the <i>topChildren</i> property, which defaults to 10.</p>
+
+<div>
+<div>
+<pre class="source">/oak:index/lucene-with-unsecure-facets
   - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
   - compatVersion = 2
   - type = &quot;lucene&quot;
@@ -1336,7 +1396,8 @@ SELECT rep:suggest() FROM [nt:base] WHER
         + jcr:title
           - facets = true
           - propertyIndex = true
-</pre></div></div></div>
+</pre></div></div>
+</div>
 <div class="section">
 <h4><a name="Score_Explanation"></a><a name="score-explanation"></a>Score Explanation</h4>
 <p><tt>@since Oak 1.3.12</tt></p>
@@ -1349,50 +1410,60 @@ SELECT rep:suggest() FROM [nt:base] WHER
 <div class="section">
 <h3><a name="Design_Considerations"></a><a name="design-considerations"></a>Design Considerations</h3>
 <p>Lucene index provides quite a few features to meet various query requirements. While defining the index definition do consider the following aspects</p>
-
 <ol style="list-style-type: decimal">
-  
+
 <li>
-<p>If query uses different path restrictions keeping other restrictions same then make use of <tt>evaluatePathRestrictions</tt></p></li>
-  
+
+<p>If query uses different path restrictions keeping other restrictions same then make use of <tt>evaluatePathRestrictions</tt></p>
+</li>
 <li>
-<p>If query performs sorting then have an explicit property definition for the property on which sorting is being performed and set <tt>ordered</tt> to true for that property</p></li>
-  
+
+<p>If query performs sorting then have an explicit property definition for the property on which sorting is being performed and set <tt>ordered</tt> to true for that property</p>
+</li>
 <li>
-<p>If the query is based on specific nodeType then define <tt>indexRules</tt> for that nodeType</p></li>
-  
+
+<p>If the query is based on specific nodeType then define <tt>indexRules</tt> for that nodeType</p>
+</li>
 <li>
-<p>Aim for a precise index configuration which indexes just the right amount of content based on your query requirement. An index which is precise would be smaller and would perform better.</p></li>
-  
+
+<p>Aim for a precise index configuration which indexes just the right amount of content based on your query requirement. An index which is precise would be smaller and would perform better.</p>
+</li>
 <li>
-<p><b>Make use of nodetype to achieve a <i>cohesive</i> index</b>. This would allow multiple queries to make use of same index and also evaluation of multiple property restrictions natively in Lucene</p></li>
-  
+
+<p><b>Make use of nodetype to achieve a <i>cohesive</i> index</b>. This would allow multiple queries to make use of same index and also evaluation of multiple property restrictions natively in Lucene</p>
+</li>
 <li>
-<p><b><a href="#non-root-index">Non root indexes</a></b> - If your query always perform search under certain paths then create index definition under those paths only. This might be helpful in multi tenant deployment where each tenant data is stored under specific repository path and all queries are made under those path. </p>
-<p>In fact its recommended to use single index if all the properties being indexed are related. This would enable Lucene index to evaluate as much property restriction as possible natively (which is faster) and also save on storage cost incurred in storing the node path.</p></li>
-  
+
+<p><b><a href="#non-root-index">Non root indexes</a></b> - If your queries always search under certain paths, create the index definition under those paths only. This might be helpful in a multi-tenant deployment where each tenant's data is stored under a specific repository path and all queries are made under those paths.</p>
+<p>In fact it's recommended to use a single index if all the properties being indexed are related. This enables the Lucene index to evaluate as many property restrictions as possible natively (which is faster) and also saves on the storage cost incurred in storing the node path.</p>
+</li>
 <li>
-<p>Use features when required - There are certain features provided by Lucene index which incur extra cost in terms of storage space when enabled. For example enabling <tt>evaluatePathRestrictions</tt>, <tt>ordering</tt> etc. Enable such option only when you make use of those features and further enable them for only those properties. So <tt>ordering</tt> should be enabled only when sorting is being performed for those properties and <tt>evaluatePathRestrictions</tt> should only be enabled if you are going to specify path restrictions.</p></li>
-  
+
+<p>Use features only when required - Certain features provided by the Lucene index incur extra storage cost when enabled, for example <tt>evaluatePathRestrictions</tt>, <tt>ordering</tt> etc. Enable such options only when you make use of those features, and only for the properties that need them. So <tt>ordering</tt> should be enabled only for properties on which sorting is performed, and <tt>evaluatePathRestrictions</tt> should only be enabled if you are going to specify path restrictions.</p>
+</li>
 <li>
-<p><b>Avoid overlapping index definition</b> - Do not have overlapping index definition indexing same nodetype but having different <tt>includedPaths</tt> and <tt>excludedPaths</tt>. Index selection logic does not make use of the <tt>includedPaths</tt> and <tt>excludedPaths</tt> for index selection. Index selection is done only on cost basis and <tt>queryPaths</tt>. Having multiple definition for same type would cause ambiguity in index selection and may lead to unexpected results. Instead have a single index definition for same type.</p></li>
+

[... 341 lines stripped ...]

