jackrabbit-commits mailing list archives

From chet...@apache.org
Subject svn commit: r1645577 - in /jackrabbit/site/live/oak/docs/query: lucene-old.html lucene.html
Date Mon, 15 Dec 2014 06:30:05 GMT
Author: chetanm
Date: Mon Dec 15 06:30:05 2014
New Revision: 1645577

URL: http://svn.apache.org/r1645577
Log:
OAK-301 : oak docu

Publishing changes done related to new Lucene features

Added:
    jackrabbit/site/live/oak/docs/query/lucene-old.html   (with props)
Modified:
    jackrabbit/site/live/oak/docs/query/lucene.html

Added: jackrabbit/site/live/oak/docs/query/lucene-old.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/query/lucene-old.html?rev=1645577&view=auto
==============================================================================
--- jackrabbit/site/live/oak/docs/query/lucene-old.html (added)
+++ jackrabbit/site/live/oak/docs/query/lucene-old.html Mon Dec 15 06:30:05 2014
@@ -0,0 +1,794 @@
+<!DOCTYPE html>
+<!--
+ | Generated by Apache Maven Doxia at 2014-12-15
+ | Rendered using Apache Maven Fluido Skin 1.3.0
+-->
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+  <head>
+    <meta charset="UTF-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <meta name="Date-Revision-yyyymmdd" content="20141215" />
+    <meta http-equiv="Content-Language" content="en" />
+    <title>Jackrabbit Oak - Lucene Index</title>
+    <link rel="stylesheet" href="../css/apache-maven-fluido-1.3.0.min.css" />
+    <link rel="stylesheet" href="../css/site.css" />
+    <link rel="stylesheet" href="../css/print.css" media="print" />
+
+      
+    <script type="text/javascript" src="../js/apache-maven-fluido-1.3.0.min.js"></script>
+
+    
+            </head>
+        <body class="topBarEnabled">
+          
+    
+    
+            
+    
+    
+    <a href="http://github.com/apache/jackrabbit-oak">
+      <img style="position: absolute; top: 0; right: 0; border: 0; z-index: 10000;"
+        src="https://s3.amazonaws.com/github/ribbons/forkme_right_red_aa0000.png"
+        alt="Fork me on GitHub">
+    </a>
+  
+                
+                    
+                
+
+    <div id="topbar" class="navbar navbar-fixed-top ">
+      <div class="navbar-inner">
+                <div class="container-fluid">
+        <a data-target=".nav-collapse" data-toggle="collapse" class="btn btn-navbar">
+          <span class="icon-bar"></span>
+          <span class="icon-bar"></span>
+          <span class="icon-bar"></span>
+        </a>
+                
+                                                                                <a class="brand" href="../"  title="Oak logo">
+
+                                
+                                                                                                                    <img src="../oak_logo.png" alt="Oak logo" />
+                
+                </a>
+                    
+                                <ul class="nav">
+                          <li class="dropdown">
+        <a href="#" class="dropdown-toggle" data-toggle="dropdown">Overview <b class="caret"></b></a>
+        <ul class="dropdown-menu">
+        
+                      <li>      <a href="../index.html"  title="Jackrabbit Oak">Jackrabbit Oak</a>
+</li>
+                  
+                      <li>      <a href="../license.html"  title="License">License</a>
+</li>
+                  
+                      <li>      <a href="../downloads.html"  title="Downloads">Downloads</a>
+</li>
+                          </ul>
+      </li>
+                <li class="dropdown">
+        <a href="#" class="dropdown-toggle" data-toggle="dropdown">Concepts and Architecture <b class="caret"></b></a>
+        <ul class="dropdown-menu">
+        
+                      <li>      <a href="../architecture/overview.html"  title="Overview">Overview</a>
+</li>
+                  
+                      <li>      <a href="../architecture/nodestate.html"  title="The Node State Model">The Node State Model</a>
+</li>
+                          </ul>
+      </li>
+                <li class="dropdown">
+        <a href="#" class="dropdown-toggle" data-toggle="dropdown">Main APIs <b class="caret"></b></a>
+        <ul class="dropdown-menu">
+        
+                      <li>      <a href="http://www.day.com/specs/jcr/2.0/index.html"  title="JCR API">JCR API</a>
+</li>
+                  
+                      <li>      <a href="../oak_api/overview.html"  title="Oak API">Oak API</a>
+</li>
+                  
+                      <li>      <a href="../nodestore/overview.html"  title="NodeStore and MicroKernel API">NodeStore and MicroKernel API</a>
+</li>
+                          </ul>
+      </li>
+                <li class="dropdown">
+        <a href="#" class="dropdown-toggle" data-toggle="dropdown">Features and Plugins <b class="caret"></b></a>
+        <ul class="dropdown-menu">
+        
+                      <li>      <a href="../query/query.html"  title="Query">Query</a>
+</li>
+                  
+                      <li>      <a href="../security/overview.html"  title="Security">Security</a>
+</li>
+                  
+                      <li>      <a href="../plugins/blobstore.html"  title="BlobStore">BlobStore</a>
+</li>
+                  
+                      <li>      <a href="../clustering.html"  title="Clustering">Clustering</a>
+</li>
+                          </ul>
+      </li>
+                <li class="dropdown">
+        <a href="#" class="dropdown-toggle" data-toggle="dropdown">Using Oak <b class="caret"></b></a>
+        <ul class="dropdown-menu">
+        
+                      <li>      <a href="../use_getting_started.html"  title="Getting Started">Getting Started</a>
+</li>
+                  
+                      <li>      <a href="../construct.html"  title="Repository Construction">Repository Construction</a>
+</li>
+                  
+                      <li>      <a href="../osgi_config.html"  title="Configuring Oak">Configuring Oak</a>
+</li>
+                  
+                      <li>      <a href="../command_line.html"  title="Command Line Tools">Command Line Tools</a>
+</li>
+                  
+                      <li>      <a href="../differences.html"  title="Differences to Jackrabbit 2">Differences to Jackrabbit 2</a>
+</li>
+                  
+                      <li>      <a href="../known_issues.html"  title="Known Issues">Known Issues</a>
+</li>
+                  
+                      <li>      <a href="../dos_and_donts.html"  title="Dos and Don'ts">Dos and Don'ts</a>
+</li>
+                  
+                      <li>      <a href="../coldstandby/coldstandby.html"  title="Cold Standby">Cold Standby</a>
+</li>
+                  
+                      <li>      <a href="../FAQ.html"  title="FAQ">FAQ</a>
+</li>
+                          </ul>
+      </li>
+                <li class="dropdown">
+        <a href="#" class="dropdown-toggle" data-toggle="dropdown">Developing Oak <b class="caret"></b></a>
+        <ul class="dropdown-menu">
+        
+                      <li>      <a href="../dev_getting_started.html"  title="Getting Started">Getting Started</a>
+</li>
+                  
+                      <li>      <a href="../participating.html"  title="Participating">Participating</a>
+</li>
+                  
+                      <li>      <a href="../apidocs/index.html"  title="API Docs">API Docs</a>
+</li>
+                          </ul>
+      </li>
+                <li class="dropdown">
+        <a href="#" class="dropdown-toggle" data-toggle="dropdown">Links <b class="caret"></b></a>
+        <ul class="dropdown-menu">
+        
+                      <li>      <a href="http://jackrabbit.apache.org/oak"  title="Apache Jackrabbit Oak">Apache Jackrabbit Oak</a>
+</li>
+                  
+                      <li>      <a href="http://jackrabbit.apache.org/"  title="Apache Jackrabbit">Apache Jackrabbit</a>
+</li>
+                          </ul>
+      </li>
+                  </ul>
+          
+          
+          
+                   
+                      </div>
+          
+        </div>
+      </div>
+    </div>
+    
+        <div class="container-fluid">
+          <div id="banner">
+        <div class="pull-left">
+                                <div id="bannerLeft">
+                <h2>Oak Documentation</h2>
+                </div>
+                      </div>
+        <div class="pull-right">  </div>
+        <div class="clear"><hr/></div>
+      </div>
+
+      <div id="breadcrumbs">
+        <ul class="breadcrumb">
+                
+                    
+                  <li id="publishDate">Last Published: 2014-12-15</li>
+                  <li class="divider">|</li> <li id="projectVersion">Version: 1.1-SNAPSHOT</li>
+                      
+                
+                    
+      
+                            </ul>
+      </div>
+
+            
+      <div class="row-fluid">
+        <div id="leftColumn" class="span3">
+          <div class="well sidebar-nav">
+                
+                    
+                <ul class="nav nav-list">
+                    <li class="nav-header">Overview</li>
+                                
+      <li>
+    
+                          <a href="../index.html" title="Jackrabbit Oak">
+          <i class="none"></i>
+        Jackrabbit Oak</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="../license.html" title="License">
+          <i class="none"></i>
+        License</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="../downloads.html" title="Downloads">
+          <i class="none"></i>
+        Downloads</a>
+            </li>
+                              <li class="nav-header">Concepts and Architecture</li>
+                                
+      <li>
+    
+                          <a href="../architecture/overview.html" title="Overview">
+          <i class="none"></i>
+        Overview</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="../architecture/nodestate.html" title="The Node State Model">
+          <i class="none"></i>
+        The Node State Model</a>
+            </li>
+                              <li class="nav-header">Main APIs</li>
+                                
+      <li>
+    
+                          <a href="http://www.day.com/specs/jcr/2.0/index.html" class="externalLink" title="JCR API">
+          <i class="none"></i>
+        JCR API</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="../oak_api/overview.html" title="Oak API">
+          <i class="none"></i>
+        Oak API</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="../nodestore/overview.html" title="NodeStore and MicroKernel API">
+          <i class="none"></i>
+        NodeStore and MicroKernel API</a>
+            </li>
+                              <li class="nav-header">Features and Plugins</li>
+                                
+      <li>
+    
+                          <a href="../query/query.html" title="Query">
+          <i class="none"></i>
+        Query</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="../security/overview.html" title="Security">
+          <i class="none"></i>
+        Security</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="../plugins/blobstore.html" title="BlobStore">
+          <i class="none"></i>
+        BlobStore</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="../clustering.html" title="Clustering">
+          <i class="none"></i>
+        Clustering</a>
+            </li>
+                              <li class="nav-header">Using Oak</li>
+                                
+      <li>
+    
+                          <a href="../use_getting_started.html" title="Getting Started">
+          <i class="none"></i>
+        Getting Started</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="../construct.html" title="Repository Construction">
+          <i class="none"></i>
+        Repository Construction</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="../osgi_config.html" title="Configuring Oak">
+          <i class="none"></i>
+        Configuring Oak</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="../command_line.html" title="Command Line Tools">
+          <i class="none"></i>
+        Command Line Tools</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="../differences.html" title="Differences to Jackrabbit 2">
+          <i class="none"></i>
+        Differences to Jackrabbit 2</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="../known_issues.html" title="Known Issues">
+          <i class="none"></i>
+        Known Issues</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="../dos_and_donts.html" title="Dos and Don'ts">
+          <i class="none"></i>
+        Dos and Don'ts</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="../coldstandby/coldstandby.html" title="Cold Standby">
+          <i class="none"></i>
+        Cold Standby</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="../FAQ.html" title="FAQ">
+          <i class="none"></i>
+        FAQ</a>
+            </li>
+                              <li class="nav-header">Developing Oak</li>
+                                
+      <li>
+    
+                          <a href="../dev_getting_started.html" title="Getting Started">
+          <i class="none"></i>
+        Getting Started</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="../participating.html" title="Participating">
+          <i class="none"></i>
+        Participating</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="../apidocs/index.html" title="API Docs">
+          <i class="none"></i>
+        API Docs</a>
+            </li>
+                              <li class="nav-header">Links</li>
+                                
+      <li>
+    
+                          <a href="http://jackrabbit.apache.org/oak" class="externalLink" title="Apache Jackrabbit Oak">
+          <i class="none"></i>
+        Apache Jackrabbit Oak</a>
+            </li>
+                  
+      <li>
+    
+                          <a href="http://jackrabbit.apache.org/" class="externalLink" title="Apache Jackrabbit">
+          <i class="none"></i>
+        Apache Jackrabbit</a>
+            </li>
+            </ul>
+                
+                    
+                
+          <hr class="divider" />
+
+           <div id="poweredBy">
+                   
+    <script type="text/javascript" src="https://apis.google.com/js/plusone.js"></script>
+
+    
+    <div class="g-plusone" data-href="http://jackrabbit.apache.org/oak/docs/" data-size="tall" ></div>
+
+                   <div class="clear"></div>
+                            <div class="clear"></div>
+                            <div class="clear"></div>
+                             <a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
+        <img class="builtBy" alt="Built by Maven" src="../images/logos/maven-feather.png" />
+      </a>
+                  </div>
+          </div>
+        </div>
+        
+                
+        <div id="bodyColumn"  class="span9" >
+                                  
+            <!-- Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License. --><div class="section">
+<h2>Lucene Index<a name="Lucene_Index"></a></h2>
+<p><b>The following details apply to Oak 1.0.8 and earlier. For current documentation, refer to the <a href="lucene.html">Current Lucene documentation</a>.</b></p>
+<p>Oak supports Lucene-based indexes for both property constraints and full-text constraints.</p>
+<div class="section">
+<h3>The Lucene Full-Text Index<a name="The_Lucene_Full-Text_Index"></a></h3>
+<p>The full-text index handles the &#x2018;contains&#x2019; type of queries:</p>
+
+<div class="source">
+<pre>//*[jcr:contains(., 'text')]
+</pre></div>
+<p>If a full-text index is configured, then all queries that have a full-text condition use the full-text index, regardless of whether other conditions are indexed and regardless of any path restriction.</p>
+<p>If no full-text index is configured, then queries with full-text conditions may not work as expected. (The query engine has a basic verification in place for full-text conditions, but it does not support all the features that Lucene does, and it traverses all nodes if there are no indexed constraints.)</p>
+<p>The full-text index is updated asynchronously by a background thread; see <tt>Oak#withAsyncIndexing</tt>. This means that full-text searches may not reflect recent changes for a short window of time: the background thread runs every 5 seconds, plus the time it takes to run the diff and the text-extraction process.</p>
+<p>The async update status is reflected on the <tt>oak:index</tt> node by a few properties; see <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-980">OAK-980</a>.</p>
+<p>TODO Node aggregation <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-828">OAK-828</a></p>
+<p>The index definition node for a lucene-based full-text index:</p>
+
+<ul>
+  
+<li>must be of type <tt>oak:QueryIndexDefinition</tt></li>
+  
+<li>must have the <tt>type</tt> property set to <b><tt>lucene</tt></b></li>
+  
+<li>must contain the <tt>async</tt> property set to the value <tt>async</tt>; this is what sends the index update process to a background thread</li>
+</ul>
+<p><i>Optionally</i> you can add</p>
+
+<ul>
+  
+<li>the subset of property types to include in the index, via the <tt>includePropertyTypes</tt> property</li>
+  
+<li>a blacklist of property names to exclude from the index, via the <tt>excludePropertyNames</tt> property</li>
+  
+<li>the <tt>reindex</tt> flag, which triggers a full content reindex when set to <tt>true</tt>.</li>
+</ul>
+<p>Example:</p>
+
+<div class="source">
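+<pre>import javax.jcr.PropertyType;
+
+import com.google.common.collect.ImmutableSet;
+import org.apache.jackrabbit.oak.api.Type;
+import org.apache.jackrabbit.oak.plugins.memory.PropertyStates;
+import org.apache.jackrabbit.oak.spi.state.NodeBuilder;
+
+// 'root' is the NodeBuilder of the repository root, e.g. the one passed to a RepositoryInitializer
+{
+  NodeBuilder index = root.child(&quot;oak:index&quot;);
+  index.child(&quot;lucene&quot;)
+    .setProperty(&quot;jcr:primaryType&quot;, &quot;oak:QueryIndexDefinition&quot;, Type.NAME)
+    .setProperty(&quot;type&quot;, &quot;lucene&quot;)
+    // index updates are done by the background async indexing thread
+    .setProperty(&quot;async&quot;, &quot;async&quot;)
+    // index only String and Binary properties
+    .setProperty(PropertyStates.createProperty(&quot;includePropertyTypes&quot;, ImmutableSet.of(
+        PropertyType.TYPENAME_STRING, PropertyType.TYPENAME_BINARY), Type.STRINGS))
+    // do not index the user IDs stored in these properties
+    .setProperty(PropertyStates.createProperty(&quot;excludePropertyNames&quot;, ImmutableSet.of(
+        &quot;jcr:createdBy&quot;, &quot;jcr:lastModifiedBy&quot;), Type.STRINGS))
+    // trigger a full reindex of the existing content
+    .setProperty(&quot;reindex&quot;, true);
+}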
+</pre></div>
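+<p>The effect of <tt>includePropertyTypes</tt> and <tt>excludePropertyNames</tt> on a single property can be sketched in plain Java. The class <tt>FulltextPropertyFilter</tt> is hypothetical and not part of Oak; it only models the documented rules (a type whitelist, a name blacklist, and the <tt>String</tt>/<tt>Binary</tt> default) using the JCR type names returned by <tt>PropertyType.TYPENAME_*</tt>. Treating an empty setting like a missing one is an assumption of the sketch.</p>

```java
import java.util.Set;

/**
 * Hypothetical model of the full-text property filter described above.
 * Not Oak code: it only applies the documented includePropertyTypes /
 * excludePropertyNames rules to one property at a time.
 */
public final class FulltextPropertyFilter {
    /** Documented default: only String and Binary properties are indexed. */
    public static final Set<String> DEFAULT_TYPES = Set.of("String", "Binary");

    private final Set<String> includeTypes;
    private final Set<String> excludeNames;

    public FulltextPropertyFilter(Set<String> includeTypes, Set<String> excludeNames) {
        // Assumption: a missing or empty includePropertyTypes means the default types.
        this.includeTypes = (includeTypes == null || includeTypes.isEmpty())
                ? DEFAULT_TYPES : includeTypes;
        // A missing excludePropertyNames excludes nothing.
        this.excludeNames = (excludeNames == null) ? Set.of() : excludeNames;
    }

    /** True if a property with this name and JCR type name (e.g. "String") would be indexed. */
    public boolean include(String propertyName, String typeName) {
        return includeTypes.contains(typeName) && !excludeNames.contains(propertyName);
    }
}
```

+<p>For example, a filter built with no settings, <tt>new FulltextPropertyFilter(null, null)</tt>, rejects a <tt>Long</tt> property, which is why other types have to be added to <tt>includePropertyTypes</tt> explicitly.</p>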
+<p><b>Note</b> By default, the Oak Lucene index only indexes <i>Strings</i> and <i>Binaries</i>. To index another data type, add it to the <i>includePropertyTypes</i> setting, and remember to set the <i>reindex</i> flag to true.</p></div>
+<div class="section">
+<h3>Lucene Property Index (Since 1.0.8)<a name="Lucene_Property_Index_Since_1.0.8"></a></h3>
+<p>Oak can also use Lucene to create indexes for queries with property constraints that are not full-text, for example:</p>
+
+<div class="source">
+<pre>select * from [nt:base] where [alias] = '/admin'
+</pre></div>
+<p>To define a Lucene property index for the above query, add an index definition such as:</p>
+
+<div class="source">
+<pre>&quot;uuid&quot; : {
+        &quot;jcr:primaryType&quot;: &quot;oak:QueryIndexDefinition&quot;,
+        &quot;type&quot;: &quot;lucene&quot;,
+        &quot;async&quot;: &quot;async&quot;,
+        &quot;fulltextEnabled&quot;: false,
+        &quot;includePropertyNames&quot;: [&quot;alias&quot;]
+    }
+</pre></div>
+<p>The index definition node for a Lucene-based property index:</p>
+
+<ul>
+  
+<li>must be of type <tt>oak:QueryIndexDefinition</tt></li>
+  
+<li>must have the <tt>type</tt> property set to <b><tt>lucene</tt></b></li>
+  
+<li>must contain the <tt>async</tt> property set to the value <tt>async</tt>; this is what sends the index update process to a background thread</li>
+  
+<li>must have <tt>fulltextEnabled</tt> set to <tt>false</tt></li>
+  
+<li>must provide a whitelist of property names which should be indexed via <tt>includePropertyNames</tt></li>
+</ul>
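+<p>The rules above can be checked mechanically before an index definition is saved. The following sketch models the definition as a plain <tt>Map</tt> (for example, parsed from the JSON above). <tt>PropertyIndexDefinitionCheck</tt> is a hypothetical helper, not an Oak API, and Oak itself may accept or reject definitions differently.</p>

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

/** Hypothetical checker for the Lucene property index rules listed above (Oak 1.0.8). */
public final class PropertyIndexDefinitionCheck {

    /** Returns the list of violated rules; an empty list means the definition looks valid. */
    public static List<String> validate(Map<String, Object> def) {
        List<String> problems = new ArrayList<>();
        if (!"oak:QueryIndexDefinition".equals(def.get("jcr:primaryType"))) {
            problems.add("jcr:primaryType must be oak:QueryIndexDefinition");
        }
        if (!"lucene".equals(def.get("type"))) {
            problems.add("type must be lucene");
        }
        if (!"async".equals(def.get("async"))) {
            problems.add("async must be async");
        }
        // A property index must explicitly disable full-text indexing.
        if (!Boolean.FALSE.equals(def.get("fulltextEnabled"))) {
            problems.add("fulltextEnabled must be false");
        }
        // The whitelist of property names must be present and non-empty.
        Object names = def.get("includePropertyNames");
        if (!(names instanceof Collection) || ((Collection<?>) names).isEmpty()) {
            problems.add("includePropertyNames must list at least one property");
        }
        return problems;
    }

    private PropertyIndexDefinitionCheck() {}
}
```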
+<p><i>Note that, unlike the <a href="query.html#property-index">Property Index</a>, the Lucene Property Index is always configured in async mode, so it might lag behind the current repository state when a query is performed.</i></p>
+<p>Consider another example:</p>
+
+<div class="source">
+<pre>select
+    *
+from
+    [app:Asset] as a
+where
+    [jcr:content/jcr:lastModified] &gt; cast('2014-10-01T00:00:00.000+02:00' as date)
+    and [jcr:content/metadata/format] = 'image'
+order by
+    jcr:content/jcr:lastModified
+</pre></div>
+<p>To speed up the above query, you can create the following Lucene property index:</p>
+
+<div class="source">
+<pre>&quot;assetIndex&quot;:
+{
+  &quot;jcr:primaryType&quot;:&quot;oak:QueryIndexDefinition&quot;,
+  &quot;declaringNodeTypes&quot;:&quot;app:Asset&quot;,
+  &quot;includePropertyNames&quot;:[&quot;jcr:content/jcr:lastModified&quot; , 
+      &quot;jcr:content/metadata/format&quot;],
+  &quot;type&quot;:&quot;lucene&quot;,
+  &quot;async&quot;:&quot;async&quot;,
+  &quot;reindex&quot;:true,
+  &quot;fulltextEnabled&quot;:false,
+  &quot;orderedProps&quot;:[&quot;jcr:content/jcr:lastModified&quot;],
+  &quot;properties&quot;: {
+    &quot;jcr:primaryType&quot;:&quot;oak:Unstructured&quot;,
+    &quot;jcr:content&quot;: {
+      &quot;jcr:primaryType&quot;:&quot;oak:Unstructured&quot;,
+      &quot;jcr:lastModified&quot;: {
+        &quot;jcr:primaryType&quot;:&quot;oak:Unstructured&quot;,
+        &quot;type&quot;:&quot;Date&quot;
+      }
+    }
+  }
+}
+</pre></div>
+<p>The above index definition uses several features supported by the Lucene property index:</p>
+
+<ul>
+  
+<li><tt>declaringNodeTypes</tt> - As the query involves nodes of type <tt>app:Asset</tt>, the index is restricted to nodes of type <tt>app:Asset</tt></li>
+  
+<li><tt>orderedProps</tt> - As the query sorts via an <tt>order by</tt> clause, the index is configured with the property names used for sorting</li>
+  
+<li><tt>properties</tt> - For ordering to work properly, the index must be told the type of the property (here <tt>Date</tt>)</li>
+</ul>
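+<p>How <tt>declaringNodeTypes</tt> and relative names in <tt>includePropertyNames</tt> select values can be illustrated with nodes modelled as nested maps. This is a simplified sketch, not Oak's indexing code: <tt>RelativePropertyResolver</tt> is hypothetical, and its type check compares only the primary type, whereas a real node type check may also need to consider inherited types and mixins.</p>

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: resolve a relative property name such as
 * "jcr:content/jcr:lastModified" against a node modelled as nested maps,
 * and apply a simplified declaringNodeTypes check. Not Oak code.
 */
public final class RelativePropertyResolver {

    /** Follows every segment except the last as a child node; returns null if a step is missing. */
    public static Object resolve(Map<String, Object> node, String relativePath) {
        String[] segments = relativePath.split("/");
        Map<String, Object> current = node;
        for (int i = 0; i < segments.length - 1; i++) {
            Object child = current.get(segments[i]);
            if (!(child instanceof Map)) {
                return null; // intermediate node does not exist
            }
            @SuppressWarnings("unchecked")
            Map<String, Object> next = (Map<String, Object>) child;
            current = next;
        }
        // The last segment is the property name on the final node.
        return current.get(segments[segments.length - 1]);
    }

    /** Simplified: an empty list matches every node, otherwise the primary type must be listed. */
    public static boolean matchesDeclaringType(Map<String, Object> node, List<String> declaringNodeTypes) {
        return declaringNodeTypes.isEmpty()
                || declaringNodeTypes.contains(node.get("jcr:primaryType"));
    }

    private RelativePropertyResolver() {}
}
```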
+<p>For implementation details, refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2005">OAK-2005</a>. The following sections describe the supported features in more detail.</p></div>
+<div class="section">
+<h3>Index Definition<a name="Index_Definition"></a></h3>
+<p>The Lucene index definition is stored in the <tt>NodeStore</tt> and supports the following attributes:</p>
+
+<dl>
+<dt>type</dt>
+<dd>Required and should always be <tt>lucene</tt></dd>
+<dt>async</dt>
+<dd>Required and should always be <tt>async</tt></dd>
+<dt>fulltextEnabled</dt>
+<dd>For a Lucene-based property index this should <i>always</i> be set to <tt>false</tt></dd>
+<dt>declaringNodeTypes</dt>
+<dd>Node type names whose properties should be indexed. If not specified, all nodes are indexed if they have properties listed in <tt>includePropertyNames</tt>. For smaller and more efficient indexes, it is recommended to specify <tt>declaringNodeTypes</tt> according to your query needs</dd>
+<dt>includePropertyNames</dt>
+<dd>List of property names to index. A property name can be relative, e.g. <tt>jcr:content/jcr:lastModified</tt></dd>
+<dt>orderedProps</dt>
+<dd>List of property names that can be used in the <tt>order by</tt> clause of the query</dd>
+<dt>includePropertyTypes</dt>
+<dd>Used in Lucene Fulltext Index</dd>
+<dd>For full text index defaults to <tt>String, Binary</tt></dd>
+<dd>List of property types to index. The values can be any of the <a class="externalLink" href="http://www.day.com/specs/jsr170/javadocs/jcr-2.0/constant-values.html#javax.jcr.PropertyType.TYPENAME_STRING">PropertyType Names</a></dd>
+<dt><a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2201">blobSize</a></dt>
+<dd>Default value 32768 (32 KB)</dd>
+<dd>Size in bytes used for splitting the index files when storing them in the <tt>NodeStore</tt></dd>
+<dt>functionName</dt>
+<dd>Name to be used to enable index usage with <a href="#native-query">native query support</a></dd>
+</dl></div>
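+<p>The splitting controlled by <tt>blobSize</tt> can be illustrated with a small arithmetic sketch. <tt>BlobChunks</tt> is hypothetical, and it assumes each chunk holds exactly <tt>blobSize</tt> bytes except possibly the last one; the actual storage layout in Oak may differ (see OAK-2201).</p>

```java
/**
 * Hypothetical arithmetic sketch for blobSize: how many blobs a Lucene index
 * file needs if it is split into chunks of at most blobSize bytes.
 */
public final class BlobChunks {
    /** Documented default blobSize: 32768 bytes (32 KB). */
    public static final int DEFAULT_BLOB_SIZE = 32768;

    /** Ceiling division: a partially filled last chunk still needs its own blob. */
    public static long chunkCount(long fileLength, int blobSize) {
        if (blobSize <= 0 || fileLength < 0) {
            throw new IllegalArgumentException("blobSize must be positive and fileLength non-negative");
        }
        return (fileLength + blobSize - 1) / blobSize;
    }

    private BlobChunks() {}
}
```

+<p>For example, under these assumptions a 100,000 byte index file needs 4 chunks with the default size, since 3 chunks hold only 98,304 bytes.</p>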
+<div class="section">
+<h3>Property Definition<a name="Property_Definition"></a></h3>
+<p>In some cases, property-specific configuration is required. For example, a query that uses <tt>order by</tt> typically does not specify the property type; in such cases you need to specify the property type explicitly.</p>
+<p>Property definition nodes are created under the <tt>properties</tt> node of the index definition node and are named after their properties. For relative properties, you need to create the required path structure under the <tt>properties</tt> node. For example, for the property <tt>jcr:content/jcr:lastModified</tt> you need to create the property node at path <tt>&lt;index definition node&gt;/properties/jcr:content/jcr:lastModified</tt>:</p>
+
+<div class="source">
+<pre>&quot;properties&quot;:
+  {
+    &quot;jcr:primaryType&quot;:&quot;oak:Unstructured&quot;,
+    &quot;jcr:content&quot;:
+    {
+      &quot;jcr:primaryType&quot;:&quot;oak:Unstructured&quot;,
+      &quot;jcr:lastModified&quot;:
+      {
+        &quot;jcr:primaryType&quot;:&quot;oak:Unstructured&quot;,
+        &quot;type&quot;:&quot;Date&quot;
+      }
+    }
+  }	
+</pre></div>
+
+<dl>
+<dt>type</dt>
+<dd>JCR Property type. Can be one of <tt>Date</tt>, <tt>Boolean</tt>, <tt>Double</tt> or <tt>Long</tt></dd>
+<dt>boost</dt>
+<dd>The boost value. Defaults to 1.0</dd>
+<dd>Since 1.0.9</dd>
+</dl></div>
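+<p>For a relative property, the nested <tt>properties</tt> structure shown above can be generated from the property name. The sketch below is a hypothetical helper that returns the structure as nested maps, with each node typed <tt>oak:Unstructured</tt> as in the example; writing it to the repository is left out.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: builds the "properties" subtree for one relative property name. */
public final class PropertyDefinitionTree {

    /**
     * For ("jcr:content/jcr:lastModified", "Date") returns
     * properties -> jcr:content -> jcr:lastModified {type: Date},
     * where every node has jcr:primaryType oak:Unstructured.
     */
    public static Map<String, Object> build(String relativeName, String type) {
        Map<String, Object> properties = newNode();
        Map<String, Object> current = properties;
        // One child node per path segment of the relative property name.
        for (String segment : relativeName.split("/")) {
            Map<String, Object> child = newNode();
            current.put(segment, child);
            current = child;
        }
        // Only the leaf node carries the property settings.
        current.put("type", type);
        return properties;
    }

    private static Map<String, Object> newNode() {
        Map<String, Object> node = new LinkedHashMap<>();
        node.put("jcr:primaryType", "oak:Unstructured");
        return node;
    }

    private PropertyDefinitionTree() {}
}
```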
+<div class="section">
+<h3>Ordering<a name="Ordering"></a></h3>
+<p>The Lucene property index provides efficient sorting support based on Lucene DocValue fields. To configure it, list the property names that can be used in the <tt>order by</tt> clause in the <tt>orderedProps</tt> property.</p>
+<p>If a property is of a type other than String, you must also specify a property definition with its <tt>type</tt>.</p>
+<p>Refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2196">Lucene based Sorting</a> for more details. </p>
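+<p>Why the <tt>type</tt> matters can be seen by sorting the same values as strings and as typed values. This sketch only illustrates the general problem and does not show how Oak builds its DocValue fields: numeric strings sort lexicographically (<tt>&quot;10&quot;</tt> before <tt>&quot;9&quot;</tt>), and ISO 8601 dates with different time zone offsets do not sort chronologically as strings.</p>

```java
import java.time.OffsetDateTime;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Illustration only: the same values sort differently as strings and as typed values. */
public final class TypedOrdering {

    /** Untyped: plain lexicographic String order. */
    public static List<String> asStrings(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        return sorted;
    }

    /** Typed as Long: numeric order. */
    public static List<String> asLongs(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparingLong((String s) -> Long.parseLong(s)));
        return sorted;
    }

    /** Typed as Date: chronological order of the instants, whatever the offset. */
    public static List<String> asDates(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparing((String s) -> OffsetDateTime.parse(s)));
        return sorted;
    }

    private TypedOrdering() {}
}
```

+<p>For example, <tt>2014-10-01T01:00:00.000+02:00</tt> is earlier than <tt>2014-10-01T00:00:00.000Z</tt>, but sorts after it as a string.</p>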
+<p><a name="osgi-config"></a></p></div>
+<div class="section">
+<h3>LuceneIndexProvider Configuration<a name="LuceneIndexProvider_Configuration"></a></h3>
+<p>Some runtime aspects of the Oak Lucene support can be configured via OSGi configuration. The configuration needs to be done for the PID <tt>org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexProviderService</tt>.</p>
+<p><img src="lucene-osgi-config.png" alt="OSGi Configuration" /></p>
+
+<dl>
+<dt>enableCopyOnReadSupport</dt>
+<dd>Enables copying of the Lucene index to the local file system to improve query performance. See <a href="#copy-on-read">Copy Indexes On Read</a></dd>
+<dt>localIndexDir</dt>
+<dd>Directory to use when copying index files to the local file system. Required when <tt>enableCopyOnReadSupport</tt> is enabled</dd>
+<dt>debug</dt>
+<dd>Boolean value. Defaults to <tt>false</tt></dd>
+<dd>If enabled, Lucene logging is integrated with SLF4J</dd>
+</dl>
+<p><a name="non-root-index"></a></p></div>
+<div class="section">
+<h3>Non Root Index Definitions<a name="Non_Root_Index_Definitions"></a></h3>
+<p>A Lucene index definition can be created at any location in the repository; it need not be defined at the root. For example, if your query involves path restrictions like</p>
+
+<div class="source">
+<pre>select * from [app:Asset] as a where ISDESCENDANTNODE(a, '/content/companya') and [format] = 'image'
+</pre></div>
+<p>then you can create the required index definition, say <tt>assetIndex</tt>, at <tt>/content/companya/oak:index/assetIndex</tt>. In that case, the index contains data only for the subtree under <tt>/content/companya</tt>.</p>
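+<p>The relation between the location of such a definition and the content it covers can be sketched as follows. The helper <tt>IndexScope</tt> is hypothetical and assumes the definition lives directly under an <tt>oak:index</tt> node, as in the example; it treats the subtree root and all its descendants as covered.</p>

```java
/** Hypothetical sketch: which subtree a non-root index definition covers. */
public final class IndexScope {

    /** "/content/companya/oak:index/assetIndex" becomes "/content/companya"; "/oak:index/x" becomes "/". */
    public static String indexedSubtree(String indexDefinitionPath) {
        int marker = indexDefinitionPath.lastIndexOf("/oak:index/");
        if (marker < 0) {
            throw new IllegalArgumentException("not under an oak:index node: " + indexDefinitionPath);
        }
        return marker == 0 ? "/" : indexDefinitionPath.substring(0, marker);
    }

    /** True for the subtree root itself and for any descendant path. */
    public static boolean covers(String subtreeRoot, String path) {
        if (subtreeRoot.equals("/")) {
            return path.startsWith("/");
        }
        // Compare whole path segments so "/content/companyab" is not treated as inside "/content/companya".
        return path.equals(subtreeRoot) || path.startsWith(subtreeRoot + "/");
    }

    private IndexScope() {}
}
```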
+<p><a name="native-query"></a></p></div>
+<div class="section">
+<h3>Native Query and Index Selection<a name="Native_Query_and_Index_Selection"></a></h3>
+<p>The Oak query engine supports native queries such as</p>
+
+<div class="source">
+<pre>//*[rep:native('lucene', 'name:(Hello OR World)')]
+</pre></div>
+<p>If multiple Lucene-based indexes are enabled on the system and you need to use a specific one, such as <tt>/oak:index/assetIndex</tt>, you can specify the index name via the <tt>functionName</tt> attribute of the index definition.</p>
+<p>For example, for an assetIndex definition like</p>
+
+<div class="source">
+<pre>{
+  &quot;jcr:primaryType&quot;:&quot;oak:QueryIndexDefinition&quot;,
+  &quot;type&quot;:&quot;lucene&quot;,
+  ...
+  &quot;functionName&quot; : &quot;lucene-assetIndex&quot;
+}
+</pre></div>
+<p>executing the following query ensures that the Lucene index from <tt>assetIndex</tt> is used:</p>
+
+<div class="source">
+<pre>//*[rep:native('lucene-assetIndex', 'name:(Hello OR World)')]
+</pre></div></div>
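+<p>When the Lucene query is built from user input, the <tt>rep:native</tt> expression should be assembled with proper string-literal quoting. The sketch below assumes the XPath convention of escaping a single quote by doubling it; <tt>NativeQueries</tt> is a hypothetical helper, not part of Oak.</p>

```java
/** Hypothetical helper: builds an XPath rep:native constraint for a given functionName. */
public final class NativeQueries {

    /** E.g. xpath("lucene-assetIndex", "name:(Hello OR World)"). */
    public static String xpath(String functionName, String luceneQuery) {
        return "//*[rep:native(" + literal(functionName) + ", " + literal(luceneQuery) + ")]";
    }

    /** Single-quoted XPath string literal; an embedded single quote is doubled (assumed escaping rule). */
    public static String literal(String s) {
        return "'" + s.replace("'", "''") + "'";
    }

    private NativeQueries() {}
}
```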
+<div class="section">
+<h3>Persisting indexes to FileSystem<a name="Persisting_indexes_to_FileSystem"></a></h3>
+<p>By default, Lucene indexes are stored in the <tt>NodeStore</tt>. If required, they can be stored directly on the file system:</p>
+
+<div class="source">
+<pre>{
+  &quot;jcr:primaryType&quot;:&quot;oak:QueryIndexDefinition&quot;,
+  &quot;type&quot;:&quot;lucene&quot;,
+  ...
+  &quot;persistence&quot; : &quot;file&quot;,
+  &quot;path&quot; : &quot;/path/to/store/index&quot;
+}
+</pre></div>
+<p>To store the Lucene index in the file system, in the Lucene index definition node, set the property <tt>persistence</tt> to <tt>file</tt>, and set the property <tt>path</tt> to the directory where the index should be stored. Then start reindexing by setting <tt>reindex</tt> to <tt>true</tt>.</p>
+<p>Note that this setup only works for non-clustered <tt>NodeStore</tt> implementations. If the backend <tt>NodeStore</tt> supports clustering, the index data would not be accessible on other cluster nodes.</p>
+<p><a name="copy-on-read"></a></p></div>
+<div class="section">
+<h3>CopyOnRead<a name="CopyOnRead"></a></h3>
+<p>Lucene indexes are stored in the <tt>NodeStore</tt>. Oak Lucene provides a custom directory implementation that enables Lucene to load the index from the <tt>NodeStore</tt>. This might degrade performance if the <tt>NodeStore</tt> storage is remote. For such cases, Oak Lucene provides a <tt>CopyOnReadDirectory</tt>, which copies the index content to a local directory so that Lucene can use the local copy while performing queries.</p>
+<p>At runtime, details related to the copy-on-read feature are exposed via the <tt>CopyOnReadStats</tt> MBean. An index at a JCR path such as <tt>/oak:index/assetIndex</tt> is copied to <tt>&lt;index dir&gt;/&lt;hash of jcr path&gt;</tt>. To determine the mapping between a local index directory and its JCR path, refer to the MBean details.</p>
+<p><img src="lucene-copy-on-read.png" alt="CopyOnReadStats" /></p>
+<p>For more details refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-1724">OAK-1724</a>. This feature can be enabled via the <a href="#osgi-config">Lucene Index provider service configuration</a>.</p></div>
+<div class="section">
+<h3>Lucene Index MBeans<a name="Lucene_Index_MBeans"></a></h3>
+<p>Oak Lucene registers a JMX bean <tt>LuceneIndex</tt> which provides details about the index content, e.g. the size of the index, the number of documents present in the index, etc.</p>
+<p><img src="lucene-index-mbean.png" alt="Lucene Index MBean" /></p>
+<p><a name="luke"></a></p></div>
+<div class="section">
+<h3>Analyzing created Lucene Index<a name="Analyzing_created_Lucene_Index"></a></h3>
+<p><a class="externalLink" href="https://code.google.com/p/luke/">Luke</a> is a handy development and diagnostic tool which opens existing Lucene indexes and displays their details. In Oak, Lucene index files are stored in the <tt>NodeStore</tt> and hence are not directly accessible. To analyze the index files with Luke, follow the steps below:</p>
+
+<ol style="list-style-type: decimal">
+  
+<li>
+<p>Download the Luke version which includes the Lucene jars matching those used by Oak. As of the Oak 1.0.8 release, the Lucene version used is 4.7.1, so download the jar from <a class="externalLink" href="https://github.com/DmitryKey/luke/releases">here</a>:</p>
+  
+<div class="source">
+<pre>$ wget https://github.com/DmitryKey/luke/releases/download/4.7.0/luke-with-deps.jar
+</pre></div></li>
+  
+<li>
+<p>Use the <a class="externalLink" href="https://github.com/apache/jackrabbit-oak/tree/trunk/oak-run#console">Oak Console</a> to dump the Lucene index from the <tt>NodeStore</tt> to a file system directory with the <tt>lc dump</tt> command:</p>
+  
+<div class="source">
+<pre>$ java -jar oak-run-*.jar console /path/to/oak/repository
+Apache Jackrabbit Oak 1.1-SNAPSHOT
+Jackrabbit Oak Shell (Apache Jackrabbit Oak 1.1-SNAPSHOT, JVM: 1.7.0_55)
+Type ':help' or ':h' for help.
+-------------------------------------------------------------------------
+/&gt; lc info /oak:index/lucene
+Index size : 74.1 MB
+Number of documents : 235708
+Number of deleted documents : 231
+/&gt; lc 
+dump   info   
+/&gt; lc dump /path/to/dump/index/lucene /oak:index/lucene
+Copying Lucene indexes to [/path/to/dump/index/lucene]
+Copied 74.1 MB in 1.209 s
+/&gt; lc dump /path/to/dump/index/slingAlias /oak:index/slingAlias
+Copying Lucene indexes to [/path/to/dump/index/slingAlias]
+Copied 8.5 MB in 218.7 ms
+/&gt;
+</pre></div></li>
+  
+<li>
+<p>After the dump, open the index in Luke. Oak Lucene uses a <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-1737">custom Codec</a>, so the oak-lucene jar needs to be included in Luke's classpath for it to display the index details:</p>
+  
+<div class="source">
+<pre>$ java -XX:MaxPermSize=512m -cp luke-with-deps.jar:oak-lucene-1.0.8.jar org.getopt.luke.Luke
+</pre></div></li>
+</ol>
+<p>From the Luke UI you can then inspect various details of the index.</p></div>
+<div class="section">
+<h3>Index performance<a name="Index_performance"></a></h3>
+<p>The following are some best practices for getting good performance from Lucene-based indexes:</p>
+
+<ol style="list-style-type: decimal">
+  
+<li>
+<p>Make use of <a href="#non-root-index">non-root indexes</a>. If your queries always search under certain paths, create the index definition under those paths only. This can be helpful in multi-tenant deployments where each tenant's data is stored under a specific repository path and all queries are made under that path.</p></li>
+  
+<li>
+<p>Index only the required data. Depending on your requirements you can create multiple Lucene indexes. For example, if in the majority of cases you query on various properties under <tt>&lt;node&gt;/jcr:content/metadata</tt>, where the node is of a specific nodeType, create a single index definition listing all such properties and restrict it to that nodeType. You can check the size of the index via the MBean.</p></li>
+</ol></div></div>
+                  </div>
+            </div>
+          </div>
+
+    <hr/>
+
+    <footer>
+            <div class="container-fluid">
+              <div class="row span12">Copyright &copy;                    2012-2014
+                        <a href="http://www.apache.org/">The Apache Software Foundation</a>.
+            All Rights Reserved.      
+                    
+      </div>
+
+        
+        
+          
+    
+    
+                
+    <div id="ohloh" class="pull-right">
+      <script type="text/javascript" src="http://www.ohloh.net/p/jackrabbit-oak/widgets/project_thin_badge.js"></script>
+    </div>
+        </div>
+    </footer>
+  </body>
+</html>
\ No newline at end of file

Propchange: jackrabbit/site/live/oak/docs/query/lucene-old.html
------------------------------------------------------------------------------
    svn:eol-style = native

Modified: jackrabbit/site/live/oak/docs/query/lucene.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/query/lucene.html?rev=1645577&r1=1645576&r2=1645577&view=diff
==============================================================================
--- jackrabbit/site/live/oak/docs/query/lucene.html (original)
+++ jackrabbit/site/live/oak/docs/query/lucene.html Mon Dec 15 06:30:05 2014
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2014-12-08
+ | Generated by Apache Maven Doxia at 2014-12-15
  | Rendered using Apache Maven Fluido Skin 1.3.0
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20141208" />
+    <meta name="Date-Revision-yyyymmdd" content="20141215" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Jackrabbit Oak - Lucene Index</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.3.0.min.css" />
@@ -192,7 +192,7 @@
         <ul class="breadcrumb">
                 
                     
-                  <li id="publishDate">Last Published: 2014-12-08</li>
+                  <li id="publishDate">Last Published: 2014-12-15</li>
                   <li class="divider">|</li> <li id="projectVersion">Version: 1.1-SNAPSHOT</li>
                       
                 
@@ -438,20 +438,30 @@
    See the License for the specific language governing permissions and
    limitations under the License. --><div class="section">
 <h2>Lucene Index<a name="Lucene_Index"></a></h2>
-<p>Oak supports Lucene based indexes to support both property constraint and full text constraints</p>
-<div class="section">
-<h3>The Lucene Full-Text Index<a name="The_Lucene_Full-Text_Index"></a></h3>
-<p>The full-text index handles the &#x2018;contains&#x2019; type of queries:</p>
+<p><b>The following details apply to Oak release 1.0.9 onwards. For releases before 1.0.9, refer to the <a href="lucene-old.html">Pre 1.0.9 Lucene documentation</a>.</b></p>
+<p>Oak supports Lucene-based indexes for both property constraints and full-text constraints. Depending on the configuration, a Lucene index can be used to evaluate property constraints, full-text constraints, path restrictions and sorting. For example, consider the following query:</p>
 
 <div class="source">
-<pre>//*[jcr:contains(., 'text')]
+<pre>SELECT * FROM [nt:base] WHERE [assetType] = 'image'
+</pre></div>
+<p>The following index definition allows the Lucene index to be used for the above query:</p>
+
+<div class="source">
+<pre>/oak:index/assetType
+  - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
+  - compatVersion = 2
+  - type = &quot;lucene&quot;
+  - async = &quot;async&quot;
+  + indexRules
+    - jcr:primaryType = &quot;nt:unstructured&quot;
+    + nt:base
+      + properties
+        - jcr:primaryType = &quot;nt:unstructured&quot;
+        + assetType
+          - propertyIndex = true
+          - name = &quot;assetType&quot;
 </pre></div>
-<p>If a full-text index is configured, then all queries that have a full-text condition use the full-text index, no matter if there are other conditions that are indexed, and no matter if there is a path restriction.</p>
-<p>If no full-text index is configured, then queries with full-text conditions may not work as expected. (The query engine has a basic verification in place for full-text conditions, but it does not support all features that Lucene does, and it traverses all nodes if there are no indexed constraints).</p>
-<p>The full-text index update is asynchronous via a background thread, see <tt>Oak#withAsyncIndexing</tt>. This means that some full-text searches will not work for a small window of time: the background thread runs every 5 seconds, plus the time is takes to run the diff and to run the text-extraction process. </p>
-<p>The async update status is now reflected on the <tt>oak:index</tt> node with the help of a few properties, see <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-980">OAK-980</a></p>
-<p>TODO Node aggregation <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-828">OAK-828</a></p>
-<p>The index definition node for a lucene-based full-text index:</p>
+<p>The index definition node for a Lucene-based index:</p>
 
 <ul>
   
@@ -459,179 +469,382 @@
   
 <li>must have the <tt>type</tt> property set to <b><tt>lucene</tt></b></li>
   
-<li>must contain the <tt>async</tt> property set to the value <tt>async</tt>, this is what sends the index update process to a background thread</li>
+<li>must contain the <tt>async</tt> property set to the value <tt>async</tt>; this is what sends the index update process to a background thread</li>
 </ul>
-<p><i>Optionally</i> you can add</p>
+<p><i>Note that, compared to the <a href="query.html#property-index">Property Index</a>, the Lucene property index is always configured in async mode, hence it might lag behind the current repository state when a query is performed.</i></p>
+<p>As another example, consider the following full-text query:</p>
 
-<ul>
-  
-<li>what subset of property types to be included in the index via the<br /> <tt>includePropertyTypes</tt> property</li>
-  
-<li>a blacklist of property names: what property to be excluded from the index  via the <tt>excludePropertyNames</tt> property</li>
-  
-<li>the <tt>reindex</tt> flag which when set to <tt>true</tt>, triggers a full content re-index.</li>
-</ul>
-<p>Example:</p>
+<div class="source">
+<pre>//*[jcr:contains(., 'text')]
+</pre></div>
+<p>For this, the Lucene index needs to be configured to index all properties:</p>
 
 <div class="source">
-<pre>{
-  NodeBuilder index = root.child(&quot;oak:index&quot;);
-  index.child(&quot;lucene&quot;)
-    .setProperty(&quot;jcr:primaryType&quot;, &quot;oak:QueryIndexDefinition&quot;, Type.NAME)
-    .setProperty(&quot;type&quot;, &quot;lucene&quot;)
-    .setProperty(&quot;async&quot;, &quot;async&quot;)
-    .setProperty(PropertyStates.createProperty(&quot;includePropertyTypes&quot;, ImmutableSet.of(
-        PropertyType.TYPENAME_STRING, PropertyType.TYPENAME_BINARY), Type.STRINGS))
-    .setProperty(PropertyStates.createProperty(&quot;excludePropertyNames&quot;, ImmutableSet.of( 
-        &quot;jcr:createdBy&quot;, &quot;jcr:lastModifiedBy&quot;), Type.STRINGS))
-    .setProperty(&quot;reindex&quot;, true);
-}
+<pre>/oak:index/assetType
+  - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
+  - compatVersion = 2
+  - type = &quot;lucene&quot;
+  - async = &quot;async&quot;
+  + indexRules
+    - jcr:primaryType = &quot;nt:unstructured&quot;
+    + nt:base
+      + properties
+        - jcr:primaryType = &quot;nt:unstructured&quot;
+        + allProps
+          - name = &quot;.*&quot;
+          - isRegexp = true
+          - nodeScopeIndex = true
 </pre></div>
-<p><b>Note</b> The Oak Lucene index will only index <i>Strings</i> and <i>Binaries</i> by default. If you need to add another data type, you need to add it to the<br /><i>includePropertyTypes</i> setting, and don&#x2019;t forget to set the <i>reindex</i> flag to true.</p></div>
 <div class="section">
-<h3>Lucene Property Index (Since 1.0.8)<a name="Lucene_Property_Index_Since_1.0.8"></a></h3>
-<p>Oak uses Lucene for creating index to support queries which involve property constraint that is not full-text</p>
+<h3>Index Definition<a name="Index_Definition"></a></h3>
+<p>A Lucene index definition consists of <tt>indexRules</tt>, <tt>analyzers</tt>, <tt>aggregates</tt>, etc., which determine which nodes and properties are indexed and how they are indexed.</p>
+<p>Below is the canonical index definition structure:</p>
+
+<div class="source">
+<pre>luceneIndex (oak:QueryIndexDefinition)
+  - type (string) = 'lucene' mandatory
+  - async (string) = 'async' mandatory
+  - blobSize (long) = 32768
+  - evaluatePathRestrictions (boolean) = false
+  - name (string)
+  - compatVersion (long) = 2
+  + indexRules (nt:unstructured)
+  + aggregates (nt:unstructured)
+  + analyzers (nt:unstructured)
+</pre></div>
+<p>The following config options can be defined at the index definition level:</p>
+
+<dl>
+<dt>type</dt>
+<dd>Required and should always be <tt>lucene</tt></dd>
+<dt>async</dt>
+<dd>Required and should always be <tt>async</tt></dd>
+<dt><a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2201">blobSize</a></dt>
+<dd>Default value 32768 (32kb)</dd>
+<dd>Size in bytes used for splitting the index files when storing them in NodeStore</dd>
+<dt>functionName</dt>
+<dd>Name to be used to enable index usage with <a href="#native-query">native query support</a></dd>
+<dt>evaluatePathRestrictions</dt>
+<dd>Optional boolean property, defaults to <tt>false</tt></dd>
+<dd>If enabled, the index can evaluate <a href="#path-restrictions">path restrictions</a></dd>
+<dt>name</dt>
+<dd>Optional property</dd>
+<dd>Captures the name of the index which is used while logging</dd>
+<dt>compatVersion</dt>
+<dd>Required integer property and should be set to 2</dd>
+<dd>By default Oak uses the older Lucene index implementation, which does not support property restrictions, index time aggregation etc. To make use of these features, set it to 2</dd>
+</dl>
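+<p>As a rough illustration of the <tt>blobSize</tt> option, the sketch below computes how many chunks an index file would be split into, assuming each file is cut into consecutive pieces of at most <tt>blobSize</tt> bytes. The class <tt>BlobChunks</tt> is hypothetical and not part of Oak.</p>

```java
class BlobChunks {

    // Documented default for blobSize: 32768 bytes (32 KB).
    static final long DEFAULT_BLOB_SIZE = 32768;

    // Number of chunks needed to store a file of fileSize bytes when it is
    // split into pieces of at most blobSize bytes (ceiling division).
    // Assumption: an empty file needs no chunk at all.
    static long chunkCount(long fileSize, long blobSize) {
        if (blobSize <= 0 || fileSize < 0) {
            throw new IllegalArgumentException(
                    "blobSize must be positive and fileSize non-negative");
        }
        return (fileSize + blobSize - 1) / blobSize;
    }
}
```

+<p>With the default setting, a 100000 byte file would need <tt>chunkCount(100000, BlobChunks.DEFAULT_BLOB_SIZE)</tt> = 4 chunks; a larger <tt>blobSize</tt> means fewer, bigger chunks per file.</p>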
+<div class="section">
+<h4>Indexing Rules<a name="Indexing_Rules"></a></h4>
+<p>Indexing rules define which types of nodes and properties are indexed. An index configuration can define one or more <tt>indexRules</tt> for different nodeTypes.</p>
 
 <div class="source">
-<pre>select * from [nt:base] where [alias] = '/admin'
+<pre>fulltextIndex
+  - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
+  - compatVersion = 2
+  - type = &quot;lucene&quot;
+  - async = &quot;async&quot;
+  + indexRules
+    - jcr:primaryType = &quot;nt:unstructured&quot;
+    + app:Page
+      + properties
+        - jcr:primaryType = &quot;nt:unstructured&quot;
+        + publishedDate
+          - propertyIndex = true
+          - name = &quot;jcr:content/publishedDate&quot;
+    + app:Asset
+      + properties
+        - jcr:primaryType = &quot;nt:unstructured&quot;
+        + imageType
+          - propertyIndex = true
+          - name = &quot;jcr:content/metadata/imageType&quot;
+</pre></div>
+<p>Rules are defined per nodeType, and each rule has one or more property definitions which determine which properties are indexed. Below is the canonical index rule structure:</p>
+
+<div class="source">
+<pre>ruleName (nt:unstructured)
+  - inherited (boolean) = true
+  - includePropertyTypes (string) multiple
+  + properties (nt:unstructured)
 </pre></div>
-<p>To define a property index on a subtree for above query you have to add an index definition </p>
+<p>The following config options can be defined at the index rule level:</p>
+
+<dl>
+<dt>inherited</dt>
+<dd>Optional boolean property, defaults to true</dd>
+<dd>Determines whether the rule applies only on an exact nodeType match or can also be applied when the match is based on nodeType inheritance</dd>
+<dt>includePropertyTypes</dt>
+<dd>Applicable when the index is enabled for full-text indexing</dd>
+<dd>For a full-text index, defaults to including all types</dd>
+<dd>String array of property types which should be indexed. The values can be those specified in <a class="externalLink" href="http://www.day.com/specs/jsr170/javadocs/jcr-2.0/constant-values.html#javax.jcr.PropertyType.TYPENAME_STRING">PropertyType Names</a></dd>
+</dl>
+<div class="section">
+<h5>Indexing Rule inheritance<a name="Indexing_Rule_inheritance"></a></h5>
+<p><tt>indexRules</tt> are defined per nodeType and support nodeType inheritance. For example, while indexing any node, the indexer looks up the applicable indexRule for that node based on its <i>primaryType</i>. If a direct match is found, that rule is used; otherwise it looks for a rule for any of the parent types. The rules are looked up in the order of their entry under the <tt>indexRules</tt> node (the <tt>indexRules</tt> node itself is of type <tt>nt:unstructured</tt>, which has <tt>orderable</tt> child nodes).</p>
+<p>If <tt>inherited</tt> is set to false on any rule, then that rule is only applicable if an exact match is found.</p></div>
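+<p>The rule lookup described above can be sketched as follows. This is a simplified model, not Oak's implementation: <tt>IndexRuleLookup</tt> and <tt>Rule</tt> are hypothetical names, and the nodeType hierarchy is passed in as a plain array of supertype names instead of being read from the repository.</p>

```java
class IndexRuleLookup {

    // Simplified stand-in for a child of indexRules: the nodeType it is
    // defined for and its 'inherited' flag (hypothetical type).
    static final class Rule {
        final String nodeType;
        final boolean inherited;

        Rule(String nodeType, boolean inherited) {
            this.nodeType = nodeType;
            this.inherited = inherited;
        }
    }

    // Returns the rule for a node with the given primary type, or null.
    // 'rules' is in the order of the entries under indexRules and
    // 'superTypes' lists the supertypes of the primary type.
    static Rule applicableRule(Rule[] rules, String primaryType, String[] superTypes) {
        // 1. A rule defined for the exact primary type always wins.
        for (Rule rule : rules) {
            if (rule.nodeType.equals(primaryType)) {
                return rule;
            }
        }
        // 2. Otherwise take the first rule, in definition order, defined for
        //    one of the supertypes, unless it is restricted to exact matches.
        for (Rule rule : rules) {
            if (rule.inherited && contains(superTypes, rule.nodeType)) {
                return rule;
            }
        }
        return null;
    }

    private static boolean contains(String[] values, String value) {
        for (String v : values) {
            if (v.equals(value)) {
                return true;
            }
        }
        return false;
    }
}
```

+<p>For example, with rules for <tt>app:Page</tt> and <tt>nt:base</tt> (both inherited), a node of type <tt>app:Asset</tt> whose supertypes include <tt>nt:base</tt> falls back to the <tt>nt:base</tt> rule; if that rule had <tt>inherited = false</tt>, no rule would apply.</p>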
+<div class="section">
+<h5>Property Definitions<a name="Property_Definitions"></a></h5>
+<p>Each index rule consists of one or more property definitions defined under <tt>properties</tt>. The order of the property definition nodes is important, as some property definitions are based on regular expressions. Below is the canonical property definition structure:</p>
 
 <div class="source">
-<pre>&quot;uuid&quot; : {
-        &quot;jcr:primaryType&quot;: &quot;oak:QueryIndexDefinition&quot;,
-        &quot;type&quot;: &quot;lucene&quot;,
-        &quot;async&quot;: &quot;async&quot;,
-        &quot;fulltextEnabled&quot;: false,
-        &quot;includePropertyNames&quot;: [&quot;alias&quot;]
-    }
+<pre>propNode (nt:unstructured)
+  - name (string)
+  - boost (double) = '1.0'
+  - index (boolean) = true
+  - useInExcerpt (boolean) = false
+  - analyzed (boolean) = false
+  - nodeScopeIndex (boolean) = false
+  - ordered (boolean) = false
+  - isRegexp (boolean) = false
+  - type (string) = 'undefined'
 </pre></div>
-<p>The index definition node for a lucene-based full-text index:</p>
+<p>The following config options can be defined at the property definition level:</p>
 
+<dl>
+<dt>name</dt>
+<dd>Property name. If not defined, the property name is set to the node name. If <tt>isRegexp</tt> is true, it defines the regular expression. Can also be set to a relative property.</dd>
+<dt>isRegexp</dt>
+<dd>If set to true, the property name is interpreted as a regular expression and the given definition applies to all matching property names. Note that the expression should be structured such that it does not match &#x2018;/&#x2019;.
+  
 <ul>
+    
+<li><tt>.*</tt> - This property definition is applicable for all properties of the given node</li>
+    
+<li><tt>jcr:content/metadata/.*</tt> - This property definition is applicable for all properties of the child node <i>jcr:content/metadata</i></li>
+  </ul></dd>
+<dt>boost</dt>
+<dd>If the property is included in <tt>nodeScopeIndex</tt>, then it defines the boost applied to the indexed value for the given property name.</dd>
+<dt>index</dt>
+<dd>Determines if this property should be indexed. Mostly useful for fulltext  index where some properties need to be <i>excluded</i> from getting indexed.</dd>
+<dt>useInExcerpt</dt>
+<dd>Controls whether the value of a property should be used to create an excerpt. The value of the property is still full-text indexed when set to false, but it will never show up in an excerpt for its parent node. If set to true, the property value is stored separately within the index, which increases the index size, so set it to true only if you make use of the excerpt feature.</dd>
+<dt>nodeScopeIndex</dt>
+<dd>Controls whether the value of a property should be part of the full-text index. That is, you can do a <i>jcr:contains(., 'foo')</i> and it will return nodes that have a string property that contains the word foo. Example
   
-<li>must be of type <tt>oak:QueryIndexDefinition</tt></li>
+<ul>
+    
+<li><i>//element(*, app:Asset)[jcr:contains(., 'image')]</i></li>
+  </ul></dd>
+<dt>analyzed</dt>
+<dd>Set this to true if the property is used as part of <tt>contains</tt>. Example
   
-<li>must have the <tt>type</tt> property set to <b><tt>lucene</tt></b></li>
+<ul>
+    
+<li><i>//element(*, app:Asset)[jcr:contains(type, 'image')]</i></li>
+    
+<li><i>//element(*, app:Asset)[jcr:contains(jcr:content/metadata/@format, 'image')]</i></li>
+  </ul></dd>
+<dt>ordered</dt>
+<dd>If the property is to be used in an <i>order by</i> clause to perform sorting, this should be set to true. Set it to true only if the property is actually used for sorting, as it increases the index size. Example
+  
+<ul>
+    
+<li><i>//element(*, app:Asset)[jcr:contains(type, 'image')] order by @size</i></li>
+    
+<li><i>//element(*, app:Asset)[jcr:contains(type, 'image')] order by jcr:content/@jcr:lastModified</i></li>
+  </ul></dd>
+</dl>
+<p>Refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2196">Lucene based Sorting</a> for more details.</p>
+
+<dl>
+<dt>type</dt>
+<dd>JCR property type. Can be one of <tt>Date</tt>, <tt>Boolean</tt>, <tt>Double</tt> or <tt>Long</tt>. Mostly inferred from the indexed value. However, where the same property is not stored with a consistent type across nodes, it is recommended to specify the type explicitly.</dd>
+</dl>
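+<p>To see why the property type matters for sorting, the sketch below orders the same raw values once as strings and once as numbers. It only illustrates the two orderings and is not Oak code; <tt>SortByType</tt> is a hypothetical name.</p>

```java
import java.util.Arrays;

class SortByType {

    // Orders raw values as strings, i.e. lexicographically, character by
    // character: "10" sorts before "9" because '1' comes before '9'.
    static String[] sortAsStrings(String[] values) {
        String[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Orders the same values after parsing them as longs, i.e. numerically,
    // which is the order one expects for a property of type Long.
    static long[] sortAsLongs(String[] values) {
        long[] numbers = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            numbers[i] = Long.parseLong(values[i]);
        }
        Arrays.sort(numbers);
        return numbers;
    }
}
```

+<p>Sorting <tt>"9", "100", "10"</tt> as strings yields <tt>10, 100, 9</tt>, while sorting them as longs yields <tt>9, 10, 100</tt>. When some nodes store a property as a string and others as a number, the two orderings conflict, which is presumably why declaring the type explicitly is recommended.</p>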
+<p><b>Property Names</b></p>
+<p>A property name can be one of the following:</p>
+
+<ol style="list-style-type: decimal">
   
-<li>must contain the <tt>async</tt> property set to the value <tt>async</tt>, this is what sends the index update process to a background thread</li>
+<li>Simple name - Like <i>assetType</i>. These are used for properties which are defined directly on the indexed node</li>
   
-<li>must have <tt>fulltextEnabled</tt> set to <tt>false</tt></li>
+<li>Relative name - Like <i>jcr:content/metadata/title</i>. These are used for properties which are defined relative to the node being indexed.</li>
   
-<li>must provide a whitelist of property names which should be indexed via <tt>includePropertyNames</tt></li>
-</ul>
-<p><i>Note that compared to <a href="query.html#property-index">Property Index</a> Lucene Property Index is always configured in Async mode hence it might lag behind in reflecting the current repository state while performing the query</i></p>
-<p>Taking another example. </p>
+<li>Regular expression - Like <i>.*</i>. Used when only properties whose names match a given pattern are to be indexed. They can also be used for relative properties like <i>jcr:content/metadata/dc:.*$</i>, which indexes all properties whose names start with <i>dc</i> from the node with relative path <i>jcr:content/metadata</i></li>
+</ol>
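+<p>The three kinds of property names above can be illustrated with a small matcher. It is a sketch under an assumption: a definition name is split at its last <tt>/</tt> into a relative parent path and a name part, and only the name part is treated as a regular expression (when <tt>isRegexp</tt> is true), which is one way of keeping a pattern such as <tt>.*</tt> from matching across <tt>/</tt>. <tt>PropertyNamePattern</tt> is a hypothetical class, not Oak's API.</p>

```java
import java.util.regex.Pattern;

class PropertyNamePattern {

    private final String parentPath;   // "" for properties on the node itself
    private final Pattern namePattern; // applied to the property name only

    // Splits a definition name such as "jcr:content/metadata/dc:.*$" at its
    // last '/'. Without isRegexp the name part is matched literally.
    PropertyNamePattern(String definitionName, boolean isRegexp) {
        int slash = definitionName.lastIndexOf('/');
        parentPath = slash == -1 ? "" : definitionName.substring(0, slash);
        String name = definitionName.substring(slash + 1);
        namePattern = Pattern.compile(isRegexp ? name : Pattern.quote(name));
    }

    // True if the property at 'relativePath' (relative to the indexed node)
    // is covered: same parent path, and a name that fully matches. The name
    // never contains '/', so ".*" cannot reach into child nodes.
    boolean matches(String relativePath) {
        int slash = relativePath.lastIndexOf('/');
        String parent = slash == -1 ? "" : relativePath.substring(0, slash);
        String name = relativePath.substring(slash + 1);
        return parent.equals(parentPath) && namePattern.matcher(name).matches();
    }
}
```

+<p>In this model, <tt>.*</tt> covers <tt>assetType</tt> but not <tt>jcr:content/title</tt>, and <tt>jcr:content/metadata/dc:.*$</tt> covers <tt>jcr:content/metadata/dc:title</tt> but not <tt>jcr:content/metadata/format</tt>.</p>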
+<p><a name="path-restrictions"></a></p></div>
+<div class="section">
+<h5>Evaluate Path Restrictions<a name="Evaluate_Path_Restrictions"></a></h5>
+<p>The Lucene index supports evaluating path restrictions natively. Consider a query like:</p>
 
 <div class="source">
-<pre>select
-    *
-from
-    [app:Asset] as a
-where
-    [jcr:content/jcr:lastModified] &gt; cast('2014-10-01T00:00:00.000+02:00' as date)
-    and [jcr:content/metadata/format] = 'image'
-order by
-    jcr:content/jcr:lastModified
-</pre></div>
-<p>To enable faster execution for above query you can create following Lucene property index </p>
-
-<div class="source">
-<pre>&quot;assetIndex&quot;:
-{
-  &quot;jcr:primaryType&quot;:&quot;oak:QueryIndexDefinition&quot;,
-  &quot;declaringNodeTypes&quot;:&quot;app:Asset&quot;,
-  &quot;includePropertyNames&quot;:[&quot;jcr:content/jcr:lastModified&quot; , 
-      &quot;jcr:content/metadata/format&quot;],
-  &quot;type&quot;:&quot;lucene&quot;,
-  &quot;async&quot;:&quot;async&quot;,
-  &quot;reindex&quot;:true,
-  &quot;fulltextEnabled&quot;:false,
-  &quot;orderedProps&quot;:[&quot;jcr:content/jcr:lastModified&quot;]
-  &quot;properties&quot;:	{
-    &quot;jcr:primaryType&quot;:&quot;oak:Unstructured&quot;,
-    &quot;jcr:content&quot;: {
-      &quot;jcr:primaryType&quot;:&quot;oak:Unstructured&quot;,
-      &quot;jcr:lastModified&quot;:	{
-        &quot;jcr:primaryType&quot;:&quot;oak:Unstructured&quot;,
-        &quot;type&quot;:&quot;Date&quot;
-      }
-    }
-  }	
-}
+<pre>select * from [app:Asset] as a where isdescendantnode(a, [/content/app/old]) AND contains(*, 'white')
 </pre></div>
-<p>Above index definition makes use of various features supported by property index</p>
+<p>By default the index returns all nodes which <i>contain white</i>, and the query engine filters out nodes which are not under <i>/content/app/old</i>. This can be slow if many of the matching nodes are not under that path. To speed up such queries, enable <tt>evaluatePathRestrictions</tt> in the Lucene index; the index then only returns nodes which are under <i>/content/app/old</i>.</p>
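+<p>One plausible way for an index to answer <tt>isdescendantnode</tt> itself is to record the ancestor paths of every indexed node, which would also account for the extra index size. The sketch below computes those ancestors and uses them for the strict-descendant check that <tt>isdescendantnode</tt> performs. It is an assumption-based illustration: <tt>AncestorPaths</tt> is hypothetical, and this is not necessarily how Oak stores path information.</p>

```java
class AncestorPaths {

    // Returns every proper ancestor of an absolute JCR path, root first:
    // "/content/app/old/a" gives "/", "/content", "/content/app" and
    // "/content/app/old". The root path "/" has no ancestors.
    static String[] ancestorsOf(String path) {
        if (path.equals("/")) {
            return new String[0];
        }
        String[] segments = path.substring(1).split("/");
        String[] ancestors = new String[segments.length];
        ancestors[0] = "/";
        String current = "";
        for (int i = 1; i < segments.length; i++) {
            current = current + "/" + segments[i - 1];
            ancestors[i] = current;
        }
        return ancestors;
    }

    // Strict-descendant test in the sense of isdescendantnode(): true only if
    // 'ancestor' is one of the recorded ancestors, so a node is never its own
    // descendant and "/content/app/older" is not below "/content/app/old".
    static boolean isDescendant(String path, String ancestor) {
        for (String candidate : ancestorsOf(path)) {
            if (candidate.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }
}
```

+<p>For the query above, <tt>/content/app/old/a</tt> would satisfy <tt>isdescendantnode(a, [/content/app/old])</tt>, while <tt>/content/app/old</tt> itself and <tt>/content/app/older/x</tt> would not.</p>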
+<p>Enabling this feature incurs a cost in the form of a slight increase in index size. Refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2306">OAK-2306</a> for more details.</p></div></div>
+<div class="section">
+<h4>Aggregation<a name="Aggregation"></a></h4>
+<p>Sometimes it is useful to include the contents of descendant nodes into a single node, to make it easier to search content that is scattered across multiple nodes.</p>
+<p>Oak allows you to define index aggregates based on relative path patterns and primary node types. Changes to aggregated items cause the main item to be reindexed, even if it was not modified.</p>
+<p>Aggregation configuration is defined under the <tt>aggregates</tt> node under index configuration. The following example creates an index aggregate on nt:file that includes the content of the jcr:content node:</p>
+
+<div class="source">
+<pre>fulltextIndex
+  - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
+  - compatVersion = 2
+  - type = &quot;lucene&quot;
+  - async = &quot;async&quot;
+  + aggregates
+    + nt:file
+      + include0
+        - path = &quot;jcr:content&quot;
+</pre></div>
+<p>For a given nodeType, multiple includes can be defined. Below is the aggregate definition structure for a specific include rule:</p>
+
+<div class="source">
+<pre>aggregateNodeInclude (nt:unstructured)
+  - path (string) mandatory
+  - primaryType (string)
+  - relativeNode (boolean) = false
+</pre></div>
+<p>The following config options can be defined as part of an aggregation include (refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2268">OAK-2268</a> for implementation details):</p>
 
-<ul>
-  
-<li><tt>declaringNodeTypes</tt> - As the query involves nodes of type <tt>app:Asset</tt> index is restricted to only index nodes of type <tt>app:Asset</tt></li>
-  
-<li><tt>orderedProps</tt> - As the query performs sorting via <tt>order by</tt> clause index is configured with property names which are used in sorting</li>
+<dl>
+<dt>path</dt>
+<dd>Path pattern to include. Example
   
-<li><tt>properties</tt> - For ordering to work properly we need to tell the type of property</li>
-</ul>
-<p>For implementation details refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2005">OAK-2005</a>. Following sections would provide more details about supported features</p></div>
+<ul>
+    
+<li><tt>jcr:content</tt> - Name explicitly specified</li>
+    
+<li><tt>*</tt> - Any child node at depth 1</li>
+    
+<li><tt>*/*</tt> - Any child node at depth 2</li>
+  </ul></dd>
+<dt>primaryType</dt>
+<dd>
+<p>Restrict the included nodes to a certain type. The restriction is applied to the last node in the given path:</p>
+  
+<div class="source">
+<pre>+ aggregates
+  + nt:file
+    + include0
+      - path = &quot;jcr:content&quot;
+      - primaryType = &quot;nt:resource&quot;
+</pre></div></dd>
+<dt>relativeNode</dt>
+<dd>
+<p>Boolean property indicating that a query can be performed against a specific node. For example, for the following content</p>
+  
+<div class="source">
+<pre>+ space.txt (app:Asset)
+  + renditions (nt:folder)
+    + original (nt:file)
+      + jcr:content (nt:resource)
+        - jcr:data
+</pre></div></dd>
+</dl>
+<p>And a query like</p>
+
+<div class="source">
+<pre>select * from [app:Asset] where contains(renditions/original/*, &quot;pluto&quot;)
+</pre></div>
+<p>The following index configuration would be required:</p>
+
+<div class="source">
+<pre>fulltextIndex
+  - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
+  - compatVersion = 2
+  - type = &quot;lucene&quot;
+  - async = &quot;async&quot;
+  + aggregates
+    + nt:file
+      + include0
+        - path = &quot;jcr:content&quot;
+    + app:Asset
+      + include0
+        - path = &quot;renditions/original&quot;
+        - relativeNode = true
+  + indexRules
+    - jcr:primaryType = &quot;nt:unstructured&quot;
+    + app:Asset
+</pre></div>
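+<p>The <tt>path</tt> and <tt>primaryType</tt> options described above can be sketched as a simple matcher in which <tt>*</tt> stands for exactly one path segment. <tt>AggregatePathPattern</tt> is a hypothetical helper written for illustration; its type check is simplified to comparing the type of the last node with <tt>primaryType</tt>, ignoring nodeType inheritance.</p>

```java
class AggregatePathPattern {

    // True if 'relativePath' (e.g. "renditions/original") matches the include
    // pattern: same number of segments, and each pattern segment is either
    // "*" (any single name) or exactly equal to the path segment.
    static boolean matches(String pattern, String relativePath) {
        String[] patternSegments = pattern.split("/");
        String[] pathSegments = relativePath.split("/");
        if (patternSegments.length != pathSegments.length) {
            return false;
        }
        for (int i = 0; i < patternSegments.length; i++) {
            String segment = patternSegments[i];
            if (!segment.equals("*") && !segment.equals(pathSegments[i])) {
                return false;
            }
        }
        return true;
    }

    // Adds the optional primaryType restriction, checked against the type of
    // the last node in the path; null means "any type". Simplification:
    // exact type comparison, no nodeType inheritance.
    static boolean includes(String pattern, String primaryType,
                            String relativePath, String lastNodeType) {
        return matches(pattern, relativePath)
                && (primaryType == null || primaryType.equals(lastNodeType));
    }
}
```

+<p>In this model, <tt>*</tt> matches <tt>renditions</tt> but not <tt>renditions/original</tt>, which both <tt>*/*</tt> and the explicit pattern <tt>renditions/original</tt> match.</p>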
+<p><b>Aggregation and Recursion</b></p>
+<p>While performing aggregation, the aggregation rules are applied again to the node being aggregated. For example, while aggregating for <i>app:Asset</i> above, when <i>renditions/original/*</i> is being aggregated the aggregation rules are applied again. In this case, as <i>renditions/original</i> is an <i>nt:file</i>, the aggregation rule for <i>nt:file</i> is applied. Such logic might result in recursion (see <a class="externalLink" href="https://issues.apache.org/jira/browse/JCR-2989?focusedCommentId=13051101">JCR-2989</a> for details).</p>
+<p>For such cases, <tt>reaggregateLimit</tt> is set on the aggregate definition node; it defaults to 5:</p>
+
+<div class="source">
+<pre>  + aggregates
+    + app:Asset
+      - reaggregateLimit (long) = 5
+      + include0
+        - path = &quot;renditions/original&quot;
+        - relativeNode = true
+</pre></div></div>
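+<p>The effect of <tt>reaggregateLimit</tt> can be sketched with a toy content tree. In this hypothetical model (<tt>ReaggregateSketch</tt> and its single rule are invented for illustration), <tt>nt:folder</tt> aggregates all of its children, so a deeply nested folder tree keeps re-applying the same rule; a depth limit stops the recursion after a fixed number of levels. Oak's exact counting of levels may differ.</p>

```java
class ReaggregateSketch {

    // Minimal stand-in for a content node: a type, some text and children.
    static final class Node {
        final String type;
        final String text;
        final Node[] children;

        Node(String type, String text, Node... children) {
            this.type = type;
            this.text = text;
            this.children = children;
        }
    }

    // Hypothetical aggregation rule: nt:folder includes every direct child.
    static boolean aggregatesChildren(String type) {
        return "nt:folder".equals(type);
    }

    // Appends the text of aggregated descendants to 'out'. The rule is
    // re-applied to each included node, but recursion stops once 'limit'
    // levels have been aggregated, the role reaggregateLimit plays.
    static void aggregate(Node node, int depth, int limit, StringBuilder out) {
        if (depth >= limit || !aggregatesChildren(node.type)) {
            return;
        }
        for (Node child : node.children) {
            out.append(child.text).append(' ');
            aggregate(child, depth + 1, limit, out);
        }
    }

    // Builds a root folder with a chain of n nested folders t1/t2/.../tn.
    static Node folderChain(int n) {
        Node current = new Node("nt:folder", "t" + n);
        for (int i = n - 1; i >= 1; i--) {
            current = new Node("nt:folder", "t" + i, current);
        }
        return new Node("nt:folder", "root", current);
    }

    static String aggregatedText(Node root, int limit) {
        StringBuilder out = new StringBuilder();
        aggregate(root, 0, limit, out);
        return out.toString().trim();
    }
}
```

+<p>With a limit of 5, a chain of seven nested folders contributes only the text of its first five levels (<tt>t1</tt> to <tt>t5</tt>); without a limit, the rule would be re-applied at every level of the tree.</p>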
 <div class="section">
-<h3>Index Definition<a name="Index_Definition"></a></h3>
-<p>Lucene index definition is managed via <tt>NodeStore</tt> and supports following attributes</p>
+<h4>Analyzers<a name="Analyzers"></a></h4>
+<p>Analyzers can be configured as part of the index definition via the <tt>analyzers</tt> node. The default analyzer can be configured via the <tt>analyzers/default</tt> node:</p>
 
-<dl>
-<dt>type</dt>
-<dd>Required and should always be <tt>lucene</tt></dd>
-<dt>async</dt>
-<dd>Required and should always be <tt>async</tt></dd>
-<dt>fulltextEnabled</dt>
-<dd>For Lucene based property index this should <i>always</i> be set to <tt>false</tt></dd>
-<dt>declaringNodeTypes</dt>
-<dd>Node type names whose properties should be indexed. If not specified, then all nodes are indexed if they have properties defined in <tt>includePropertyNames</tt>. For smaller and more efficient indexes, it is recommended that <tt>declaringNodeTypes</tt> be specified according to your query needs</dd>
-<dt>includePropertyNames</dt>
-<dd>List of property names which should be indexed. Property names can be relative, e.g. <tt>jcr:content/jcr:lastModified</tt></dd>
-<dt>orderedProps</dt>
-<dd>List of property names which would be used in the <tt>order by</tt> clause of the query</dd>
-<dt>includePropertyTypes</dt>
-<dd>Used in Lucene Fulltext Index</dd>
-<dd>For full text index defaults to <tt>String, Binary</tt></dd>
-<dd>List of property types which should be indexed. The values can be any of those specified in <a class="externalLink" href="http://www.day.com/specs/jsr170/javadocs/jcr-2.0/constant-values.html#javax.jcr.PropertyType.TYPENAME_STRING">PropertyType Names</a></dd>
-<dt><a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2201">blobSize</a></dt>
-<dd>Default value 32768 (32kb)</dd>
-<dd>Size in bytes used for splitting the index files when storing them in NodeStore</dd>
-<dt>functionName</dt>
-<dd>Name to be used to enable index usage with <a href="#native-query">native query support</a></dd>
-</dl></div>
+<div class="source">
+<pre>+ sampleIndex
+    - jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
+    + analyzers
+        + default
+        + pathText
+        ...
+</pre></div>
 <div class="section">
-<h3>Property Definition<a name="Property_Definition"></a></h3>
-<p>In some cases property-specific configurations are required. For example, when using order by in a query, the user typically does not specify the property type. In such cases you need to specify the property type explicitly.</p>
-<p>Property definition nodes are created according to their property name under the <tt>properties</tt> node of the index definition node. For relative properties you need to create the required path structure under the <tt>properties</tt> node. For example, for property <tt>jcr:content/jcr:lastModified</tt> you need to create the property node at path <tt>&lt;index definition node&gt;/properties/jcr:content/jcr:lastModified</tt></p>
-
-<div class="source">
-<pre>&quot;properties&quot;:
-  {
-    &quot;jcr:primaryType&quot;:&quot;oak:Unstructured&quot;,
-    &quot;jcr:content&quot;:
-    {
-      &quot;jcr:primaryType&quot;:&quot;oak:Unstructured&quot;,
-      &quot;jcr:lastModified&quot;:
-      {
-        &quot;jcr:primaryType&quot;:&quot;oak:Unstructured&quot;,
-        &quot;type&quot;:&quot;Date&quot;
-      }
-    }
-  }	
+<h5>Specify analyzer class directly<a name="Specify_analyzer_class_directly"></a></h5>
+<p>If one of the out-of-the-box analyzers is to be used, it can be configured directly:</p>
+
+<div class="source">
+<pre>+ analyzers
+        + default
+            - class = &quot;org.apache.lucene.analysis.standard.StandardAnalyzer&quot;
+            - luceneMatchVersion = &quot;LUCENE_47&quot; (optional)
 </pre></div>
+<p>To conform to a specific Lucene version, specify it via <tt>luceneMatchVersion</tt>; otherwise Oak uses a default version that depends on the version of Lucene it ships with.</p>
+<p>A stopword file can also be provided via a <tt>stopwords</tt> node of type <tt>nt:file</tt> under the analyzer node:</p>
 
-<dl>
-<dt>type</dt>
-<dd>JCR Property type. Can be one of <tt>Date</tt>, <tt>Boolean</tt>, <tt>Double</tt> or <tt>Long</tt></dd>
-<dt>boost</dt>
-<dd>The boost value. Defaults to 1.0</dd>
-<dd>Since 1.0.9</dd>
-</dl></div>
-<div class="section">
-<h3>Ordering<a name="Ordering"></a></h3>
-<p>Lucene property index provides efficient sorting support based on Lucene DocValue fields. To configure it, specify the list of property names which can be used in the <tt>order by</tt> clause as part of the <tt>orderedProps</tt> property.</p>
-<p>If the property is of a type other than String, then you must specify the property definition with <tt>type</tt> details</p>
-<p>Refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2196">Lucene based Sorting</a> for more details. </p>
-<p><a name="osgi-config"></a></p></div>
+<div class="source">
+<pre>+ analyzers
+        + default
+            - class = &quot;org.apache.lucene.analysis.standard.StandardAnalyzer&quot;
+            - luceneMatchVersion = &quot;LUCENE_47&quot; (optional)
+            + stopwords (nt:file)
+</pre></div></div>
+<div class="section">
+<h5>Create analyzer via composition<a name="Create_analyzer_via_composition"></a></h5>
+<p>Analyzers can also be composed from <tt>Tokenizers</tt>, <tt>TokenFilters</tt> and <tt>CharFilters</tt>. This is similar to the support provided in Solr, where you can <a class="externalLink" href="https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Specifying_an_Analyzer_in_the_schema">configure analyzers in XML</a>.</p>
+
+<div class="source">
+<pre>+ analyzers
+        + default
+            + charFilters (nt:unstructured) // The filters need to be ordered
+                + HTMLStrip
+                + Mapping
+            + tokenizer
+                - name = &quot;Standard&quot;
+            + filters (nt:unstructured) // The filters need to be ordered
+                + LowerCase
+                + Stop
+                    - stopWordFiles = &quot;stop1.txt, stop2.txt&quot;
+                    + stop1.txt (nt:file)
+                    + stop2.txt (nt:file)
+                + PorterStem
+</pre></div>
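<p>A composed analyzer applies its parts in a fixed order: the char filters rewrite the raw text, the tokenizer splits it into tokens, and the token filters then process those tokens in the configured order. The sketch below shows only that ordering in plain Java. Every step is a crude, hypothetical stand-in (a regex instead of <tt>HTMLStrip</tt>, a split instead of the <tt>Standard</tt> tokenizer), not Lucene's factories, and the <tt>PorterStem</tt> step is omitted.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Plain-Java illustration of the order in which a composed analyzer applies its parts.
// Each step is a simplified, hypothetical stand-in, not the corresponding Lucene factory.
public class AnalysisChainSketch {

    static List<String> analyze(String text, Map<String, String> mapping, Set<String> stopWords) {
        // 1. charFilters, in order, on the raw character stream.
        String chars = text.replaceAll("<[^>]*>", " ");          // stand-in for HTMLStrip
        for (Map.Entry<String, String> e : mapping.entrySet()) { // stand-in for Mapping
            chars = chars.replace(e.getKey(), e.getValue());
        }

        // 2. tokenizer: split on runs of non-letter, non-digit characters (stand-in for Standard).
        List<String> tokens = new ArrayList<>();
        for (String t : chars.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }

        // 3. filters, in order: LowerCase first, then Stop on the lower-cased tokens.
        List<String> result = new ArrayList<>();
        for (String t : tokens) {
            String lower = t.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(lower)) {
                result.add(lower);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = analyze("<p>The Caf\u00e9 FOX</p>", Map.of("\u00e9", "e"), Set.of("the"));
        System.out.println(tokens); // [cafe, fox]
    }
}
```

<p>Because <tt>LowerCase</tt> runs before <tt>Stop</tt> here, the capitalised "The" is still removed by the lower-case stopword list; reversing the two filters would let it through, which is why the child nodes under <tt>filters</tt> must be ordered.</p>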
+<p>Points to note</p>
+
+<ol style="list-style-type: decimal">
+  
+<li>Names of filters, charFilters and tokenizers are formed by removing the factory suffix from the factory class name, for example:
+  
+<ul>
+    
+<li>org.apache.lucene.analysis.standard.StandardTokenizerFactory -&gt; Standard</li>
+    
+<li>org.apache.lucene.analysis.charfilter.MappingCharFilterFactory -&gt; Mapping</li>
+    
+<li>org.apache.lucene.analysis.core.StopFilterFactory -&gt; Stop</li>
+  </ul></li>
+  
+<li>Any configuration parameter required by the factory is specified as a property of that node
+  
+<ul>
+    
+<li>If the factory needs to load a file, e.g. stop words, the file content can be provided by creating a child <tt>nt:file</tt> node named after the file</li>
+  </ul></li>
+</ol>
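<p>The naming rule in the first point can be sketched as a small helper. This is a hypothetical illustration, not an Oak or Lucene API; the suffix list is an assumption derived from the three examples above. Longer suffixes are checked first so that <tt>MappingCharFilterFactory</tt> becomes <tt>Mapping</tt> rather than <tt>MappingChar</tt>.</p>

```java
import java.util.List;

// Hypothetical helper mirroring the documented naming rule: take the simple class
// name and strip the factory suffix. Not an Oak or Lucene API.
public class FactoryNames {

    // Longer suffixes first, so "CharFilterFactory" wins over the generic "FilterFactory".
    private static final List<String> SUFFIXES = List.of(
            "CharFilterFactory", "TokenFilterFactory", "TokenizerFactory", "FilterFactory");

    static String shortName(String factoryClassName) {
        // lastIndexOf returns -1 for an unqualified name, so substring(0) keeps it whole.
        String simple = factoryClassName.substring(factoryClassName.lastIndexOf('.') + 1);
        for (String suffix : SUFFIXES) {
            if (simple.endsWith(suffix) && simple.length() > suffix.length()) {
                return simple.substring(0, simple.length() - suffix.length());
            }
        }
        throw new IllegalArgumentException("Not a factory class name: " + factoryClassName);
    }

    public static void main(String[] args) {
        System.out.println(shortName("org.apache.lucene.analysis.standard.StandardTokenizerFactory"));   // Standard
        System.out.println(shortName("org.apache.lucene.analysis.charfilter.MappingCharFilterFactory")); // Mapping
        System.out.println(shortName("org.apache.lucene.analysis.core.StopFilterFactory"));              // Stop
    }
}
```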
+<p><a name="osgi-config"></a></p></div></div></div>
 <div class="section">
 <h3>LuceneIndexProvider Configuration<a name="LuceneIndexProvider_Configuration"></a></h3>
 <p>Some of the runtime aspects of the Oak Lucene support can be configured via OSGi configuration. The configuration needs to be done for PID <tt>org.apache
@@ -668,12 +881,10 @@ order by
 <p>For example, for an <tt>assetIndex</tt> definition like the following:</p>
 
 <div class="source">
-<pre>{
-  &quot;jcr:primaryType&quot;:&quot;oak:QueryIndexDefinition&quot;,
-  &quot;type&quot;:&quot;lucene&quot;,
-  ...
-  &quot;functionName&quot; : &quot;lucene-assetIndex&quot;,
-}
+<pre>- jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
+- type = &quot;lucene&quot;
+...
+- functionName = &quot;lucene-assetIndex&quot;
 </pre></div>
 <p>Executing the following query ensures that the Lucene index from <tt>assetIndex</tt> is used:</p>
 
@@ -685,13 +896,11 @@ order by
 <p>By default, Lucene indexes are stored in the <tt>NodeStore</tt>. If required, they can be stored directly on the file system:</p>
 
 <div class="source">
-<pre>{
-  &quot;jcr:primaryType&quot;:&quot;oak:QueryIndexDefinition&quot;,
-  &quot;type&quot;:&quot;lucene&quot;,
-  ...
-  &quot;persistence&quot; : &quot;file&quot;,
-  &quot;path&quot; : &quot;/path/to/store/index&quot;
-}
+<pre>- jcr:primaryType = &quot;oak:QueryIndexDefinition&quot;
+- type = &quot;lucene&quot;
+...
+- persistence = &quot;file&quot;
+- path = &quot;/path/to/store/index&quot;
 </pre></div>
 <p>To store the Lucene index in the file system, in the Lucene index definition node, set the property <tt>persistence</tt> to <tt>file</tt>, and set the property <tt>path</tt> to the directory where the index should be stored. Then start reindexing by setting <tt>reindex</tt> to <tt>true</tt>.</p>
 <p>Note that this setup only works for non-clustered <tt>NodeStore</tt> deployments. If the backend <tt>NodeStore</tt> supports clustering, the index data would not be accessible on other cluster nodes.</p>


