jackrabbit-commits mailing list archives

From chet...@apache.org
Subject svn commit: r1802113 - /jackrabbit/site/live/oak/docs/query/oak-run-indexing.html
Date Mon, 17 Jul 2017 09:13:16 GMT
Author: chetanm
Date: Mon Jul 17 09:13:16 2017
New Revision: 1802113

URL: http://svn.apache.org/viewvc?rev=1802113&view=rev
Added toc


Modified: jackrabbit/site/live/oak/docs/query/oak-run-indexing.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/query/oak-run-indexing.html?rev=1802113&r1=1802112&r2=1802113&view=diff
--- jackrabbit/site/live/oak/docs/query/oak-run-indexing.html (original)
+++ jackrabbit/site/live/oak/docs/query/oak-run-indexing.html Mon Jul 17 09:13:16 2017
@@ -9,7 +9,7 @@
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
     <meta name="Date-Revision-yyyymmdd" content="20170717" />
     <meta http-equiv="Content-Language" content="en" />
-    <title>Jackrabbit Oak &#x2013; Oak Run Indexing</title>
+    <title>Jackrabbit Oak &#x2013; <a name="oak-run-indexing"></a> Oak Run Indexing</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.6.min.css" />
     <link rel="stylesheet" href="../css/site.css" />
     <link rel="stylesheet" href="../css/print.css" media="print" />
@@ -229,7 +229,63 @@
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
-  --><h1>Oak Run Indexing</h1>
+  --><h1><a name="oak-run-indexing"></a> Oak Run Indexing</h1>
+<li><a href="#oak-run-indexing">Oak Run Indexing</a>
+<li><a href="#common-options">Common Options</a></li>
+<li><a href="#index-info">Generate Index Info</a></li>
+<li><a href="#dump-index-defn">Dump Index Definitions</a></li>
+<li><a href="#async-index-data">Dump Index Data</a></li>
+<li><a href="#check-index">Index Consistency Check</a></li>
+<li><a href="#reindex">Reindex</a>
+<li><a href="#out-of-band-indexing">A - out-of-band indexing</a>
+<li><a href="#out-of-band-pre-extraction">Step 1 - Text PreExtraction</a></li>
+<li><a href="#out-of-band-create-checkpoint">Step 2 - Create Checkpoint</a></li>
+<li><a href="#out-of-band-perform-reindex">Step 3 - Perform Reindex</a></li>
+<li><a href="#out-of-band-import-reindex">Step 4 - Import the index</a>
+<li><a href="#import-index-oak-run">4.1 - Via oak-run</a></li>
+<li><a href="#import-index-mbean">4.2 - Via IndexerMBean</a></li>
+<li><a href="#import-index-script">4.3 - Via script</a></li>
+        </ul></li>
+      </ul></li>
+<li><a href="#online-indexing">B - Online indexing</a>
+<li><a href="#online-indexing-pre-extract">Step 1 - Text PreExtraction</a></li>
+<li><a href="#online-indexing-perform-reindex">Step 2 - Perform reindexing</a></li>
+      </ul></li>
+<li><a href="#tika-setup">Tika Setup</a></li>
+    </ul></li>
+  </ul></li>
 <p><tt>@since Oak 1.7.0</tt></p>
 <p><b>Work in progress. Not to be used on production setups</b></p>
 <p>With Oak 1.7 we have added some tooling as part of oak-run <tt>index</tt> command. Below are details around various operations supported by this command.</p>
@@ -237,7 +293,7 @@
 <p>By default the tool would generate output file in directory <tt>indexing-result</tt> which is referred to as output directory.</p>
 <p>Unless specified all operations connect to the repository in read only mode</p>
 <div class="section">
-<h2><a name="Common_Options"></a>Common Options</h2>
+<h2><a name="Common_Options"></a><a name="common-options"></a> Common Options</h2>
 <p>All the commands support following common options</p>
 <ol style="list-style-type: decimal">
@@ -246,7 +302,7 @@
 <p>Also refer to help output via <tt>-h</tt> command for some other options</p></div>
 <div class="section">
-<h2><a name="Generate_Index_Info"></a>Generate Index Info</h2>
+<h2><a name="Generate_Index_Info"></a><a name="index-info"></a> Generate Index Info</h2>
 <div class="source">
 <div class="source"><pre class="prettyprint">java -jar oak-run*.jar index --fds-path=/path/to/datastore
 /path/to/segmentstore/ --index-info 
@@ -254,7 +310,7 @@
 <p>Generates a report consisting of various stats related to indexes present in the given repository. The generated report is stored by default in <tt>&lt;output dir&gt;/index-info.txt</tt></p>
 <p>Supported for all index types</p></div>
 <div class="section">
-<h2><a name="Dump_Index_Definitions"></a>Dump Index Definitions</h2>
+<h2><a name="Dump_Index_Definitions"></a><a name="dump-index-defn"></a> Dump Index Definitions</h2>
 <div class="source">
 <div class="source"><pre class="prettyprint">java -jar oak-run*.jar index --fds-path=/path/to/datastore
 /path/to/segmentstore/ --index-definitions
@@ -262,7 +318,7 @@
 <p><tt>--index-definitions</tt> operation dumps the index definition in json format to a file <tt>&lt;output dir&gt;/index-definitions.json</tt>. The json file contains index definitions keyed against the index paths</p>
 <p>Supported for all index types</p></div>
 <div class="section">
-<h2><a name="Dump_Index_Data"></a>Dump Index Data</h2>
+<h2><a name="Dump_Index_Data"></a><a name="async-index-data"></a> Dump Index Data</h2>
 <div class="source">
 <div class="source"><pre class="prettyprint">java -jar oak-run*.jar index --fds-path=/path/to/datastore
 /path/to/segmentstore/ --index-dump
@@ -270,7 +326,7 @@
 <p><tt>--index-dump</tt> operation dumps the index content in output directory. The output directory would contain one folder for each index. Each folder would have a property file <tt>index-details.txt</tt> which contains <tt>indexPath</tt></p>
 <p>Supported for only Lucene indexes.</p></div>
 <div class="section">
-<h2><a name="Index_Consistency_Check"></a>Index Consistency Check</h2>
+<h2><a name="Index_Consistency_Check"></a><a name="check-index"></a> Index Consistency Check</h2>
 <div class="source">
 <div class="source"><pre class="prettyprint">java -jar oak-run*.jar index --fds-path=/path/to/datastore
 /path/to/segmentstore/ --index-consistency-check
@@ -286,7 +342,7 @@
 <p>It would generate a report in <tt>&lt;output dir&gt;/index-consistency-check-report.txt</tt></p>
 <p>Supported for only Lucene indexes.</p></div>
 <div class="section">
-<h2><a name="Reindex"></a>Reindex</h2>
+<h2><a name="Reindex"></a><a name="reindex"></a> Reindex</h2>
 <p>The reindex operation supports 2 modes of index</p>
@@ -298,7 +354,7 @@
 <p>Supported for only Lucene indexes.</p>
 <p>If the indexes being reindex have fulltext indexing enabled then refer to <a href="#tika-setup">Tika Setup</a> for steps on how to adapt the command to include Tika support for text extraction</p>
 <div class="section">
-<h3><a name="A_-_out-of-band_indexing"></a>A - out-of-band indexing</h3>
+<h3><a name="A_-_out-of-band_indexing"></a><a name="out-of-band-indexing"></a> A - out-of-band indexing</h3>
 <p>Out of band indexing has following phases</p>
 <ol style="list-style-type: decimal">
@@ -312,13 +368,13 @@
 <li>Complete the increment indexing from checkpoint state to current head</li>
 <div class="section">
-<h4><a name="Step_1_-_Text_PreExtraction"></a>Step 1 - Text PreExtraction</h4>
+<h4><a name="Step_1_-_Text_PreExtraction"></a><a name="out-of-band-pre-extraction"></a> Step 1 - Text PreExtraction</h4>
 <p>If the index being reindexed involves fulltext index and the repository has binary content then its recommended that first <a href="pre-extract-text.html">text pre-extraction</a> is performed. This ensures that costly operation around text extraction is done prior to actual indexing so that actual indexing does not do text extraction in critical path</p></div>
 <div class="section">
-<h4><a name="Step_2_-_Create_Checkpoint"></a>Step 2 - Create Checkpoint</h4>
+<h4><a name="Step_2_-_Create_Checkpoint"></a><a name="out-of-band-create-checkpoint"></a>Step 2 - Create Checkpoint</h4>
 <p>Go to <tt>CheckpointMBean</tt> and create a checkpoint with lifetime of 1 month. &#xab;TBD&#xbb;</p></div>
 <div class="section">
-<h4><a name="Step_3_-_Perform_Reindex"></a>Step 3 - Perform Reindex</h4>
+<h4><a name="Step_3_-_Perform_Reindex"></a><a name="out-of-band-perform-reindex"></a> Step 3 - Perform Reindex</h4>
 <p>In this step we perform the actual indexing via oak-run where it connects to repository in read only mode. </p>
 <div class="source">
@@ -335,10 +391,10 @@
 <li><tt>--checkpoint</tt> - The checkpoint up to which the index is updated, when indexing in read only mode. For  testing purpose, it can be set to &#x2018;head&#x2019; to indicate that the head state should be used.</li>
 <div class="section">
-<h4><a name="Step_4_-_Import_the_index"></a>Step 4 - Import the index</h4>
+<h4><a name="Step_4_-_Import_the_index"></a><a name="out-of-band-import-reindex"></a>Step 4 - Import the index</h4>
 <p>As a last step we need to import the index back in the repository. This can be done in one of the following ways</p>
 <div class="section">
-<h5><a name="a4.1_-_Via_oak-run"></a>4.1 - Via oak-run</h5>
+<h5><a name="a4.1_-_Via_oak-run"></a><a name="import-index-oak-run"></a>4.1 - Via oak-run</h5>
 <p>In this mode we import the index using oak-run</p>
 <div class="source">
@@ -347,20 +403,20 @@
 <p>Here &#x201c;index dir&#x201d; is the directory which contains the index files created in step #3. Check the logs from previous command for the directory path.</p>
 <p>This mode should only be used when repository is from Oak version 1.7+ as oak-run connects to the repository in read-write mode.</p></div>
 <div class="section">
-<h5><a name="a4.2_-_Via_IndexerMBean"></a>4.2 - Via IndexerMBean</h5>
+<h5><a name="a4.2_-_Via_IndexerMBean"></a><a name="import-index-mbean"></a>4.2 - Via IndexerMBean</h5>
 <p>In this mode we import the index using JMX. Looks for <tt>IndexerMBean</tt> and then import the index directory using the <tt>importIndex</tt> operation</p></div>
 <div class="section">
-<h5><a name="a4.3_-_Via_script"></a>4.3 - Via script</h5>
+<h5><a name="a4.3_-_Via_script"></a><a name="import-index-script"></a>4.3 - Via script</h5>
 <p>TODO - Provide a way to import the data on older setup using some script</p></div></div></div>
 <div class="section">
-<h3><a name="B_-_Online_indexing"></a>B - Online indexing</h3>
+<h3><a name="B_-_Online_indexing"></a><a name="online-indexing"></a>B - Online indexing</h3>
 <p>Online indexing automates some of the manual steps which are required for out-of-band indexing. </p>
 <p>This mode should only be used when repository is from Oak version 1.7+ as oak-run connects to the repository in read-write mode.</p>
 <div class="section">
-<h4><a name="Step_1_-_Text_PreExtraction"></a>Step 1 - Text PreExtraction</h4>
+<h4><a name="Step_1_-_Text_PreExtraction"></a><a name="online-indexing-pre-extract"></a>Step 1 - Text PreExtraction</h4>
 <p>This is same as in out-of-band indexing</p></div>
 <div class="section">
-<h4><a name="Step_2_-_Perform_reindexing"></a>Step 2 - Perform reindexing</h4>
+<h4><a name="Step_2_-_Perform_reindexing"></a><a name="online-indexing-perform-reindex"></a>Step 2 - Perform reindexing</h4>
 <p>In this step we configure oak-run to connect to repository in read-write mode and let it perform all other steps i.e checkpoint creation, indexing and import</p>
 <div class="source">
