knox-commits mailing list archives

From m...@apache.org
Subject svn commit: r1850181 [4/13] - in /knox: site/books/knox-1-3-0/ site/books/knox-1-3-0/adminui/ trunk/books/1.3.0/ trunk/books/1.3.0/dev-guide/ trunk/books/1.3.0/img/ trunk/books/1.3.0/img/adminui/
Date Wed, 02 Jan 2019 17:31:31 GMT
Added: knox/site/books/knox-1-3-0/user-guide.html
URL: http://svn.apache.org/viewvc/knox/site/books/knox-1-3-0/user-guide.html?rev=1850181&view=auto
==============================================================================
--- knox/site/books/knox-1-3-0/user-guide.html (added)
+++ knox/site/books/knox-1-3-0/user-guide.html Wed Jan  2 17:31:29 2019
@@ -0,0 +1,9074 @@
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+<link href="book.css" rel="stylesheet"/>
+<img src="knox-logo.gif" alt="Knox"/>
+<!-- <img src="apache-logo.gif" alt="Apache"/> -->
+<img src="apache-logo.gif" align="right" alt="Apache"/>
+<h1><a id="Apache+Knox+Gateway+1.3.x+User's+Guide">Apache Knox Gateway 1.3.x User&rsquo;s Guide</a> <a href="#Apache+Knox+Gateway+1.3.x+User's+Guide"><img src="markbook-section-link.png"/></a></h1>
+<h2><a id="Table+Of+Contents">Table Of Contents</a> <a href="#Table+Of+Contents"><img src="markbook-section-link.png"/></a></h2>
+<ul>
+  <li><a href="#Introduction">Introduction</a></li>
+  <li><a href="#Quick+Start">Quick Start</a></li>
+  <li><a href="#Gateway+Samples">Gateway Samples</a></li>
+  <li><a href="#Apache+Knox+Details">Apache Knox Details</a>
+    <ul>
+      <li><a href="#Apache+Knox+Directory+Layout">Apache Knox Directory Layout</a></li>
+      <li><a href="#Supported+Services">Supported Services</a></li>
+    </ul>
+  </li>
+  <li><a href="#Gateway+Details">Gateway Details</a>
+    <ul>
+      <li><a href="#URL+Mapping">URL Mapping</a>
+        <ul>
+          <li><a href="#Default+Topology+URLs">Default Topology URLs</a></li>
+          <li><a href="#Fully+Qualified+URLs">Fully Qualified URLs</a></li>
+          <li><a href="#Topology+Port+Mapping">Topology Port Mapping</a></li>
+        </ul>
+      </li>
+      <li><a href="#Configuration">Configuration</a>
+        <ul>
+          <li><a href="#Gateway+Server+Configuration">Gateway Server Configuration</a></li>
+          <li><a href="#Simplified+Topology+Descriptors">Simplified Topology Descriptors</a></li>
+          <li><a href="#Externalized+Provider+Configurations">Externalized Provider Configurations</a></li>
+          <li><a href="#Sharing+HA+Providers">Sharing HA Providers</a></li>
+          <li><a href="#Simplified+Descriptor+Files">Simplified Descriptor Files</a></li>
+        </ul>
+      </li>
+      <li><a href="#Cluster+Configuration+Monitoring">Cluster Configuration Monitoring</a>
+        <ul>
+          <li><a href="#Remote+Configuration+Monitor">Remote Configuration Monitor</a></li>
+          <li><a href="#Remote+Configuration+Registry+Clients">Remote Configuration Registry Clients</a></li>
+          <li><a href="#Remote+Alias+Discovery">Remote Alias Discovery</a></li>
+          <li><a href="#Topology+Descriptors">Topology Descriptors</a></li>
+          <li><a href="#Hostmap+Provider">Hostmap Provider</a></li>
+        </ul>
+      </li>
+      <li><a href="#Knox+CLI">Knox CLI</a></li>
+      <li><a href="#Admin+API">Admin API</a></li>
+      <li><a href="#X-Forwarded-*+Headers+Support">X-Forwarded-* Headers Support</a></li>
+      <li><a href="#Metrics">Metrics</a></li>
+    </ul>
+  </li>
+  <li><a href="#Authentication">Authentication</a>
+    <ul>
+      <li><a href="#Advanced+LDAP+Authentication">Advanced LDAP Authentication</a></li>
+      <li><a href="#LDAP+Authentication+Caching">LDAP Authentication Caching</a></li>
+      <li><a href="#LDAP+Group+Lookup">LDAP Group Lookup</a></li>
+      <li><a href="#PAM+based+Authentication">PAM based Authentication</a></li>
+      <li><a href="#HadoopAuth+Authentication+Provider">HadoopAuth Authentication Provider</a></li>
+      <li><a href="#Preauthenticated+SSO+Provider">Preauthenticated SSO Provider</a></li>
+      <li><a href="#SSO+Cookie+Provider">SSO Cookie Provider</a></li>
+      <li><a href="#JWT+Provider">JWT Provider</a></li>
+      <li><a href="#Pac4j+Provider+-+CAS+/+OAuth+/+SAML+/+OpenID+Connect">Pac4j Provider - CAS / OAuth / SAML / OpenID Connect</a></li>
+      <li><a href="#KnoxSSO+Setup+and+Configuration">KnoxSSO Setup and Configuration</a></li>
+      <li><a href="#KnoxToken+Configuration">KnoxToken Configuration</a></li>
+      <li><a href="#Mutual+Authentication+with+SSL">Mutual Authentication with SSL</a></li>
+    </ul>
+  </li>
+  <li><a href="#Authorization">Authorization</a></li>
+  <li><a href="#Identity+Assertion">Identity Assertion</a>
+    <ul>
+      <li><a href="#Default+Identity+Assertion+Provider">Default Identity Assertion Provider</a></li>
+      <li><a href="#Concat+Identity+Assertion+Provider">Concat Identity Assertion Provider</a></li>
+      <li><a href="#SwitchCase+Identity+Assertion+Provider">SwitchCase Identity Assertion Provider</a></li>
+      <li><a href="#Regular+Expression+Identity+Assertion+Provider">Regular Expression Identity Assertion Provider</a></li>
+      <li><a href="#Hadoop+Group+Lookup+Provider">Hadoop Group Lookup Provider</a></li>
+    </ul>
+  </li>
+  <li><a href="#Secure+Clusters">Secure Clusters</a></li>
+  <li><a href="#High+Availability">High Availability</a></li>
+  <li><a href="#Web+App+Security+Provider">Web App Security Provider</a>
+    <ul>
+      <li><a href="#CSRF">CSRF</a></li>
+      <li><a href="#CORS">CORS</a></li>
+      <li><a href="#X-Frame-Options">X-Frame-Options</a></li>
+      <li><a href="#X-Content-Type-Options">X-Content-Type-Options</a></li>
+      <li><a href="#HTTP+Strict-Transport-Security+-+HSTS">HTTP Strict-Transport-Security - HSTS</a></li>
+    </ul>
+  </li>
+  <li><a href="#Websocket+Support">Websocket Support</a></li>
+  <li><a href="#Audit">Audit</a></li>
+  <li><a href="#Client+Details">Client Details</a>
+    <ul>
+      <li><a href="#Client+Quickstart">Client Quickstart</a></li>
+      <li><a href="#Client+Token+Sessions">Client Token Sessions</a>
+        <ul>
+          <li><a href="#Server+Setup">Server Setup</a></li>
+        </ul>
+      </li>
+      <li><a href="#Client+DSL+and+SDK+Details">Client DSL and SDK Details</a></li>
+    </ul>
+  </li>
+  <li><a href="#Service+Details">Service Details</a>
+    <ul>
+      <li><a href="#WebHDFS">WebHDFS</a></li>
+      <li><a href="#WebHCat">WebHCat</a></li>
+      <li><a href="#Oozie">Oozie</a></li>
+      <li><a href="#HBase">HBase</a></li>
+      <li><a href="#Hive">Hive</a></li>
+      <li><a href="#Yarn">Yarn</a></li>
+      <li><a href="#Kafka">Kafka</a></li>
+      <li><a href="#Storm">Storm</a></li>
+      <li><a href="#Solr">Solr</a></li>
+      <li><a href="#Avatica">Avatica</a></li>
+      <li><a href="#Livy+Server">Livy Server</a></li>
+      <li><a href="#Elasticsearch">Elasticsearch</a></li>
+      <li><a href="#Common+Service+Config">Common Service Config</a></li>
+      <li><a href="#Default+Service+HA+support">Default Service HA support</a></li>
+    </ul>
+  </li>
+  <li><a href="#UI+Service+Details">UI Service Details</a></li>
+  <li><a href="#Admin+UI">Admin UI</a></li>
+  <li><a href="#Limitations">Limitations</a></li>
+  <li><a href="#Troubleshooting">Troubleshooting</a></li>
+  <li><a href="#Export+Controls">Export Controls</a></li>
+</ul>
+<h2><a id="Introduction">Introduction</a> <a href="#Introduction"><img src="markbook-section-link.png"/></a></h2>
+<p>The Apache Knox Gateway is a system that provides a single point of authentication and access for Apache Hadoop services in a cluster. The goal is to simplify Hadoop security for both users (i.e. those who access the cluster data and execute jobs) and operators (i.e. those who control access and manage the cluster). The gateway runs as a server (or cluster of servers) that provides centralized access to one or more Hadoop clusters. In general, the goals of the gateway are as follows:</p>
+<ul>
+  <li>Provide perimeter security for Hadoop REST APIs to make Hadoop security easier to set up and use
+    <ul>
+      <li>Provide authentication and token verification at the perimeter</li>
+      <li>Enable authentication integration with enterprise and cloud identity management systems</li>
+      <li>Provide service level authorization at the perimeter</li>
+    </ul>
+  </li>
+  <li>Expose a single URL hierarchy that aggregates REST APIs of a Hadoop cluster
+    <ul>
+      <li>Limit the network endpoints (and therefore firewall holes) required to access a Hadoop cluster</li>
+      <li>Hide the internal Hadoop cluster topology from potential attackers</li>
+    </ul>
+  </li>
+</ul>
+<h2><a id="Quick+Start">Quick Start</a> <a href="#Quick+Start"><img src="markbook-section-link.png"/></a></h2>
+<p>Here are the steps to have Apache Knox up and running against a Hadoop Cluster:</p>
+<ol>
+  <li>Verify system requirements</li>
+  <li>Download a virtual machine (VM) with Hadoop</li>
+  <li>Download Apache Knox Gateway</li>
+  <li>Start the virtual machine with Hadoop</li>
+  <li>Install Knox</li>
+  <li>Start the LDAP embedded within Knox</li>
+  <li>Start the Knox Gateway</li>
+  <li>Access Hadoop with Knox</li>
+</ol>
+<h3><a id="1+-+Requirements">1 - Requirements</a> <a href="#1+-+Requirements"><img src="markbook-section-link.png"/></a></h3>
+<h4><a id="Java">Java</a> <a href="#Java"><img src="markbook-section-link.png"/></a></h4>
+<p>Java 1.8 is required for the Knox Gateway runtime. Use the command below to check the version of Java installed on the system where Knox will be running.</p>
+<pre><code>java -version
+</code></pre>
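+<p>The check above can also be scripted. The helper below is a minimal sketch rather than part of Knox: the function name <code>is_java_8</code> is hypothetical, and it assumes the version banner contains <code>version "1.8.</code>, which is common but may differ between JDK vendors.</p>

```shell
# Return success when a `java -version` banner reports Java 8 (1.8.x).
# Assumption: the banner looks like '... version "1.8.0_191"'; adjust the
# pattern if your JDK prints something else.
is_java_8() {
    # $1: the first line printed by `java -version`
    case "$1" in
        *'version "1.8.'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (java -version writes its banner to stderr, hence the 2>&1):
# is_java_8 "$(java -version 2>&1 | head -n 1)" && echo "Java 1.8 found"
```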
+<h4><a id="Hadoop">Hadoop</a> <a href="#Hadoop"><img src="markbook-section-link.png"/></a></h4>
+<p>Knox 1.3.0 supports Hadoop 2.x and 3.x. The quick start instructions assume a Hadoop 2.x virtual machine based environment.</p>
+<h3><a id="2+-+Download+Hadoop+2.x+VM">2 - Download Hadoop 2.x VM</a> <a href="#2+-+Download+Hadoop+2.x+VM"><img src="markbook-section-link.png"/></a></h3>
+<p>The quick start provides a link to download a Hadoop 2.0 based Hortonworks virtual machine <a href="http://hortonworks.com/products/hdp-2/#install">Sandbox</a>. Please note that Knox supports other Hadoop distributions and is configurable against a full-blown Hadoop cluster. Configuring Knox for a Hadoop 2.x version, for Hadoop deployed in EC2, or for a custom Hadoop cluster is documented in the advanced deployment guide.</p>
+<h3><a id="3+-+Download+Apache+Knox+Gateway">3 - Download Apache Knox Gateway</a> <a href="#3+-+Download+Apache+Knox+Gateway"><img src="markbook-section-link.png"/></a></h3>
+<p>Download one of the distributions below from the <a href="http://www.apache.org/dyn/closer.cgi/knox">Apache mirrors</a>.</p>
+<ul>
+  <li>Source archive: <a href="http://www.apache.org/dyn/closer.cgi/knox/1.3.0/knox-1.3.0-src.zip">knox-1.3.0-src.zip</a> (<a href="http://www.apache.org/dist/knox/1.3.0/knox-1.3.0-src.zip.asc">PGP signature</a>, <a href="http://www.apache.org/dist/knox/1.3.0/knox-1.3.0-src.zip.sha1">SHA1 digest</a>, <a href="http://www.apache.org/dist/knox/1.3.0/knox-1.3.0-src.zip.md5">MD5 digest</a>)</li>
+  <li>Binary archive: <a href="http://www.apache.org/dyn/closer.cgi/knox/1.3.0/knox-1.3.0.zip">knox-1.3.0.zip</a> (<a href="http://www.apache.org/dist/knox/1.3.0/knox-1.3.0.zip.asc">PGP signature</a>, <a href="http://www.apache.org/dist/knox/1.3.0/knox-1.3.0.zip.sha1">SHA1 digest</a>, <a href="http://www.apache.org/dist/knox/1.3.0/knox-1.3.0.zip.md5">MD5 digest</a>)</li>
+</ul>
+<p>Apache Knox Gateway releases are available under the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>. See the NOTICE file contained in each release artifact for applicable copyright attribution notices.</p>
+<h3><a id="Verify">Verify</a> <a href="#Verify"><img src="markbook-section-link.png"/></a></h3>
+<p>While recommended, verification of signatures is an optional step. You can verify the integrity of any downloaded files using the PGP signatures. Please read <a href="http://httpd.apache.org/dev/verification.html">Verifying Apache HTTP Server Releases</a> for more information on why you should verify our releases.</p>
+<p>The PGP signatures can be verified using PGP or GPG. First download the <a href="https://dist.apache.org/repos/dist/release/knox/KEYS">KEYS</a> file as well as the <code>.asc</code> signature files for the relevant release packages. Make sure you get these files from the main distribution directory linked above, rather than from a mirror. Then verify the signatures using one of the methods below.</p>
+<pre><code>% pgpk -a KEYS
+% pgpv knox-1.3.0.zip.asc
+</code></pre>
+<p>or</p>
+<pre><code>% pgp -ka KEYS
+% pgp knox-1.3.0.zip.asc
+</code></pre>
+<p>or</p>
+<pre><code>% gpg --import KEYS
+% gpg --verify knox-1.3.0.zip.asc
+</code></pre>
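+<p>The download list also links SHA1 digests. A digest only confirms that a file was not corrupted in transit and does not replace the signature check. The helper below is a sketch under two assumptions: the coreutils <code>sha1sum</code> tool is installed, and the first whitespace-separated field of the digest file is the hex digest (some digest files use a different layout). The function name <code>sha1_matches</code> is hypothetical.</p>

```shell
# Compare a file's SHA-1 digest with the value stored in a digest file.
# Assumptions: `sha1sum` is available, and the first whitespace-separated
# field on the first line of the digest file is the hex digest.
sha1_matches() {
    # $1: downloaded archive, $2: matching .sha1 file
    expected=$(awk 'NR == 1 { print tolower($1) }' "$2")
    actual=$(sha1sum "$1" | awk '{ print tolower($1) }')
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Usage:
# sha1_matches knox-1.3.0.zip knox-1.3.0.zip.sha1 && echo "digest OK"
```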
+<h3><a id="4+-+Start+Hadoop+virtual+machine">4 - Start Hadoop virtual machine</a> <a href="#4+-+Start+Hadoop+virtual+machine"><img src="markbook-section-link.png"/></a></h3>
+<p>Start the Hadoop virtual machine.</p>
+<h3><a id="5+-+Install+Knox">5 - Install Knox</a> <a href="#5+-+Install+Knox"><img src="markbook-section-link.png"/></a></h3>
+<p>The steps required to install the gateway will vary depending upon which distribution format (zip | rpm) was downloaded. In either case you will end up with a directory where the gateway is installed. This directory will be referred to as your <code>{GATEWAY_HOME}</code> throughout this document.</p>
+<h4><a id="ZIP">ZIP</a> <a href="#ZIP"><img src="markbook-section-link.png"/></a></h4>
+<p>If you downloaded the Zip distribution you can simply extract the contents into a directory. The example below provides a command that can be executed to do this. Note the <code>{VERSION}</code> portion of the command must be replaced with an actual Apache Knox Gateway version number. This might be 1.3.0 for example.</p>
+<pre><code>unzip knox-{VERSION}.zip
+</code></pre>
+<p>This will create a directory <code>knox-{VERSION}</code> in your current directory. The directory <code>knox-{VERSION}</code> will be considered your <code>{GATEWAY_HOME}</code>.</p>
+<h3><a id="6+-+Start+LDAP+embedded+in+Knox">6 - Start LDAP embedded in Knox</a> <a href="#6+-+Start+LDAP+embedded+in+Knox"><img src="markbook-section-link.png"/></a></h3>
+<p>Knox comes with an LDAP server for demonstration purposes. Note: If the tool used to extract the contents of the Tar or tar.gz file was not capable of making the files in the bin directory executable, make them executable manually before running the commands below.</p>
+<pre><code>cd {GATEWAY_HOME}
+bin/ldap.sh start
+</code></pre>
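+<p>If the scripts in the bin directory lost their executable bit during extraction, the bit can be restored before starting the LDAP server. The following is a minimal sketch; the function name <code>ensure_bin_executable</code> is hypothetical and not part of the Knox distribution.</p>

```shell
# Make every *.sh script under a Knox install's bin directory executable.
ensure_bin_executable() {
    # $1: the {GATEWAY_HOME} directory
    for script in "$1"/bin/*.sh; do
        # Skip the unexpanded pattern when no script matches.
        [ -f "$script" ] && chmod +x "$script"
    done
    return 0
}

# Usage:
# ensure_bin_executable "{GATEWAY_HOME}"
```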
+<h3><a id="7+-+Create+the+Master+Secret">7 - Create the Master Secret</a> <a href="#7+-+Create+the+Master+Secret"><img src="markbook-section-link.png"/></a></h3>
+<p>Run the <code>knoxcli.sh create-master</code> command in order to persist the master secret that is used to protect the key and credential stores for the gateway instance.</p>
+<pre><code>cd {GATEWAY_HOME}
+bin/knoxcli.sh create-master
+</code></pre>
+<p>The CLI will prompt you for the master secret (i.e. password).</p>
+<h3><a id="7+-+Start+Knox">7 - Start Knox</a> <a href="#7+-+Start+Knox"><img src="markbook-section-link.png"/></a></h3>
+<p>The gateway can be started using the provided shell script.</p>
+<p>The server will discover the persisted master secret during start up and complete the setup process for demo installs. A demo install will consist of a Knox gateway instance with an identity certificate for localhost. This will require clients to be on the same machine or to turn off hostname verification. For more involved deployments, see the Knox CLI section of this document for additional configuration options, including the ability to create a self-signed certificate for a specific hostname.</p>
+<pre><code>cd {GATEWAY_HOME}
+bin/gateway.sh start
+</code></pre>
+<p>When starting the gateway this way the process will be run in the background. The log files will be written to <code>{GATEWAY_HOME}/logs</code> and the process ID files (PIDs) will be written to <code>{GATEWAY_HOME}/pids</code>.</p>
+<p>In order to stop a gateway that was started with the script use this command:</p>
+<pre><code>cd {GATEWAY_HOME}
+bin/gateway.sh stop
+</code></pre>
+<p>If for some reason the gateway is stopped other than by using the command above you may need to clear the tracking PID:</p>
+<pre><code>cd {GATEWAY_HOME}
+bin/gateway.sh clean
+</code></pre>
+<p><strong>NOTE: This command will also clear any <code>.out</code> and <code>.err</code> file from the <code>{GATEWAY_HOME}/logs</code> directory so use this with caution.</strong></p>
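+<p>Before running <code>clean</code>, it can help to confirm that the tracked PID really is stale. The sketch below checks whether the process recorded in a PID file is still running. The function name <code>pid_file_alive</code> is hypothetical, and the PID file name used in the usage comment (<code>gateway.pid</code>) is an assumption; check the contents of <code>{GATEWAY_HOME}/pids</code> for the actual name.</p>

```shell
# Report whether the process recorded in a PID file is still running.
# Returns 0 when alive; 1 when the file is missing, empty, malformed, or stale.
# Note: `kill -0` sends no signal, it only tests whether the caller could
# signal the process, so a process owned by another user is reported as not alive.
pid_file_alive() {
    # $1: path to a PID file
    [ -s "$1" ] || return 1
    pid=$(head -n 1 "$1" | tr -d '[:space:]')
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;   # not a plain number
    esac
    kill -0 "$pid" 2>/dev/null
}

# Usage (the file name gateway.pid is an assumption):
# pid_file_alive "{GATEWAY_HOME}/pids/gateway.pid" || echo "PID file is stale or absent"
```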
+<h3><a id="8+-+Access+Hadoop+with+Knox">8 - Access Hadoop with Knox</a> <a href="#8+-+Access+Hadoop+with+Knox"><img src="markbook-section-link.png"/></a></h3>
+<h4><a id="Invoke+the+LISTSTATUS+operation+on+WebHDFS+via+the+gateway.">Invoke the LISTSTATUS operation on WebHDFS via the gateway.</a> <a href="#Invoke+the+LISTSTATUS+operation+on+WebHDFS+via+the+gateway."><img src="markbook-section-link.png"/></a></h4>
+<p>This will return a directory listing of the root (i.e. <code>/</code>) directory of HDFS.</p>
+<pre><code>curl -i -k -u guest:guest-password -X GET \
+    &#39;https://localhost:8443/gateway/sandbox/webhdfs/v1/?op=LISTSTATUS&#39;
+</code></pre>
+<p>The above command should produce output along the lines of that shown below. The exact information returned depends on the content within HDFS in your Hadoop cluster. Successfully executing this command at a minimum proves that the gateway is properly configured to provide access to WebHDFS. It does not necessarily mean that any of the other services are correctly configured to be accessible. To validate that, see the sections for the individual services in <a href="#Service+Details">Service Details</a>.</p>
+<pre><code>HTTP/1.1 200 OK
+Content-Type: application/json
+Content-Length: 760
+Server: Jetty(6.1.26)
+
+{&quot;FileStatuses&quot;:{&quot;FileStatus&quot;:[
+{&quot;accessTime&quot;:0,&quot;blockSize&quot;:0,&quot;group&quot;:&quot;hdfs&quot;,&quot;length&quot;:0,&quot;modificationTime&quot;:1350595859762,&quot;owner&quot;:&quot;hdfs&quot;,&quot;pathSuffix&quot;:&quot;apps&quot;,&quot;permission&quot;:&quot;755&quot;,&quot;replication&quot;:0,&quot;type&quot;:&quot;DIRECTORY&quot;},
+{&quot;accessTime&quot;:0,&quot;blockSize&quot;:0,&quot;group&quot;:&quot;mapred&quot;,&quot;length&quot;:0,&quot;modificationTime&quot;:1350595874024,&quot;owner&quot;:&quot;mapred&quot;,&quot;pathSuffix&quot;:&quot;mapred&quot;,&quot;permission&quot;:&quot;755&quot;,&quot;replication&quot;:0,&quot;type&quot;:&quot;DIRECTORY&quot;},
+{&quot;accessTime&quot;:0,&quot;blockSize&quot;:0,&quot;group&quot;:&quot;hdfs&quot;,&quot;length&quot;:0,&quot;modificationTime&quot;:1350596040075,&quot;owner&quot;:&quot;hdfs&quot;,&quot;pathSuffix&quot;:&quot;tmp&quot;,&quot;permission&quot;:&quot;777&quot;,&quot;replication&quot;:0,&quot;type&quot;:&quot;DIRECTORY&quot;},
+{&quot;accessTime&quot;:0,&quot;blockSize&quot;:0,&quot;group&quot;:&quot;hdfs&quot;,&quot;length&quot;:0,&quot;modificationTime&quot;:1350595857178,&quot;owner&quot;:&quot;hdfs&quot;,&quot;pathSuffix&quot;:&quot;user&quot;,&quot;permission&quot;:&quot;755&quot;,&quot;replication&quot;:0,&quot;type&quot;:&quot;DIRECTORY&quot;}
+]}}
+</code></pre>
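+<p>To pull just the entry names out of a response like the one above, a small text filter is often enough for a quick check. The sketch below is not a JSON parser: it assumes compact output with no whitespace around the colon and no escaped quotes inside names. The function name <code>list_path_suffixes</code> is hypothetical.</p>

```shell
# Print one pathSuffix value per line from a LISTSTATUS response on stdin.
# Assumption: compact JSON ("pathSuffix":"name") without escaped quotes.
list_path_suffixes() {
    grep -o '"pathSuffix":"[^"]*"' | sed 's/^"pathSuffix":"//; s/"$//'
}

# Usage:
# curl -s -k -u guest:guest-password \
#     'https://localhost:8443/gateway/sandbox/webhdfs/v1/?op=LISTSTATUS' | list_path_suffixes
```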
+<h4><a id="Put+a+file+in+HDFS+via+Knox.">Put a file in HDFS via Knox.</a> <a href="#Put+a+file+in+HDFS+via+Knox."><img src="markbook-section-link.png"/></a></h4>
+<pre><code>curl -i -k -u guest:guest-password -X PUT \
+    &#39;https://localhost:8443/gateway/sandbox/webhdfs/v1/tmp/LICENSE?op=CREATE&#39;
+
+curl -i -k -u guest:guest-password -T LICENSE -X PUT \
+    &#39;{Value of Location header from response above}&#39;
+</code></pre>
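+<p>The two-step upload above can be scripted by capturing the <code>Location</code> header from the first response. The helper below is a minimal sketch: the function name <code>extract_location</code> is hypothetical, and it assumes the response carries a <code>Location</code> header whose value contains no spaces.</p>

```shell
# Print the value of the first Location header found in HTTP response
# headers read from stdin. Carriage returns are stripped first because
# HTTP header lines end in CRLF.
extract_location() {
    tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}

# Usage:
# location=$(curl -s -i -k -u guest:guest-password -X PUT \
#     'https://localhost:8443/gateway/sandbox/webhdfs/v1/tmp/LICENSE?op=CREATE' | extract_location)
# curl -i -k -u guest:guest-password -T LICENSE -X PUT "$location"
```

+<p>The same helper applies to the <code>op=OPEN</code> request used to read a file back, since that request is also answered with a <code>Location</code> header.</p>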
+<h4><a id="Get+a+file+in+HDFS+via+Knox.">Get a file in HDFS via Knox.</a> <a href="#Get+a+file+in+HDFS+via+Knox."><img src="markbook-section-link.png"/></a></h4>
+<pre><code>curl -i -k -u guest:guest-password -X GET \
+    &#39;https://localhost:8443/gateway/sandbox/webhdfs/v1/tmp/LICENSE?op=OPEN&#39;
+
+curl -i -k -u guest:guest-password -X GET \
+    &#39;{Value of Location header from command response above}&#39;
+</code></pre>
+<h2><a id="Apache+Knox+Details">Apache Knox Details</a> <a href="#Apache+Knox+Details"><img src="markbook-section-link.png"/></a></h2>
+<p>This section provides everything you need to know to get the Knox gateway up and running against a Hadoop cluster.</p>
+<h4><a id="Hadoop">Hadoop</a> <a href="#Hadoop"><img src="markbook-section-link.png"/></a></h4>
+<p>An existing Hadoop 2.x or 3.x cluster is required for Knox to sit in front of and protect. It is possible to use a Hadoop cluster deployed on EC2, but this will require additional configuration not covered here. It is also possible to protect access to the services of a Hadoop cluster that is secured with Kerberos. This too requires additional configuration that is described in other sections of this guide. See <a href="#Supported+Services">Supported Services</a> for details on what is supported for this release.</p>
+<p>The instructions that follow assume a few things:</p>
+<ol>
+  <li>The gateway is <em>not</em> collocated with the Hadoop clusters themselves.</li>
+  <li>The host names and IP addresses of the cluster services are accessible by the gateway wherever it happens to be running.</li>
+</ol>
+<p>All of the instructions and samples provided here are tailored and tested to work &ldquo;out of the box&rdquo; against a <a href="https://hortonworks.com/products/sandbox/">Hortonworks Sandbox 2.x VM</a>.</p>
+<h4><a id="Apache+Knox+Directory+Layout">Apache Knox Directory Layout</a> <a href="#Apache+Knox+Directory+Layout"><img src="markbook-section-link.png"/></a></h4>
+<p>Knox can be installed by expanding the zip/archive file.</p>
+<p>The table below provides a brief explanation of the important files and directories within <code>{GATEWAY_HOME}</code>.</p>
+<table>
+  <thead>
+    <tr>
+      <th>Directory </th>
+      <th>Purpose </th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>conf/ </td>
+      <td>Contains configuration files that apply to the gateway globally (i.e. not cluster specific). </td>
+    </tr>
+    <tr>
+      <td>data/ </td>
+      <td>Contains security and topology specific artifacts that require read/write access at runtime </td>
+    </tr>
+    <tr>
+      <td>conf/topologies/ </td>
+      <td>Contains topology files that represent Hadoop clusters which the gateway uses to deploy cluster proxies </td>
+    </tr>
+    <tr>
+      <td>data/security/ </td>
+      <td>Contains the persisted master secret and keystore dir </td>
+    </tr>
+    <tr>
+      <td>data/security/keystores/ </td>
+      <td>Contains the gateway identity keystore and credential stores for the gateway and each deployed cluster topology </td>
+    </tr>
+    <tr>
+      <td>data/services </td>
+      <td>Contains service behavior definitions for the services currently supported. </td>
+    </tr>
+    <tr>
+      <td>bin/ </td>
+      <td>Contains the executable shell scripts, batch files and JARs for clients and servers. </td>
+    </tr>
+    <tr>
+      <td>data/deployments/ </td>
+      <td>Contains deployed cluster topologies used to protect access to specific Hadoop clusters. </td>
+    </tr>
+    <tr>
+      <td>lib/ </td>
+      <td>Contains the JARs for all the components that make up the gateway. </td>
+    </tr>
+    <tr>
+      <td>dep/ </td>
+      <td>Contains the JARs for all of the components upon which the gateway depends. </td>
+    </tr>
+    <tr>
+      <td>ext/ </td>
+      <td>A directory where user supplied extension JARs can be placed to extend the gateway&rsquo;s functionality. </td>
+    </tr>
+    <tr>
+      <td>pids/ </td>
+      <td>Contains the process ids for running LDAP and gateway servers </td>
+    </tr>
+    <tr>
+      <td>samples/ </td>
+      <td>Contains a number of samples that can be used to explore the functionality of the gateway. </td>
+    </tr>
+    <tr>
+      <td>templates/ </td>
+      <td>Contains default configuration files that can be copied and customized. </td>
+    </tr>
+    <tr>
+      <td>README </td>
+      <td>Provides basic information about the Apache Knox Gateway. </td>
+    </tr>
+    <tr>
+      <td>ISSUES </td>
+      <td>Describes significant known issues. </td>
+    </tr>
+    <tr>
+      <td>CHANGES </td>
+      <td>Enumerates the changes between releases. </td>
+    </tr>
+    <tr>
+      <td>LICENSE </td>
+      <td>Documents the license under which this software is provided. </td>
+    </tr>
+    <tr>
+      <td>NOTICE </td>
+      <td>Documents required attribution notices for included dependencies. </td>
+    </tr>
+  </tbody>
+</table>
+<h3><a id="Supported+Services">Supported Services</a> <a href="#Supported+Services"><img src="markbook-section-link.png"/></a></h3>
+<p>This table enumerates the versions of various Hadoop services that have been tested to work with the Knox Gateway.</p>
+<table>
+  <thead>
+    <tr>
+      <th>Service </th>
+      <th>Version </th>
+      <th>Non-Secure </th>
+      <th>Secure </th>
+      <th>HA </th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>WebHDFS </td>
+      <td>2.4.0 </td>
+      <td><img src="check.png" alt="y" title="Yes" /> </td>
+      <td><img src="check.png" alt="y" title="Yes" /> </td>
+      <td><img src="check.png" alt="y" title="Yes" /></td>
+    </tr>
+    <tr>
+      <td>WebHCat/Templeton </td>
+      <td>0.13.0 </td>
+      <td><img src="check.png" alt="y" title="Yes" /> </td>
+      <td><img src="check.png" alt="y" title="Yes" /> </td>
+      <td><img src="check.png" alt="y" title="Yes" /></td>
+    </tr>
+    <tr>
+      <td>Oozie </td>
+      <td>4.0.0 </td>
+      <td><img src="check.png" alt="y" title="Yes" /> </td>
+      <td><img src="check.png" alt="y" title="Yes" /> </td>
+      <td><img src="check.png" alt="y" title="Yes" /></td>
+    </tr>
+    <tr>
+      <td>HBase </td>
+      <td>0.98.0 </td>
+      <td><img src="check.png" alt="y" title="Yes" /> </td>
+      <td><img src="check.png" alt="y" title="Yes" /> </td>
+      <td><img src="check.png" alt="y" title="Yes" /></td>
+    </tr>
+    <tr>
+      <td>Hive (via WebHCat) </td>
+      <td>0.13.0 </td>
+      <td><img src="check.png" alt="y" title="Yes" /> </td>
+      <td><img src="check.png" alt="y" title="Yes" /> </td>
+      <td><img src="check.png" alt="y" title="Yes" /></td>
+    </tr>
+    <tr>
+      <td>Hive (via JDBC/ODBC) </td>
+      <td>0.13.0 </td>
+      <td><img src="check.png" alt="y" title="Yes" /> </td>
+      <td><img src="check.png" alt="y" title="Yes" /> </td>
+      <td><img src="check.png" alt="y" title="Yes" /></td>
+    </tr>
+    <tr>
+      <td>Yarn ResourceManager </td>
+      <td>2.5.0 </td>
+      <td><img src="check.png" alt="y" title="Yes" /> </td>
+      <td><img src="check.png" alt="y" title="Yes" /> </td>
+      <td><img src="error.png" alt="n" title="No" /></td>
+    </tr>
+    <tr>
+      <td>Kafka (via REST Proxy) </td>
+      <td>0.10.0 </td>
+      <td><img src="check.png" alt="y" title="Yes" /> </td>
+      <td><img src="check.png" alt="y" title="Yes" /> </td>
+      <td><img src="check.png" alt="y" title="Yes" /></td>
+    </tr>
+    <tr>
+      <td>Storm </td>
+      <td>0.9.3 </td>
+      <td><img src="check.png" alt="y" title="Yes" /> </td>
+      <td><img src="error.png" alt="n" title="No" /> </td>
+      <td><img src="error.png" alt="n" title="No" /></td>
+    </tr>
+    <tr>
+      <td>Solr </td>
+      <td>5.5+ and 6+ </td>
+      <td><img src="check.png" alt="y" title="Yes" /> </td>
+      <td><img src="check.png" alt="y" title="Yes" /> </td>
+      <td><img src="check.png" alt="y" title="Yes" /></td>
+    </tr>
+  </tbody>
+</table>
+<h3><a id="More+Examples">More Examples</a> <a href="#More+Examples"><img src="markbook-section-link.png"/></a></h3>
+<p>These examples provide more detail about how to access various Apache Hadoop services via the Apache Knox Gateway.</p>
+<ul>
+  <li><a href="#WebHDFS+Examples">WebHDFS Examples</a></li>
+  <li><a href="#WebHCat+Examples">WebHCat Examples</a></li>
+  <li><a href="#Oozie+Examples">Oozie Examples</a></li>
+  <li><a href="#HBase+Examples">HBase Examples</a></li>
+  <li><a href="#Hive+Examples">Hive Examples</a></li>
+  <li><a href="#Yarn+Examples">Yarn Examples</a></li>
+  <li><a href="#Storm+Examples">Storm Examples</a></li>
+</ul>
+<h3><a id="Gateway+Samples">Gateway Samples</a> <a href="#Gateway+Samples"><img src="markbook-section-link.png"/></a></h3>
+<p>The purpose of the samples within the <code>{GATEWAY_HOME}/samples</code> directory is to demonstrate the capabilities of the Apache Knox Gateway to provide access to the numerous APIs that are available from the service components of a Hadoop cluster.</p>
+<p>Depending on exactly how your Knox installation was done, there will be some number of steps required in order to fully install and configure the samples for use.</p>
+<p>This section will help describe the assumptions of the samples and the steps to get them to work in a couple of different deployment scenarios.</p>
+<h4><a id="Assumptions+of+the+Samples">Assumptions of the Samples</a> <a href="#Assumptions+of+the+Samples"><img src="markbook-section-link.png"/></a></h4>
+<p>The samples were initially written with the intent of working out of the box for the various Hadoop demo environments that are deployed as a single node cluster inside of a VM. The following assumptions were made from that context and should be understood in order to get the samples to work in other deployment scenarios:</p>
+<ul>
+  <li>That there is a valid Java JDK on the PATH for executing the samples</li>
+  <li>That the Knox Demo LDAP server is running on localhost at port 33389, which is the default port for the ApacheDS LDAP server.</li>
+  <li>That the LDAP directory in use has a set of demo users provisioned with the convention of username and username&ldquo;-password&rdquo; as the password. Most of the samples have some variation of this pattern with &ldquo;guest&rdquo; and &ldquo;guest-password&rdquo;.</li>
+  <li>That the Knox Gateway instance is running on the same machine from which you will be running the samples (therefore &ldquo;localhost&rdquo;) and that the default port of &ldquo;8443&rdquo; is being used.</li>
+  <li>Finally, that there is a properly provisioned sandbox.xml topology in the <code>{GATEWAY_HOME}/conf/topologies</code> directory that is configured to point to the actual host and ports of running service components.</li>
+</ul>
+<h4><a id="Steps+for+Demo+Single+Node+Clusters">Steps for Demo Single Node Clusters</a> <a href="#Steps+for+Demo+Single+Node+Clusters"><img src="markbook-section-link.png"/></a></h4>
+<p>There should be little, if anything, to do in a demo environment that has been provisioned for illustrating the use of Apache Knox.</p>
+<p>However, the following items will be worth ensuring before you start:</p>
+<ol>
+  <li>That the <code>sandbox.xml</code> topology is configured properly for the deployed services</li>
+  <li>That there is an LDAP server running with the guest/guest-password user available in the directory</li>
+</ol>
+<h4><a id="Steps+for+Ambari+deployed+Knox+Gateway">Steps for Ambari deployed Knox Gateway</a> <a href="#Steps+for+Ambari+deployed+Knox+Gateway"><img src="markbook-section-link.png"/></a></h4>
+<p>Apache Knox instances that are under the management of Ambari are generally assumed not to be demo instances. These instances are in place to facilitate development, testing or production Hadoop clusters.</p>
+<p>The Knox samples can however be made to work with Ambari managed Knox instances with a few steps:</p>
+<ol>
+  <li>You need to have SSH access to the environment in order for the localhost assumption within the samples to be valid</li>
+  <li>The Knox Demo LDAP Server is started - you can start it from Ambari</li>
+  <li>The <code>default.xml</code> topology file can be copied to <code>sandbox.xml</code> in order to satisfy the topology name assumption in the samples</li>
+  <li>Be sure to use an actual Java JRE to run the sample with something like:
+    <p><code>/usr/jdk64/jdk1.7.0_67/bin/java -jar bin/shell.jar samples/ExampleWebHdfsLs.groovy</code></p>
+  </li>
+</ol>
+<h4><a id="Steps+for+a+manually+installed+Knox+Gateway">Steps for a manually installed Knox Gateway</a> <a href="#Steps+for+a+manually+installed+Knox+Gateway"><img src="markbook-section-link.png"/></a></h4>
+<p>For manually installed Knox instances, there is really no way for the installer to know how to configure the topology file for you.</p>
+<p>Essentially, these steps are identical to those for the Ambari deployed instance, except that step 3 is replaced with configuring the out-of-the-box <code>sandbox.xml</code> to point at the proper hosts and ports.</p>
+<ol>
+  <li>You need to have SSH access to the environment in order for the localhost assumption within the samples to be valid.</li>
+  <li>The Knox Demo LDAP Server is started - for a manual installation you can start it with <code>{GATEWAY_HOME}/bin/ldap.sh start</code></li>
+  <li>Change the hosts and ports within the <code>{GATEWAY_HOME}/conf/topologies/sandbox.xml</code> to reflect your actual cluster service locations.</li>
+  <li>Be sure to use an actual Java JRE to run the sample with something like:
+    <p><code>/usr/jdk64/jdk1.7.0_67/bin/java -jar bin/shell.jar samples/ExampleWebHdfsLs.groovy</code></p>
+  </li>
+</ol>
+<h2><a id="Gateway+Details">Gateway Details</a> <a href="#Gateway+Details"><img src="markbook-section-link.png"/></a></h2>
+<p>This section describes the details of the Knox Gateway itself, including:</p>
+<ul>
+  <li>How URLs are mapped between a gateway that services multiple Hadoop clusters and the clusters themselves</li>
+  <li>How the gateway is configured through <code>gateway-site.xml</code> and cluster specific topology files</li>
+  <li>How to configure the various policy enforcement provider features such as authentication, authorization, auditing, hostmapping, etc.</li>
+</ul>
+<h3><a id="URL+Mapping">URL Mapping</a> <a href="#URL+Mapping"><img src="markbook-section-link.png"/></a></h3>
+<p>The gateway functions much like a reverse proxy. As such, it maintains a mapping of URLs that are exposed externally by the gateway to URLs that are provided by the Hadoop cluster.</p>
+<h4><a id="Default+Topology+URLs">Default Topology URLs</a> <a href="#Default+Topology+URLs"><img src="markbook-section-link.png"/></a></h4>
+<p>In order to provide compatibility with the Hadoop Java client and existing CLI tools, the Knox Gateway has provided a feature called the <em>Default Topology</em>. This refers to a topology deployment that will be able to route URLs without the additional context that the gateway uses for differentiating from one Hadoop cluster to another. This allows the URLs to match those used by existing clients that may access WebHDFS through the Hadoop file system abstraction.</p>
+<p>When a topology file is deployed with a file name that matches the configured default topology name, a specialized mapping for URLs is installed for that particular topology. This allows the URLs that are expected by the existing Hadoop CLIs for WebHDFS to be used in interacting with the specific Hadoop cluster that is represented by the default topology file.</p>
+<p>The configuration for the default topology name is found in <code>gateway-site.xml</code> as a property called: <code>default.app.topology.name</code>.</p>
+<p>The default value for this property is empty.</p>
+<p>When deploying the <code>sandbox.xml</code> topology and setting <code>default.app.topology.name</code> to <code>sandbox</code>, both of the following example URLs work for the same underlying Hadoop cluster:</p>
+<pre><code>https://{gateway-host}:{gateway-port}/webhdfs
+https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/webhdfs
+</code></pre>
+<p>These default topology URLs exist for all of the services in the topology.</p>
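+<p>As a minimal sketch, assuming the <code>sandbox.xml</code> topology described above is deployed, the corresponding <code>gateway-site.xml</code> entry might look like the following. The <code>description</code> text is illustrative only.</p>
+<pre><code>&lt;property&gt;
+    &lt;name&gt;default.app.topology.name&lt;/name&gt;
+    &lt;value&gt;sandbox&lt;/value&gt;
+    &lt;description&gt;Name of the topology whose services are also reachable without the gateway path and cluster name.&lt;/description&gt;
+&lt;/property&gt;
+</code></pre>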
+<h4><a id="Fully+Qualified+URLs">Fully Qualified URLs</a> <a href="#Fully+Qualified+URLs"><img src="markbook-section-link.png"/></a></h4>
+<p>Examples of mappings for WebHDFS, WebHCat, Oozie, HBase and Hive JDBC are shown below. These mappings are generated from the combination of the gateway configuration file (i.e. <code>{GATEWAY_HOME}/conf/gateway-site.xml</code>) and the cluster topology descriptors (e.g. <code>{GATEWAY_HOME}/conf/topologies/{cluster-name}.xml</code>). The port numbers shown for the Cluster URLs represent the default ports for these services. The actual port number may be different for a given cluster.</p>
+<ul>
+  <li>WebHDFS
+    <ul>
+      <li>Gateway: <code>https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/webhdfs</code></li>
+      <li>Cluster: <code>http://{webhdfs-host}:50070/webhdfs</code></li>
+    </ul>
+  </li>
+  <li>WebHCat (Templeton)
+    <ul>
+      <li>Gateway: <code>https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/templeton</code></li>
+      <li>Cluster: <code>http://{webhcat-host}:50111/templeton</code></li>
+    </ul>
+  </li>
+  <li>Oozie
+    <ul>
+      <li>Gateway: <code>https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/oozie</code></li>
+      <li>Cluster: <code>http://{oozie-host}:11000/oozie</code></li>
+    </ul>
+  </li>
+  <li>HBase
+    <ul>
+      <li>Gateway: <code>https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/hbase</code></li>
+      <li>Cluster: <code>http://{hbase-host}:8080</code></li>
+    </ul>
+  </li>
+  <li>Hive JDBC
+    <ul>
+      <li>Gateway: <code>jdbc:hive2://{gateway-host}:{gateway-port}/;ssl=true;sslTrustStore={gateway-trust-store-path};trustStorePassword={gateway-trust-store-password};transportMode=http;httpPath={gateway-path}/{cluster-name}/hive</code></li>
+      <li>Cluster: <code>http://{hive-host}:10001/cliservice</code></li>
+    </ul>
+  </li>
+</ul>
+<p>The values for <code>{gateway-host}</code>, <code>{gateway-port}</code>, <code>{gateway-path}</code> are provided via the gateway configuration file (i.e. <code>{GATEWAY_HOME}/conf/gateway-site.xml</code>).</p>
+<p>The value for <code>{cluster-name}</code> is derived from the file name of the cluster topology descriptor (e.g. <code>{GATEWAY_HOME}/conf/topologies/{cluster-name}.xml</code>).</p>
+<p>The values for <code>{webhdfs-host}</code>, <code>{webhcat-host}</code>, <code>{oozie-host}</code>, <code>{hbase-host}</code> and <code>{hive-host}</code> are provided via the cluster topology descriptor (e.g. <code>{GATEWAY_HOME}/conf/topologies/{cluster-name}.xml</code>).</p>
+<p>Note: The ports 50070 (9870 for Hadoop 3.x), 50111, 11000, 8080 and 10001 are the defaults for WebHDFS, WebHCat, Oozie, HBase and Hive respectively. Their values can also be provided via the cluster topology descriptor if your Hadoop cluster uses different ports.</p>
+<p>Note: The HBase REST API uses port 8080 by default. This often clashes with other running services. In the Hortonworks Sandbox, Apache Ambari might be running on this port, so you might have to change it to a different port (e.g. 60080).</p>
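+<p>As an illustration of overriding the default ports, the service entries in a topology descriptor might look like the sketch below. The host names are hypothetical placeholders, and the ports (9870 for WebHDFS on Hadoop 3.x, 60080 for the HBase REST API) are taken from the notes above rather than from any particular cluster.</p>
+<pre><code>&lt;service&gt;
+    &lt;role&gt;WEBHDFS&lt;/role&gt;
+    &lt;url&gt;http://namenode.example.com:9870/webhdfs&lt;/url&gt;
+&lt;/service&gt;
+&lt;service&gt;
+    &lt;role&gt;WEBHBASE&lt;/role&gt;
+    &lt;url&gt;http://hbase.example.com:60080&lt;/url&gt;
+&lt;/service&gt;
+</code></pre>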
+<h4><a id="Topology+Port+Mapping">Topology Port Mapping</a> <a href="#Topology+Port+Mapping"><img src="markbook-section-link.png"/></a></h4>
+<p>This feature allows a topology to be mapped to a port, so that a specific topology listens on a configured port. It routes URLs to these port-mapped topologies without the additional context that the gateway uses for differentiating from one Hadoop cluster to another, just like the <a href="#Default+Topology+URLs">Default Topology URLs</a> feature, but on a dedicated port.</p>
+<p>The configuration for Topology Port Mapping goes in the <code>gateway-site.xml</code> file and uses the property name and value model. The format for the property name is <code>gateway.port.mapping.{topologyName}</code> and the value is the port number on which this topology will listen.</p>
+<p>In the following example, the topology <code>development</code> will listen on 9443 (if the port is not already taken).</p>
+<pre><code>  &lt;property&gt;
+      &lt;name&gt;gateway.port.mapping.development&lt;/name&gt;
+      &lt;value&gt;9443&lt;/value&gt;
+      &lt;description&gt;Topology and Port mapping&lt;/description&gt;
+  &lt;/property&gt;
+</code></pre>
+<p>With the above configuration, WebHDFS can be accessed using any of the following URLs:</p>
+<pre><code> https://{gateway-host}:9443/webhdfs
+ https://{gateway-host}:9443/{gateway-path}/development/webhdfs
+ https://{gateway-host}:{gateway-port}/{gateway-path}/development/webhdfs
+</code></pre>
+<p>All of the above URLs are valid for the configuration described above.</p>
+<p>This feature is turned on by default. To turn it off, use the property <code>gateway.port.mapping.enabled</code>, e.g.</p>
+<pre><code> &lt;property&gt;
+     &lt;name&gt;gateway.port.mapping.enabled&lt;/name&gt;
+     &lt;value&gt;false&lt;/value&gt;
+     &lt;description&gt;Enable/Disable port mapping feature.&lt;/description&gt;
+ &lt;/property&gt;
+</code></pre>
+<p>If a topology mapped port is in use by another topology or process then an ERROR message is logged and gateway startup continues as normal.</p>
+<h3><a id="Configuration">Configuration</a> <a href="#Configuration"><img src="markbook-section-link.png"/></a></h3>
+<p>Configuration for Apache Knox includes:</p>
+<ol>
+  <li><a href="#Related+Cluster+Configuration">Related Cluster Configuration</a> that must be done within the Hadoop cluster to allow Knox to communicate with various services</li>
+  <li><a href="#Gateway+Server+Configuration">Gateway Server Configuration</a>, which covers the configurable elements of the server itself and applies to behavior that spans all topologies or managed Hadoop clusters</li>
+  <li><a href="#Topology+Descriptors">Topology Descriptors</a> which are the descriptors for controlling access to Hadoop clusters in various ways</li>
+</ol>
+<h3><a id="Related+Cluster+Configuration">Related Cluster Configuration</a> <a href="#Related+Cluster+Configuration"><img src="markbook-section-link.png"/></a></h3>
+<p>The following configuration changes must be made to your cluster to allow Apache Knox to dispatch requests to the various service components on behalf of end users.</p>
+<h4><a id="Grant+Proxy+privileges+for+Knox+user+in+`core-site.xml`+on+Hadoop+master+nodes">Grant Proxy privileges for Knox user in <code>core-site.xml</code> on Hadoop master nodes</a> <a href="#Grant+Proxy+privileges+for+Knox+user+in+`core-site.xml`+on+Hadoop+master+nodes"><img src="markbook-section-link.png"/></a></h4>
+<p>Update <code>core-site.xml</code> and add the following lines towards the end of the file.</p>
+<p>Replace <code>FQDN_OF_KNOX_HOST</code> with the fully qualified domain name of the host running the Knox gateway. You can usually find this by running <code>hostname -f</code> on that host.</p>
+<p>You can use <code>*</code> for local developer testing if the Knox host does not have a static IP.</p>
+<pre><code>&lt;property&gt;
+    &lt;name&gt;hadoop.proxyuser.knox.groups&lt;/name&gt;
+    &lt;value&gt;users&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+    &lt;name&gt;hadoop.proxyuser.knox.hosts&lt;/name&gt;
+    &lt;value&gt;FQDN_OF_KNOX_HOST&lt;/value&gt;
+&lt;/property&gt;
+</code></pre>
+<h4><a id="Grant+proxy+privilege+for+Knox+in+`webhcat-site.xml`+on+Hadoop+master+nodes">Grant proxy privilege for Knox in <code>webhcat-site.xml</code> on Hadoop master nodes</a> <a href="#Grant+proxy+privilege+for+Knox+in+`webhcat-site.xml`+on+Hadoop+master+nodes"><img src="markbook-section-link.png"/></a></h4>
+<p>Update <code>webhcat-site.xml</code> and add the following lines towards the end of the file.</p>
+<p>Replace <code>FQDN_OF_KNOX_HOST</code> with the fully qualified domain name of the host running the Knox gateway. You can use <code>*</code> for local developer testing if the Knox host does not have a static IP.</p>
+<pre><code>&lt;property&gt;
+    &lt;name&gt;webhcat.proxyuser.knox.groups&lt;/name&gt;
+    &lt;value&gt;users&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+    &lt;name&gt;webhcat.proxyuser.knox.hosts&lt;/name&gt;
+    &lt;value&gt;FQDN_OF_KNOX_HOST&lt;/value&gt;
+&lt;/property&gt;
+</code></pre>
+<h4><a id="Grant+proxy+privilege+for+Knox+in+`oozie-site.xml`+on+Oozie+host">Grant proxy privilege for Knox in <code>oozie-site.xml</code> on Oozie host</a> <a href="#Grant+proxy+privilege+for+Knox+in+`oozie-site.xml`+on+Oozie+host"><img src="markbook-section-link.png"/></a></h4>
+<p>Update <code>oozie-site.xml</code> and add the following lines towards the end of the file.</p>
+<p>Replace <code>FQDN_OF_KNOX_HOST</code> with the fully qualified domain name of the host running the Knox gateway. You can use <code>*</code> for local developer testing if the Knox host does not have a static IP.</p>
+<pre><code>&lt;property&gt;
+    &lt;name&gt;oozie.service.ProxyUserService.proxyuser.knox.groups&lt;/name&gt;
+    &lt;value&gt;users&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+    &lt;name&gt;oozie.service.ProxyUserService.proxyuser.knox.hosts&lt;/name&gt;
+    &lt;value&gt;FQDN_OF_KNOX_HOST&lt;/value&gt;
+&lt;/property&gt;
+</code></pre>
+<h4><a id="Enable+http+transport+mode+and+use+substitution+in+HiveServer2">Enable http transport mode and use substitution in HiveServer2</a> <a href="#Enable+http+transport+mode+and+use+substitution+in+HiveServer2"><img src="markbook-section-link.png"/></a></h4>
+<p>Update <code>hive-site.xml</code> and set the following properties on HiveServer2 hosts. Some of the properties may already be in the hive-site.xml. Ensure that the values match the ones below.</p>
+<pre><code>&lt;property&gt;
+    &lt;name&gt;hive.server2.allow.user.substitution&lt;/name&gt;
+    &lt;value&gt;true&lt;/value&gt;
+&lt;/property&gt;
+
+&lt;property&gt;
+    &lt;name&gt;hive.server2.transport.mode&lt;/name&gt;
+    &lt;value&gt;http&lt;/value&gt;
+    &lt;description&gt;Server transport mode. &quot;binary&quot; or &quot;http&quot;.&lt;/description&gt;
+&lt;/property&gt;
+
+&lt;property&gt;
+    &lt;name&gt;hive.server2.thrift.http.port&lt;/name&gt;
+    &lt;value&gt;10001&lt;/value&gt;
+    &lt;description&gt;Port number when in HTTP mode.&lt;/description&gt;
+&lt;/property&gt;
+
+&lt;property&gt;
+    &lt;name&gt;hive.server2.thrift.http.path&lt;/name&gt;
+    &lt;value&gt;cliservice&lt;/value&gt;
+    &lt;description&gt;Path component of URL endpoint when in HTTP mode.&lt;/description&gt;
+&lt;/property&gt;
+</code></pre>
+<h4><a id="Gateway+Server+Configuration">Gateway Server Configuration</a> <a href="#Gateway+Server+Configuration"><img src="markbook-section-link.png"/></a></h4>
+<p>The following table illustrates the configurable elements of the Apache Knox Gateway at the server level via gateway-site.xml.</p>
+<table>
+  <thead>
+    <tr>
+      <th>Property </th>
+      <th>Description </th>
+      <th>Default</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td><code>gateway.deployment.dir</code></td>
+      <td>The directory within <code>GATEWAY_HOME</code> that contains gateway topology deployments</td>
+      <td><code>{GATEWAY_HOME}/data/deployments</code></td>
+    </tr>
+    <tr>
+      <td><code>gateway.security.dir</code></td>
+      <td>The directory within <code>GATEWAY_HOME</code> that contains the required security artifacts</td>
+      <td><code>{GATEWAY_HOME}/data/security</code></td>
+    </tr>
+    <tr>
+      <td><code>gateway.data.dir</code></td>
+      <td>The directory within <code>GATEWAY_HOME</code> that contains the gateway instance data</td>
+      <td><code>{GATEWAY_HOME}/data</code></td>
+    </tr>
+    <tr>
+      <td><code>gateway.services.dir</code></td>
+      <td>The directory within <code>GATEWAY_HOME</code> that contains the gateway services definitions</td>
+      <td><code>{GATEWAY_HOME}/services</code></td>
+    </tr>
+    <tr>
+      <td><code>gateway.hadoop.conf.dir</code></td>
+      <td>The directory within <code>GATEWAY_HOME</code> that contains the gateway configuration</td>
+      <td><code>{GATEWAY_HOME}/conf</code></td>
+    </tr>
+    <tr>
+      <td><code>gateway.frontend.url</code></td>
+      <td>The URL that should be used during rewriting so that it can rewrite the URLs with the correct &ldquo;frontend&rdquo; URL</td>
+      <td>none</td>
+    </tr>
+    <tr>
+      <td><code>gateway.xforwarded.enabled</code></td>
+      <td>Indicates whether support for some X-Forwarded-* headers is enabled</td>
+      <td><code>true</code></td>
+    </tr>
+    <tr>
+      <td><code>gateway.trust.all.certs</code></td>
+      <td>Indicates whether all presented client certs should establish trust</td>
+      <td><code>false</code></td>
+    </tr>
+    <tr>
+      <td><code>gateway.client.auth.needed</code></td>
+      <td>Indicates whether clients are required to establish a trust relationship with client certificates</td>
+      <td><code>false</code></td>
+    </tr>
+    <tr>
+      <td><code>gateway.truststore.path</code></td>
+      <td>Location of the truststore for client certificates to be trusted</td>
+      <td><code>gateway.jks</code></td>
+    </tr>
+    <tr>
+      <td><code>gateway.truststore.type</code></td>
+      <td>Indicates the type of truststore</td>
+      <td><code>JKS</code></td>
+    </tr>
+    <tr>
+      <td><code>gateway.keystore.type</code></td>
+      <td>Indicates the type of keystore for the identity store</td>
+      <td><code>JKS</code></td>
+    </tr>
+    <tr>
+      <td><code>gateway.jdk.tls.ephemeralDHKeySize</code></td>
+      <td>Customizes the ephemeral DH key size via the JDK&rsquo;s <code>jdk.tls.ephemeralDHKeySize</code> setting. The minimum acceptable DH key size is 1024 bits, except for exportable cipher suites or legacy mode (<code>jdk.tls.ephemeralDHKeySize=legacy</code>)</td>
+      <td><code>2048</code></td>
+    </tr>
+    <tr>
+      <td><code>gateway.threadpool.max</code></td>
+      <td>The maximum number of concurrent requests the server will process. Connections beyond this will be queued.</td>
+      <td><code>254</code></td>
+    </tr>
+    <tr>
+      <td><code>gateway.httpclient.maxConnections</code></td>
+      <td>The maximum number of connections that a single HttpClient will maintain to a single host:port.</td>
+      <td><code>32</code></td>
+    </tr>
+    <tr>
+      <td><code>gateway.httpclient.connectionTimeout</code></td>
+      <td>The amount of time to wait when attempting a connection. The natural unit is milliseconds, but an &lsquo;s&rsquo; or &lsquo;m&rsquo; suffix may be used for seconds or minutes respectively.</td>
+      <td>20s</td>
+    </tr>
+    <tr>
+      <td><code>gateway.httpclient.socketTimeout</code></td>
+      <td>The amount of time to wait for data on a socket before aborting the connection. The natural unit is milliseconds, but an &lsquo;s&rsquo; or &lsquo;m&rsquo; suffix may be used for seconds or minutes respectively.</td>
+      <td>20s</td>
+    </tr>
+    <tr>
+      <td><code>gateway.httpserver.requestBuffer</code></td>
+      <td>The size of the HTTP server request buffer in bytes</td>
+      <td><code>16384</code></td>
+    </tr>
+    <tr>
+      <td><code>gateway.httpserver.requestHeaderBuffer</code></td>
+      <td>The size of the HTTP server request header buffer in bytes</td>
+      <td><code>8192</code></td>
+    </tr>
+    <tr>
+      <td><code>gateway.httpserver.responseBuffer</code></td>
+      <td>The size of the HTTP server response buffer in bytes</td>
+      <td><code>32768</code></td>
+    </tr>
+    <tr>
+      <td><code>gateway.httpserver.responseHeaderBuffer</code></td>
+      <td>The size of the HTTP server response header buffer in bytes</td>
+      <td><code>8192</code></td>
+    </tr>
+    <tr>
+      <td><code>gateway.websocket.feature.enabled</code></td>
+      <td>Enable/Disable WebSocket feature</td>
+      <td><code>false</code></td>
+    </tr>
+    <tr>
+      <td><code>gateway.signing.keystore.name</code></td>
+      <td>OPTIONAL Filename of keystore file that contains the signing keypair. NOTE: An alias needs to be created using <code>knoxcli.sh create-alias</code> for the alias name <code>signing.key.passphrase</code> in order to provide the passphrase to access the keystore.</td>
+      <td>null</td>
+    </tr>
+    <tr>
+      <td><code>gateway.signing.key.alias</code></td>
+      <td>OPTIONAL alias for the signing keypair within the keystore specified via <code>gateway.signing.keystore.name</code></td>
+      <td>null</td>
+    </tr>
+    <tr>
+      <td><code>ssl.enabled</code></td>
+      <td>Indicates whether SSL is enabled for the Gateway</td>
+      <td><code>true</code></td>
+    </tr>
+    <tr>
+      <td><code>ssl.include.ciphers</code></td>
+      <td>A comma or pipe separated list of ciphers to accept for SSL. See the <a href="http://docs.oracle.com/javase/8/docs/technotes/guides/security/SunProviders.html#SunJSSEProvider">JSSE Provider docs</a> for possible ciphers. These can also contain regular expressions as shown in the <a href="http://www.eclipse.org/jetty/documentation/current/configuring-ssl.html">Jetty documentation</a>.</td>
+      <td>all</td>
+    </tr>
+    <tr>
+      <td><code>ssl.exclude.ciphers</code></td>
+      <td>A comma or pipe separated list of ciphers to reject for SSL. See the <a href="http://docs.oracle.com/javase/8/docs/technotes/guides/security/SunProviders.html#SunJSSEProvider">JSSE Provider docs</a> for possible ciphers. These can also contain regular expressions as shown in the <a href="http://www.eclipse.org/jetty/documentation/current/configuring-ssl.html">Jetty documentation</a>.</td>
+      <td>none</td>
+    </tr>
+    <tr>
+      <td><code>ssl.exclude.protocols</code></td>
+      <td>A comma or pipe separated list of protocols to reject for SSL, or &ldquo;none&rdquo;</td>
+      <td><code>SSLv3</code></td>
+    </tr>
+    <tr>
+      <td><code>gateway.remote.config.monitor.client</code></td>
+      <td>A reference to the <a href="#Remote+Configuration+Registry+Clients">remote configuration registry client</a> the remote configuration monitor will employ</td>
+      <td>null</td>
+    </tr>
+    <tr>
+      <td><code>gateway.remote.config.monitor.client.allowUnauthenticatedReadAccess</code> </td>
+      <td>When a remote registry client is configured to access a registry securely, this property can be set to allow unauthenticated clients to continue to read the content from that registry by setting the ACLs accordingly. </td>
+      <td><code>false</code></td>
+    </tr>
+    <tr>
+      <td><code>gateway.remote.config.registry.&lt;name&gt;</code></td>
+      <td>A named <a href="#Remote+Configuration+Registry+Clients">remote configuration registry client</a> definition, where <em>name</em> is an arbitrary identifier for the connection</td>
+      <td>null</td>
+    </tr>
+    <tr>
+      <td><code>gateway.cluster.config.monitor.ambari.enabled</code></td>
+      <td>Indicates whether the cluster monitoring and associated dynamic topology updating is enabled </td>
+      <td><code>false</code></td>
+    </tr>
+    <tr>
+      <td><code>gateway.cluster.config.monitor.ambari.interval</code> </td>
+      <td>The interval (in seconds) at which the cluster monitor will poll Ambari for cluster configuration changes </td>
+      <td><code>60</code></td>
+    </tr>
+    <tr>
+      <td><code>gateway.remote.alias.service.enabled</code> </td>
+      <td>Turns remote alias management on or off. This takes effect only when remote configuration monitoring is enabled.</td>
+      <td><code>true</code></td>
+    </tr>
+    <tr>
+      <td><code>gateway.read.only.override.topologies</code> </td>
+      <td>A comma-delimited list of topology names which should be forcibly treated as read-only. </td>
+      <td>none</td>
+    </tr>
+    <tr>
+      <td><code>gateway.discovery.default.address</code> </td>
+      <td>The default discovery address, which is applied if no address is specified in a descriptor. </td>
+      <td>null</td>
+    </tr>
+    <tr>
+      <td><code>gateway.discovery.default.cluster</code> </td>
+      <td>The default discovery cluster name, which is applied if no cluster name is specified in a descriptor. </td>
+      <td>null</td>
+    </tr>
+    <tr>
+      <td><code>gateway.dispatch.whitelist</code> </td>
+      <td>A semicolon-delimited list of regular expressions for controlling to which endpoints Knox dispatches and redirects will be permitted. If DEFAULT is specified, or the property is omitted entirely, then a default domain-based whitelist will be derived from the Knox host. An empty value means no dispatches will be permitted. </td>
+      <td>null</td>
+    </tr>
+    <tr>
+      <td><code>gateway.dispatch.whitelist.services</code> </td>
+      <td>A comma-delimited list of service roles to which the <em>gateway.dispatch.whitelist</em> will be applied. </td>
+      <td>none</td>
+    </tr>
+    <tr>
+      <td><code>gateway.strict.topology.validation</code> </td>
+      <td>If true, topology XML files will be validated against the topology schema during redeploy </td>
+      <td><code>false</code></td>
+    </tr>
+    <tr>
+      <td><code>gateway.global.rules.services</code> </td>
+      <td>The list of service names that have global rules. The rules of all services not in this list are treated as scoped only to that service.</td>
+      <td><code>&quot;NAMENODE&quot;,&quot;JOBTRACKER&quot;, &quot;WEBHDFS&quot;, &quot;WEBHCAT&quot;, &quot;OOZIE&quot;, &quot;WEBHBASE&quot;, &quot;HIVE&quot;, &quot;RESOURCEMANAGER&quot;</code></td>
+    </tr>
+  </tbody>
+</table>
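+<p>As a rough sketch of how a few of these server-level properties might be overridden in <code>gateway-site.xml</code>, consider the fragment below. The property names come from the table above; the values, including the <code>example.com</code> domain in the whitelist expression, are hypothetical and chosen only to show the time-unit suffixes and the regular-expression format.</p>
+<pre><code>&lt;property&gt;
+    &lt;name&gt;gateway.httpclient.connectionTimeout&lt;/name&gt;
+    &lt;value&gt;30s&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+    &lt;name&gt;gateway.httpclient.socketTimeout&lt;/name&gt;
+    &lt;value&gt;2m&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+    &lt;name&gt;gateway.dispatch.whitelist&lt;/name&gt;
+    &lt;value&gt;^https?://(.+\.example\.com):[0-9]+/?.*$&lt;/value&gt;
+&lt;/property&gt;
+</code></pre>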
+<h4><a id="Topology+Descriptors">Topology Descriptors</a> <a href="#Topology+Descriptors"><img src="markbook-section-link.png"/></a></h4>
+<p>The topology descriptor files provide the gateway with per-cluster configuration information. This includes configuration for both the providers within the gateway and the services within the Hadoop cluster. These files are located in <code>{GATEWAY_HOME}/conf/topologies</code>. The general outline of a topology descriptor looks like this.</p>
+<pre><code>&lt;topology&gt;
+    &lt;gateway&gt;
+        &lt;provider&gt;
+        &lt;/provider&gt;
+    &lt;/gateway&gt;
+    &lt;service&gt;
+    &lt;/service&gt;
+&lt;/topology&gt;
+</code></pre>
+<p>There are typically multiple <code>&lt;provider&gt;</code> and <code>&lt;service&gt;</code> elements.</p>
+<dl>
+  <dt>/topology</dt>
+  <dd>Defines the provider configuration and service topology for a single Hadoop cluster.</dd>
+  <dt>/topology/gateway</dt>
+  <dd>Groups all of the provider elements</dd>
+  <dt>/topology/gateway/provider</dt>
+  <dd>Defines the configuration of a specific provider for the cluster.</dd>
+  <dt>/topology/service</dt>
+  <dd>Defines the location of a specific Hadoop service within the Hadoop cluster.</dd>
+</dl>
+<h5><a id="Provider+Configuration">Provider Configuration</a> <a href="#Provider+Configuration"><img src="markbook-section-link.png"/></a></h5>
+<p>Provider configuration is used to customize the behavior of a particular gateway feature. The general outline of a provider element looks like this.</p>
+<pre><code>&lt;provider&gt;
+    &lt;role&gt;authentication&lt;/role&gt;
+    &lt;name&gt;ShiroProvider&lt;/name&gt;
+    &lt;enabled&gt;true&lt;/enabled&gt;
+    &lt;param&gt;
+        &lt;name&gt;&lt;/name&gt;
+        &lt;value&gt;&lt;/value&gt;
+    &lt;/param&gt;
+&lt;/provider&gt;
+</code></pre>
+<dl>
+  <dt>/topology/gateway/provider</dt>
+  <dd>Groups information for a specific provider.</dd>
+  <dt>/topology/gateway/provider/role</dt>
+  <dd>Defines the role of a particular provider. There are a number of pre-defined roles used by out-of-the-box provider plugins for the gateway. These roles are: authentication, identity-assertion, rewrite and hostmap.</dd>
+  <dt>/topology/gateway/provider/name</dt>
+  <dd>Defines the name of the provider for which this configuration applies. There can be multiple provider implementations for a given role. Specifying the name is used to identify which particular provider is being configured. Typically each topology descriptor should contain only one provider for each role but there are exceptions.</dd>
+  <dt>/topology/gateway/provider/enabled</dt>
+  <dd>Allows a particular provider to be enabled or disabled via <code>true</code> or <code>false</code> respectively. When a provider is disabled any filters associated with that provider are excluded from the processing chain.</dd>
+  <dt>/topology/gateway/provider/param</dt>
+  <dd>These elements are used to supply provider configuration. There can be zero or more of these per provider.</dd>
+  <dt>/topology/gateway/provider/param/name</dt>
+  <dd>The name of a parameter to pass to the provider.</dd>
+  <dt>/topology/gateway/provider/param/value</dt>
+  <dd>The value of a parameter to pass to the provider.</dd>
+</dl>
+<h5><a id="Service+Configuration">Service Configuration</a> <a href="#Service+Configuration"><img src="markbook-section-link.png"/></a></h5>
+<p>Service configuration is used to specify the location of services within the Hadoop cluster. The general outline of a service element looks like this.</p>
+<pre><code>&lt;service&gt;
+    &lt;role&gt;WEBHDFS&lt;/role&gt;
+    &lt;url&gt;http://localhost:50070/webhdfs&lt;/url&gt;
+&lt;/service&gt;
+</code></pre>
+<dl>
+  <dt>/topology/service</dt>
+  <dd>Provides information about a particular service within the Hadoop cluster. Not all services are necessarily exposed as gateway endpoints.</dd>
+  <dt>/topology/service/role</dt>
+  <dd>Identifies the role of this service. Currently supported roles are: WEBHDFS, WEBHCAT, WEBHBASE, OOZIE, HIVE, NAMENODE, JOBTRACKER, RESOURCEMANAGER. Additional service roles can be supported via plugins. Note: The role names are case sensitive and must be upper case.</dd>
+  <dt>/topology/service/url</dt>
+  <dd>The URL identifying the location of a particular service within the Hadoop cluster.</dd>
+</dl>
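+<p>As a minimal sketch of how the two element types fit together, the fragment below places one provider and one service in a single topology descriptor. It reuses the placeholder values from the outlines above. The <code>param</code> element is omitted because zero parameters are allowed, although a real ShiroProvider would normally need parameters to be usable. Treat this as an illustration of the structure rather than a working deployment.</p>
+<pre><code>&lt;topology&gt;
+    &lt;gateway&gt;
+        &lt;!-- /topology/gateway/provider: typically one provider per role --&gt;
+        &lt;provider&gt;
+            &lt;role&gt;authentication&lt;/role&gt;
+            &lt;name&gt;ShiroProvider&lt;/name&gt;
+            &lt;enabled&gt;true&lt;/enabled&gt;
+        &lt;/provider&gt;
+    &lt;/gateway&gt;
+    &lt;!-- /topology/service: the role name is case sensitive and upper case --&gt;
+    &lt;service&gt;
+        &lt;role&gt;WEBHDFS&lt;/role&gt;
+        &lt;url&gt;http://localhost:50070/webhdfs&lt;/url&gt;
+    &lt;/service&gt;
+&lt;/topology&gt;
+</code></pre>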
+<h4><a id="Hostmap+Provider">Hostmap Provider</a> <a href="#Hostmap+Provider"><img src="markbook-section-link.png"/></a></h4>
+<p>The purpose of the Hostmap provider is to handle situations where hosts are known by one name within the cluster and another name externally. This frequently occurs when virtual machines are used and in particular when using cloud hosting services. Currently, the Hostmap provider is configured as part of the topology file. The basic structure is shown below.</p>
+<pre><code>&lt;topology&gt;
+    &lt;gateway&gt;
+        ...
+        &lt;provider&gt;
+            &lt;role&gt;hostmap&lt;/role&gt;
+            &lt;name&gt;static&lt;/name&gt;
+            &lt;enabled&gt;true&lt;/enabled&gt;
+            &lt;param&gt;&lt;name&gt;external-host-name&lt;/name&gt;&lt;value&gt;internal-host-name&lt;/value&gt;&lt;/param&gt;
+        &lt;/provider&gt;
+        ...
+    &lt;/gateway&gt;
+    ...
+&lt;/topology&gt;
+</code></pre>
+<p>This mapping is required because the Hadoop services running within the cluster are unaware that they are being accessed from outside the cluster. Therefore, URLs returned as part of REST API responses will typically contain internal host names. Since clients outside the cluster will be unable to resolve those host names, they must be mapped to external host names.</p>
+<h5><a id="Hostmap+Provider+Example+-+EC2">Hostmap Provider Example - EC2</a> <a href="#Hostmap+Provider+Example+-+EC2"><img src="markbook-section-link.png"/></a></h5>
+<p>Consider an EC2 example where two VMs have been allocated. Each VM has an external host name by which it can be accessed via the internet. However the EC2 VM is unaware of this external host name and instead is configured with the internal host name.</p>
+<pre><code>External HOSTNAMES:
+ec2-23-22-31-165.compute-1.amazonaws.com
+ec2-23-23-25-10.compute-1.amazonaws.com
+
+Internal HOSTNAMES:
+ip-10-118-99-172.ec2.internal
+ip-10-39-107-209.ec2.internal
+</code></pre>
+<p>The Hostmap configuration required to allow access from outside the Hadoop cluster via the Apache Knox Gateway would be this:</p>
+<pre><code>&lt;topology&gt;
+    &lt;gateway&gt;
+        ...
+        &lt;provider&gt;
+            &lt;role&gt;hostmap&lt;/role&gt;
+            &lt;name&gt;static&lt;/name&gt;
+            &lt;enabled&gt;true&lt;/enabled&gt;
+            &lt;param&gt;
+                &lt;name&gt;ec2-23-22-31-165.compute-1.amazonaws.com&lt;/name&gt;
+                &lt;value&gt;ip-10-118-99-172.ec2.internal&lt;/value&gt;
+            &lt;/param&gt;
+            &lt;param&gt;
+                &lt;name&gt;ec2-23-23-25-10.compute-1.amazonaws.com&lt;/name&gt;
+                &lt;value&gt;ip-10-39-107-209.ec2.internal&lt;/value&gt;
+            &lt;/param&gt;
+        &lt;/provider&gt;
+        ...
+    &lt;/gateway&gt;
+    ...
+&lt;/topology&gt;
+</code></pre>
+<h5><a id="Hostmap+Provider+Example+-+Sandbox">Hostmap Provider Example - Sandbox</a> <a href="#Hostmap+Provider+Example+-+Sandbox"><img src="markbook-section-link.png"/></a></h5>
+<p>The Hortonworks Sandbox 2.x poses a different challenge for host name mapping. This version of the Sandbox uses port mapping to make the Sandbox VM appear as though it is accessible via localhost. However the Sandbox VM is internally configured to consider sandbox.hortonworks.com as the host name. So from the perspective of a client accessing Sandbox the external host name is localhost. The Hostmap configuration required to allow access to Sandbox from the host operating system is this.</p>
+<pre><code>&lt;topology&gt;
+    &lt;gateway&gt;
+        ...
+        &lt;provider&gt;
+            &lt;role&gt;hostmap&lt;/role&gt;
+            &lt;name&gt;static&lt;/name&gt;
+            &lt;enabled&gt;true&lt;/enabled&gt;
+            &lt;param&gt;
+                &lt;name&gt;localhost&lt;/name&gt;
+                &lt;value&gt;sandbox,sandbox.hortonworks.com&lt;/value&gt;
+            &lt;/param&gt;
+        &lt;/provider&gt;
+        ...
+    &lt;/gateway&gt;
+    ...
+&lt;/topology&gt;
+</code></pre>
+<h5><a id="Hostmap+Provider+Configuration">Hostmap Provider Configuration</a> <a href="#Hostmap+Provider+Configuration"><img src="markbook-section-link.png"/></a></h5>
+<p>Details about each provider configuration element are enumerated below.</p>
+<dl>
+  <dt>topology/gateway/provider/role</dt>
+  <dd>The role for a Hostmap provider must always be <code>hostmap</code>.</dd>
+  <dt>topology/gateway/provider/name</dt>
+  <dd>The Hostmap provider supplied out-of-the-box is selected via the name <code>static</code>.</dd>
+  <dt>topology/gateway/provider/enabled</dt>
+  <dd>Host mapping can be enabled or disabled by providing <code>true</code> or <code>false</code>.</dd>
+  <dt>topology/gateway/provider/param</dt>
+  <dd>Host mapping is configured by providing parameters for each external to internal mapping.</dd>
+  <dt>topology/gateway/provider/param/name</dt>
+  <dd>The parameter names represent the external host names associated with the internal host names provided by the value element. This can be a comma separated list of host names that all represent the same physical host. When mapping from internal to external host name the first external host name in the list is used.</dd>
+  <dt>topology/gateway/provider/param/value</dt>
+  <dd>The parameter values represent the internal host names associated with the external host names provided by the name element. This can be a comma separated list of host names that all represent the same physical host. When mapping from external to internal host names the first internal host name in the list is used.</dd>
+</dl>
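+<p>To illustrate the comma separated lists described above, here is a sketch of a single mapping with two names on each side. All four host names are hypothetical and exist only for this example. Following the rules above, an internal name found in a response would be rewritten to <code>gw1.example.com</code> (the first external name), and an external name in a request would be mapped to <code>node1.cluster.internal</code> (the first internal name).</p>
+<pre><code>&lt;provider&gt;
+    &lt;role&gt;hostmap&lt;/role&gt;
+    &lt;name&gt;static&lt;/name&gt;
+    &lt;enabled&gt;true&lt;/enabled&gt;
+    &lt;param&gt;
+        &lt;!-- external names for one physical host (hypothetical) --&gt;
+        &lt;name&gt;gw1.example.com,gw1-alt.example.com&lt;/name&gt;
+        &lt;!-- internal names for the same host (hypothetical) --&gt;
+        &lt;value&gt;node1.cluster.internal,node1&lt;/value&gt;
+    &lt;/param&gt;
+&lt;/provider&gt;
+</code></pre>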
+<h4><a id="Simplified+Topology+Descriptors">Simplified Topology Descriptors</a> <a href="#Simplified+Topology+Descriptors"><img src="markbook-section-link.png"/></a></h4>
+<p>Simplified descriptors are a means to facilitate provider configuration sharing and service endpoint discovery. Rather than editing an XML topology descriptor, it&rsquo;s possible to create a simpler YAML (or JSON) descriptor specifying the desired contents of a topology, which will yield a full topology descriptor and deployment.</p>
+<h5><a id="Externalized+Provider+Configurations">Externalized Provider Configurations</a> <a href="#Externalized+Provider+Configurations"><img src="markbook-section-link.png"/></a></h5>
+<p>Sometimes, the same provider configuration is applied to multiple Knox topologies. With the provider configuration externalized from the simple descriptors, a single configuration can be referenced by multiple topologies. This helps reduce the duplication of configuration, and the need to update multiple configuration files when a policy change is required. Updating a provider configuration will trigger an update to all those topologies that reference it.</p>
+<p>The contents of externalized provider configuration details are identical to the contents of the gateway element from a full topology descriptor. The only difference is that those details are defined in a separate JSON/YAML file in <code>{GATEWAY_HOME}/conf/shared-providers/</code>, which is then referenced by one or more descriptors.</p>
+<p><em>Provider Configuration Example</em></p>
+<pre><code>{
+  &quot;providers&quot;: [
+    {
+      &quot;role&quot;: &quot;authentication&quot;,
+      &quot;name&quot;: &quot;ShiroProvider&quot;,
+      &quot;enabled&quot;: &quot;true&quot;,
+      &quot;params&quot;: {
+        &quot;sessionTimeout&quot;: &quot;30&quot;,
+        &quot;main.ldapRealm&quot;: &quot;org.apache.knox.gateway.shirorealm.KnoxLdapRealm&quot;,
+        &quot;main.ldapContextFactory&quot;: &quot;org.apache.knox.gateway.shirorealm.KnoxLdapContextFactory&quot;,
+        &quot;main.ldapRealm.contextFactory&quot;: &quot;$ldapContextFactory&quot;,
+        &quot;main.ldapRealm.userDnTemplate&quot;: &quot;uid={0},ou=people,dc=hadoop,dc=apache,dc=org&quot;,
+        &quot;main.ldapRealm.contextFactory.url&quot;: &quot;ldap://localhost:33389&quot;,
+        &quot;main.ldapRealm.contextFactory.authenticationMechanism&quot;: &quot;simple&quot;,
+        &quot;urls./**&quot;: &quot;authcBasic&quot;
+      }
+    },
+    {
+      &quot;name&quot;: &quot;static&quot;,
+      &quot;role&quot;: &quot;hostmap&quot;,
+      &quot;enabled&quot;: &quot;true&quot;,
+      &quot;params&quot;: {
+        &quot;localhost&quot;: &quot;sandbox,sandbox.hortonworks.com&quot;
+      }
+    }
+  ]
+}
+</code></pre>
+<h6><a id="Sharing+HA+Providers">Sharing HA Providers</a> <a href="#Sharing+HA+Providers"><img src="markbook-section-link.png"/></a></h6>
+<p>HA Providers are a special concern with respect to sharing provider configuration because they include service-specific (and possibly cluster-specific) configuration.</p>
+<p>This requires extra attention because the service configurations corresponding to the associated HA Provider configuration must contain the correct content to function properly.</p>
+<p>For a shared provider configuration with an HA Provider service:</p>
+<ul>
+  <li>If the referencing descriptor does not declare the corresponding service, then the HA Provider configuration is effectively ignored since the service isn&rsquo;t exposed by the topology.</li>
+  <li>If a corresponding service is declared in the descriptor:
+    <ul>
+      <li>If service endpoint discovery is employed, then Knox should populate the URLs correctly to support the HA behavior.</li>
+      <li>Otherwise, the URLs must be explicitly specified for that service in the descriptor.</li>
+    </ul>
+  </li>
+  <li>If the descriptor content is correct, but the cluster service is not configured for HA, then the HA behavior obviously won&rsquo;t work.</li>
+</ul>
+<p><strong><em>Apache ZooKeeper-based HA Provider Services</em></strong></p>
+<p>The HA Provider configuration for some services (e.g., <a href="#HiveServer2+HA">HiveServer2</a>, <a href="#Kafka+HA">Kafka</a>) includes references to Apache ZooKeeper hosts (i.e., the ZooKeeper ensemble) and namespaces. It&rsquo;s important to understand the relationship of that ensemble configuration to the topologies referencing it. These ZooKeeper details are often cluster-specific. If the ZooKeeper ensemble in the provider configuration is part of cluster <em>A</em>, then it&rsquo;s probably incorrect to reference it in a topology for cluster <em>B</em> since the Hadoop service endpoints will probably be the wrong ones. However, if multiple clusters are working with a common ZooKeeper ensemble, then sharing this provider configuration <em>may</em> be appropriate.</p>
+<p><em>It&rsquo;s always best to specify cluster-specific details in a descriptor rather than a provider configuration.</em></p>
+<p>All of the service attributes that can be specified in the HaProvider can also be specified as params in the corresponding service declaration in the descriptor. If an attribute is specified in both the service declaration and the HaProvider, then the service-level value <strong>overrides</strong> the HaProvider-level value.</p>
+<pre><code>&quot;services&quot;: [
+  {
+    &quot;name&quot;: &quot;HIVE&quot;,
+    &quot;params&quot;: {
+      &quot;enabled&quot;: &quot;true&quot;,
+      &quot;zookeeperEnsemble&quot;: &quot;host1:2181,host2:2181,host3:2181&quot;,
+      &quot;zookeeperNamespace&quot; : &quot;hiveserver2&quot;,
+      &quot;maxRetryAttempts&quot; : &quot;100&quot;
+    }
+  }
+]
+</code></pre>
+<p>Note that Knox can dynamically determine these ZooKeeper ensemble details for <em>some</em> services; for others, they are static provider configuration details. The services for which Knox can discover the cluster-specific ZooKeeper details include:</p>
+<ul>
+  <li>YARN</li>
+  <li>HIVE</li>
+  <li>WEBHDFS</li>
+  <li>WEBHBASE</li>
+  <li>WEBHCAT</li>
+  <li>OOZIE</li>
+  <li>ATLAS</li>
+  <li>ATLAS-API</li>
+  <li>KAFKA</li>
+</ul>
+<p>For a subset of these supported services, Knox can also determine whether ZooKeeper-based HA is enabled or not. This means that the <em>enabled</em> attribute of the HA Provider configuration for these services may be set to <strong>auto</strong>, and Knox will determine whether or not it is enabled based on that service&rsquo;s configuration in the target cluster.</p>
+<pre><code>{
+  &quot;providers&quot;: [
+    {
+      &quot;role&quot;: &quot;ha&quot;,
+      &quot;name&quot;: &quot;HaProvider&quot;,
+      &quot;enabled&quot;: &quot;true&quot;,
+      &quot;params&quot;: {
+        &quot;WEBHDFS&quot;: &quot;maxFailoverAttempts=3;failoverSleep=1000;maxRetryAttempts=3;retrySleep=1000;enabled=true&quot;,
+        &quot;HIVE&quot;: &quot;maxFailoverAttempts=10;failoverSleep=1000;maxRetryAttempts=5;retrySleep=1000;enabled=auto&quot;,
+        &quot;YARN&quot;: &quot;maxFailoverAttempts=5;failoverSleep=5000;maxRetryAttempts=3;retrySleep=1000;enabled=auto&quot;
+      }
+    }
+  ]
+}
+</code></pre>
+<p>The services for which Knox can determine whether ZooKeeper-based HA is enabled include:</p>
+<ul>
+  <li>YARN</li>
+  <li>HIVE</li>
+  <li>ATLAS</li>
+  <li>ATLAS-API</li>
+</ul>
+<p>Be sure to pay extra attention when sharing HA Provider configuration across topologies.</p>
+<h5><a id="Simplified+Descriptor+Files">Simplified Descriptor Files</a> <a href="#Simplified+Descriptor+Files"><img src="markbook-section-link.png"/></a></h5>
+<p>Simplified descriptors allow service URLs to be defined explicitly, just like full topology descriptors. However, if URLs are omitted for a service, Knox will attempt to discover that service&rsquo;s URLs from the Hadoop cluster. Currently, this behavior is only supported for clusters managed by Ambari. In any case, the simplified descriptors are much more concise than a full topology descriptor.</p>
+<p><em>Descriptor Properties</em></p>
+<table>
+  <thead>
+    <tr>
+      <th>Property </th>
+      <th>Description</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td><code>discovery-type</code></td>
+      <td>The discovery source type. (Currently, the only supported type is <code>AMBARI</code>).</td>
+    </tr>
+    <tr>
+      <td><code>discovery-address</code></td>
+      <td>The endpoint address for the discovery source.</td>
+    </tr>
+    <tr>
+      <td><code>discovery-user</code></td>
+      <td>The username with permission to access the discovery source. If omitted, then Knox will check for an alias named <code>ambari.discovery.user</code>, and use its value if defined.</td>
+    </tr>
+    <tr>
+      <td><code>discovery-pwd-alias</code></td>
+      <td>The alias of the password for the user with permission to access the discovery source. If omitted, then Knox will check for an alias named <code>ambari.discovery.password</code>, and use its value if defined.</td>
+    </tr>
+    <tr>
+      <td><code>provider-config-ref</code></td>
+      <td>A reference to a provider configuration in <code>{GATEWAY_HOME}/conf/shared-providers/</code>.</td>
+    </tr>
+    <tr>
+      <td><code>cluster</code></td>
+      <td>The name of the cluster from which the topology service endpoints should be determined.</td>
+    </tr>
+    <tr>
+      <td><code>services</code></td>
+      <td>The collection of services to be included in the topology.</td>
+    </tr>
+    <tr>
+      <td><code>applications</code></td>
+      <td>The collection of applications to be included in the topology.</td>
+    </tr>
+  </tbody>
+</table>
+<p>Two file formats are supported for two distinct purposes.</p>
+<ul>
+  <li>YAML is intended for individuals hand-editing a simplified descriptor, because of its readability.</li>
+  <li>JSON is intended to be used for <a href="#Admin+API">API</a> interactions.</li>
+</ul>
+<p>That being said, there is nothing preventing the hand-editing of files in the JSON format. However, the API will <em>not</em> accept YAML files as input.</p>
+<p><em>YAML Example</em> (based on the HDP Docker Sandbox)</p>
+<pre><code>---
+# Discovery source config
+discovery-type : AMBARI
+discovery-address : http://sandbox.hortonworks.com:8080
+
+# If this is not specified, the alias ambari.discovery.user is checked for a username
+discovery-user : maria_dev
+
+# If this is not specified, the default alias ambari.discovery.password is used
+discovery-pwd-alias : sandbox.discovery.password
+
+# Provider config reference, the contents of which will be included in the resulting topology descriptor
+provider-config-ref : sandbox-providers
+
+# The cluster for which the details should be discovered
+cluster: Sandbox
+
+# The services to declare in the resulting topology descriptor, whose URLs will be discovered (unless a value is specified)
+services:
+    - name: NAMENODE
+    - name: JOBTRACKER
+    - name: WEBHDFS
+    - name: WEBHCAT
+    - name: OOZIE
+    - name: WEBHBASE
+    - name: HIVE
+    - name: RESOURCEMANAGER
+    - name: KNOXSSO
+      params:
+          knoxsso.cookie.secure.only: true
+          knoxsso.token.ttl: 100000
+    - name: AMBARI
+      urls:
+          - http://sandbox.hortonworks.com:8080
+    - name: AMBARIUI
+      urls:
+          - http://sandbox.hortonworks.com:8080
+    - name: AMBARIWS
+      urls:
+          - ws://sandbox.hortonworks.com:8080
+</code></pre>
+<p><em>JSON Example</em> (based on the HDP Docker Sandbox)</p>
+<pre><code>{
+  &quot;discovery-type&quot;:&quot;AMBARI&quot;,
+  &quot;discovery-address&quot;:&quot;http://sandbox.hortonworks.com:8080&quot;,
+  &quot;discovery-user&quot;:&quot;maria_dev&quot;,
+  &quot;discovery-pwd-alias&quot;:&quot;sandbox.discovery.password&quot;,
+  &quot;provider-config-ref&quot;:&quot;sandbox-providers&quot;,
+  &quot;cluster&quot;:&quot;Sandbox&quot;,
+  &quot;services&quot;:[
+    {&quot;name&quot;:&quot;NAMENODE&quot;},
+    {&quot;name&quot;:&quot;JOBTRACKER&quot;},
+    {&quot;name&quot;:&quot;WEBHDFS&quot;},
+    {&quot;name&quot;:&quot;WEBHCAT&quot;},
+    {&quot;name&quot;:&quot;OOZIE&quot;},
+    {&quot;name&quot;:&quot;WEBHBASE&quot;},
+    {&quot;name&quot;:&quot;HIVE&quot;},
+    {&quot;name&quot;:&quot;RESOURCEMANAGER&quot;},
+    {&quot;name&quot;:&quot;KNOXSSO&quot;,
+      &quot;params&quot;:{
+      &quot;knoxsso.cookie.secure.only&quot;:&quot;true&quot;,
+      &quot;knoxsso.token.ttl&quot;:&quot;100000&quot;
+      }
+    },
+    {&quot;name&quot;:&quot;AMBARI&quot;, &quot;urls&quot;:[&quot;http://sandbox.hortonworks.com:8080&quot;]},
+    {&quot;name&quot;:&quot;AMBARIUI&quot;, &quot;urls&quot;:[&quot;http://sandbox.hortonworks.com:8080&quot;]},
+    {&quot;name&quot;:&quot;AMBARIWS&quot;, &quot;urls&quot;:[&quot;ws://sandbox.hortonworks.com:8080&quot;]}
+  ]
+}
+</code></pre>
+<p>Both of these examples illustrate the specification of credentials for the interaction with Ambari. If no credentials are specified, then the default aliases are queried. Use of the default aliases is sufficient for scenarios where topology discovery will only interact with a single Ambari instance. For multiple Ambari instances, however, it is likely that each will require a different set of credentials. The <code>discovery-user</code> and <code>discovery-pwd-alias</code> properties exist for this purpose. Note that whether using the default credential aliases or specifying a custom password alias, these <a href="#Alias+creation">aliases must be defined</a> prior to any attempt to deploy a topology using a simplified descriptor.</p>
+<h5><a id="Deployment+Directories">Deployment Directories</a> <a href="#Deployment+Directories"><img src="markbook-section-link.png"/></a></h5>
+<p>Effecting topology changes is as simple as modifying files in two specific directories.</p>
+<p>The <code>{GATEWAY_HOME}/conf/shared-providers/</code> directory is the location where Knox looks for provider configurations. This directory is monitored for changes, such that modifying a provider configuration file therein will trigger updates to any referencing simplified descriptors in the <code>{GATEWAY_HOME}/conf/descriptors/</code> directory. <em>Care should be taken when deleting these files if there are referencing descriptors; any subsequent modifications of referencing descriptors will fail when the deleted provider configuration cannot be found. The references should all be modified before deleting the provider configuration.</em></p>
+<p>Likewise, the <code>{GATEWAY_HOME}/conf/descriptors/</code> directory is monitored for changes, such that adding or modifying a simplified descriptor file in this directory will trigger the generation and deployment of a topology descriptor. Deleting a descriptor from this directory will conversely result in the removal of the previously-generated topology descriptor, and the associated topology will be undeployed.</p>
+<p>If the service details for a deployed (generated) topology are changed in the cluster, then the Knox topology can be updated by &rsquo;touch&rsquo;ing the simplified descriptor. This will trigger discovery and regeneration/redeployment of the topology descriptor.</p>
+<p>Note that deleting a generated topology descriptor from <code>{GATEWAY_HOME}/conf/topologies/</code> is not sufficient for its removal. If the source descriptor is modified, or Knox is restarted, the topology descriptor will be regenerated and deployed. Removing generated topology descriptors should be done by removing the associated simplified descriptor. For the same reason, editing generated topology descriptors is strongly discouraged since they can be inadvertently overwritten.</p>
+<p>Another means by which these topology changes can be effected is the <a href="#Admin+API">Admin API</a>.</p>
+<h5><a id="Cloud+Federation+Configuration">Cloud Federation Configuration</a> <a href="#Cloud+Federation+Configuration"><img src="markbook-section-link.png"/></a></h5>
+<p>The Cloud Federation feature allows topology-based federation from one Knox instance to another (from an on-prem Knox instance to a cloud Knox instance).</p>
+<h5><a id="Cluster+Configuration+Monitoring">Cluster Configuration Monitoring</a> <a href="#Cluster+Configuration+Monitoring"><img src="markbook-section-link.png"/></a></h5>
+<p>Another benefit gained through the use of simplified topology descriptors, and the associated service discovery, is the ability to monitor clusters for configuration changes. <strong>Like service discovery, this is currently only available for clusters managed by Ambari.</strong></p>
+<p>The gateway can monitor Ambari cluster configurations, and respond to changes by dynamically regenerating and redeploying the affected topologies. The following properties in gateway-site.xml can be used to control this behavior.</p>
+<pre><code>&lt;property&gt;
+    &lt;name&gt;gateway.cluster.config.monitor.ambari.enabled&lt;/name&gt;
+    &lt;value&gt;false&lt;/value&gt;
+    &lt;description&gt;Enable/disable Ambari cluster configuration monitoring.&lt;/description&gt;
+&lt;/property&gt;
+
+&lt;property&gt;
+    &lt;name&gt;gateway.cluster.config.monitor.ambari.interval&lt;/name&gt;
+    &lt;value&gt;60&lt;/value&gt;
+    &lt;description&gt;The interval (in seconds) for polling Ambari for cluster configuration changes.&lt;/description&gt;
+&lt;/property&gt;
+</code></pre>
+<p>Since service discovery supports multiple Ambari instances as discovery sources, multiple Ambari instances can be monitored for cluster configuration changes.</p>
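+<p>The listing above shows the monitor disabled with a 60 second interval. As a sketch, turning monitoring on with a shorter polling interval might look like the following in gateway-site.xml. The value of 30 seconds is an arbitrary choice for illustration, not a recommendation.</p>
+<pre><code>&lt;property&gt;
+    &lt;name&gt;gateway.cluster.config.monitor.ambari.enabled&lt;/name&gt;
+    &lt;!-- true turns Ambari cluster configuration monitoring on --&gt;
+    &lt;value&gt;true&lt;/value&gt;
+&lt;/property&gt;
+
+&lt;property&gt;
+    &lt;name&gt;gateway.cluster.config.monitor.ambari.interval&lt;/name&gt;
+    &lt;!-- polling interval in seconds (illustrative value) --&gt;
+    &lt;value&gt;30&lt;/value&gt;
+&lt;/property&gt;
+</code></pre>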
+<p>For example, if the cluster monitor is enabled, deployment of the following simple descriptor would trigger monitoring of the <em>Sandbox</em> cluster managed by the Ambari instance at <a href="http://sandbox.hortonworks.com:8080">http://sandbox.hortonworks.com:8080</a>:</p>
+<pre><code>---
+discovery-address : http://sandbox.hortonworks.com:8080
+discovery-user : maria_dev
+discovery-pwd-alias : sandbox.discovery.password
+cluster: Sandbox
+provider-config-ref : sandbox-providers
+services:
+    - name: NAMENODE
+    - name: JOBTRACKER
+    - name: WEBHDFS
+    - name: WEBHCAT
+    - name: OOZIE
+    - name: WEBHBASE
+    - name: HIVE
+    - name: RESOURCEMANAGER
+</code></pre>
+<p>Another <em>Sandbox</em> cluster, managed by a <strong>different</strong> Ambari instance, could simultaneously be monitored by the same gateway instance.</p>
+<p>Now, topologies can be kept in sync with their respective target cluster configurations, without administrator intervention or service interruption.</p>
+<h5><a id="Remote+Configuration+Monitor">Remote Configuration Monitor</a> <a href="#Remote+Configuration+Monitor"><img src="markbook-section-link.png"/></a></h5>
+<p>In addition to monitoring local directories for provider configurations and simplified descriptors, the gateway similarly supports monitoring ZooKeeper.</p>
+<p>This monitor depends on a <a href="#Remote+Configuration+Registry+Clients">remote configuration registry client</a>, and that client must be specified by setting the following property in gateway-site.xml:</p>
+<pre><code>&lt;property&gt;
+    &lt;name&gt;gateway.remote.config.monitor.client&lt;/name&gt;
+    &lt;value&gt;sandbox-zookeeper-client&lt;/value&gt;

[... 7731 lines stripped ...]

