knox-commits mailing list archives

From kmin...@apache.org
Subject svn commit: r1625685 - in /knox: site/books/knox-0-4-0/ site/books/knox-0-5-0/ trunk/books/0.5.0/
Date Wed, 17 Sep 2014 17:06:34 GMT
Author: kminder
Date: Wed Sep 17 17:06:34 2014
New Revision: 1625685

URL: http://svn.apache.org/r1625685
Log:
Updates for HDFS HA support.

Modified:
    knox/site/books/knox-0-4-0/deployment-overview.png
    knox/site/books/knox-0-4-0/deployment-provider.png
    knox/site/books/knox-0-4-0/deployment-service.png
    knox/site/books/knox-0-4-0/runtime-overview.png
    knox/site/books/knox-0-4-0/runtime-request-processing.png
    knox/site/books/knox-0-5-0/knox-0-5-0.html
    knox/trunk/books/0.5.0/service_webhdfs.md

Modified: knox/site/books/knox-0-4-0/deployment-overview.png
URL: http://svn.apache.org/viewvc/knox/site/books/knox-0-4-0/deployment-overview.png?rev=1625685&r1=1625684&r2=1625685&view=diff
==============================================================================
Binary files - no diff available.

Modified: knox/site/books/knox-0-4-0/deployment-provider.png
URL: http://svn.apache.org/viewvc/knox/site/books/knox-0-4-0/deployment-provider.png?rev=1625685&r1=1625684&r2=1625685&view=diff
==============================================================================
Binary files - no diff available.

Modified: knox/site/books/knox-0-4-0/deployment-service.png
URL: http://svn.apache.org/viewvc/knox/site/books/knox-0-4-0/deployment-service.png?rev=1625685&r1=1625684&r2=1625685&view=diff
==============================================================================
Binary files - no diff available.

Modified: knox/site/books/knox-0-4-0/runtime-overview.png
URL: http://svn.apache.org/viewvc/knox/site/books/knox-0-4-0/runtime-overview.png?rev=1625685&r1=1625684&r2=1625685&view=diff
==============================================================================
Binary files - no diff available.

Modified: knox/site/books/knox-0-4-0/runtime-request-processing.png
URL: http://svn.apache.org/viewvc/knox/site/books/knox-0-4-0/runtime-request-processing.png?rev=1625685&r1=1625684&r2=1625685&view=diff
==============================================================================
Binary files - no diff available.

Modified: knox/site/books/knox-0-5-0/knox-0-5-0.html
URL: http://svn.apache.org/viewvc/knox/site/books/knox-0-5-0/knox-0-5-0.html?rev=1625685&r1=1625684&r2=1625685&view=diff
==============================================================================
--- knox/site/books/knox-0-5-0/knox-0-5-0.html (original)
+++ knox/site/books/knox-0-5-0/knox-0-5-0.html Wed Sep 17 17:06:34 2014
@@ -1625,7 +1625,7 @@ dep/commons-codec-1.7.jar
   </tbody>
 </table><p>However, there is a subtle difference to URLs that are returned by
WebHDFS in the Location header of many requests. Direct WebHDFS requests may return Location
headers that contain the address of a particular Data Node. The gateway will rewrite these
URLs to ensure subsequent requests come back through the gateway and internal cluster details
are protected.</p><p>A WebHDFS request to the Name Node to retrieve a file will
return a URL of the form below in the Location header.</p>
 <pre><code>http://{datanode-host}:{datanode-port}/webhdfs/v1/{path}?...
-</code></pre><p>Note that this URL contains the newtwork location of a
Data Node. The gateway will rewrite this URL to look like the URL below.</p>
+</code></pre><p>Note that this URL contains the network location of a Data
Node. The gateway will rewrite this URL to look like the URL below.</p>
 <pre><code>https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/webhdfs/data/v1/{path}?_={encrypted-query-parameters}
 </code></pre><p>The <code>{encrypted-query-parameters}</code>
will contain the <code>{datanode-host}</code> and <code>{datanode-port}</code>
information. This information along with the original query parameters are encrypted so that
the internal Hadoop details are protected.</p><h4><a id="WebHDFS+Examples"></a>WebHDFS
Examples</h4><p>The examples below upload a file, download the file and list the
contents of the directory.</p><h5><a id="WebHDFS+via+client+DSL"></a>WebHDFS
via client DSL</h5><p>You can use the Groovy example scripts and interpreter provided
with the distribution.</p>
 <pre><code>java -jar bin/shell.jar samples/ExampleWebHdfsPutGet.groovy
@@ -1763,7 +1763,30 @@ session.shutdown()
   <ul>
     <li><code>Hdfs.rm( session ).file( &quot;/user/guest/example&quot;
).recursive().now()</code></li>
   </ul></li>
-</ul><h3><a id="WebHCat"></a>WebHCat</h3><p>WebHCat is
a related but separate service from Hive. As such it is installed and configured independently.
The <a href="https://cwiki.apache.org/confluence/display/Hive/WebHCat">WebHCat wiki
pages</a> describe this processes. In sandbox this configuration file for WebHCat is
located at /etc/hadoop/hcatalog/webhcat-site.xml. Note the properties shown below as they
are related to configuration required by the gateway.</p>
+</ul><h3><a id="WebHDFS+HA"></a>WebHDFS HA</h3><p>Knox
provides basic failover and retry functionality for REST API calls made to WebHDFS when HDFS
HA has been configured and enabled.</p><p>To enable HA functionality for WebHDFS
in Knox the following configuration has to be added to the topology file.</p>
+<pre><code>&lt;provider&gt;
+   &lt;role&gt;ha&lt;/role&gt;
+   &lt;name&gt;HaProvider&lt;/name&gt;
+   &lt;enabled&gt;true&lt;/enabled&gt;
+   &lt;param&gt;
+       &lt;name&gt;WEBHDFS&lt;/name&gt;
+       &lt;value&gt;maxFailoverAttempts=3;failoverSleep=1000;maxRetryAttempts=300;retrySleep=1000;enabled=true&lt;/value&gt;
+   &lt;/param&gt;
+&lt;/provider&gt;
+</code></pre><p>The role and name of the provider above must be as shown.
The name in the &lsquo;param&rsquo; section must match the role name of the service
being configured for HA, and the value is the HA configuration for that particular
service. In this case the name is &lsquo;WEBHDFS&rsquo;.</p><p>The
various configuration parameters are described below:</p>
+<ul>
+  <li><p>maxFailoverAttempts - The maximum number of times a failover will be attempted. The current failover strategy is simple: the next URL in the list provided for the service is tried, and the one that failed is moved to the bottom of the list. If the list is exhausted before the maximum number of attempts is reached, the first URL that failed is tried again (the list starts over from the original top entry).</p></li>
+  <li><p>failoverSleep - The time in milliseconds that the process will wait before attempting a failover.</p></li>
+  <li><p>maxRetryAttempts - The maximum number of times a retry will be attempted. Unlike failover, the retry is made against the same URL that failed. This covers a special case in HDFS: a node in safe mode is expected to come out of safe mode, so a retry is preferable to a failover.</p></li>
+  <li><p>retrySleep - The time in milliseconds that the process will wait before issuing a retry.</p></li>
+  <li><p>enabled - Flag to turn HA on or off for the particular service.</p></li>
+</ul><p>For the service configuration itself, the URLs of the standby nodes should
be added to the list. The URL of the active node (at the time of configuration) should
ideally be placed at the top of the list.</p>
+<pre><code>&lt;service&gt;
+    &lt;role&gt;WEBHDFS&lt;/role&gt;
+    &lt;url&gt;http://{host1}:50070/webhdfs&lt;/url&gt;
+    &lt;url&gt;http://{host2}:50070/webhdfs&lt;/url&gt;
+&lt;/service&gt;
+</code></pre><h3><a id="WebHCat"></a>WebHCat</h3><p>WebHCat
is a related but separate service from Hive. As such it is installed and configured independently.
The <a href="https://cwiki.apache.org/confluence/display/Hive/WebHCat">WebHCat wiki
pages</a> describe this process. In the sandbox, the configuration file for WebHCat is
located at /etc/hadoop/hcatalog/webhcat-site.xml. Note the properties shown below as they
are related to configuration required by the gateway.</p>
 <pre><code>&lt;property&gt;
     &lt;name&gt;templeton.port&lt;/name&gt;
     &lt;value&gt;50111&lt;/value&gt;

Modified: knox/trunk/books/0.5.0/service_webhdfs.md
URL: http://svn.apache.org/viewvc/knox/trunk/books/0.5.0/service_webhdfs.md?rev=1625685&r1=1625684&r2=1625685&view=diff
==============================================================================
--- knox/trunk/books/0.5.0/service_webhdfs.md (original)
+++ knox/trunk/books/0.5.0/service_webhdfs.md Wed Sep 17 17:06:34 2014
@@ -77,7 +77,7 @@ A WebHDFS request to the Name Node to re
 
     http://{datanode-host}:{datanode-port}/webhdfs/v1/{path}?...
 
-Note that this URL contains the newtwork location of a Data Node.
+Note that this URL contains the network location of a Data Node.
 The gateway will rewrite this URL to look like the URL below.
 
     https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/webhdfs/data/v1/{path}?_={encrypted-query-parameters}
@@ -234,6 +234,61 @@ You can use cURL to directly invoke the 
     * `Hdfs.rm( session ).file( "/user/guest/example" ).recursive().now()`
 
 
+### WebHDFS HA ###
+
+Knox provides basic failover and retry functionality for REST API calls made to WebHDFS when
HDFS HA has been 
+configured and enabled.
+
+To enable HA functionality for WebHDFS in Knox the following configuration has to be added
to the topology file.
+
+    <provider>
+       <role>ha</role>
+       <name>HaProvider</name>
+       <enabled>true</enabled>
+       <param>
+           <name>WEBHDFS</name>
+           <value>maxFailoverAttempts=3;failoverSleep=1000;maxRetryAttempts=300;retrySleep=1000;enabled=true</value>
+       </param>
+    </provider>
+    
+The role and name of the provider above must be as shown. The name in the 'param' section must
+match the role name of the service being configured for HA, and the value is the HA configuration
+for that particular service. In this case the name is 'WEBHDFS'.
+
+The various configuration parameters are described below:
+     
+* maxFailoverAttempts - 
+The maximum number of times a failover will be attempted. The current failover strategy is
+simple: the next URL in the list provided for the service is tried, and the one that failed is
+moved to the bottom of the list. If the list is exhausted before the maximum number of attempts
+is reached, the first URL that failed is tried again (the list starts over from the original
+top entry).
+
+* failoverSleep - 
+The time in milliseconds that the process will wait before attempting a failover.
+
+* maxRetryAttempts - 
+The maximum number of times a retry will be attempted. Unlike failover, the retry is made
+against the same URL that failed. This covers a special case in HDFS: a node in safe mode is
+expected to come out of safe mode, so a retry is preferable to a failover.
+
+* retrySleep - 
+The time in milliseconds that the process will wait before issuing a retry.
+
+* enabled - 
+Flag to turn HA on or off for the particular service.
+
+For the service configuration itself, the URLs of the standby nodes should be added to the list.
+The URL of the active node (at the time of configuration) should ideally be placed at the top of
+the list.
+
+
+    <service>
+        <role>WEBHDFS</role>
+        <url>http://{host1}:50070/webhdfs</url>
+        <url>http://{host2}:50070/webhdfs</url>
+    </service>
+    
+
+
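The failover rotation described above (try the next URL, move the failed one to the bottom of the list, start over once the list is exhausted) can be sketched roughly as follows. This is an illustrative approximation only, not Knox's actual implementation; the class name, request callback, and URLs are hypothetical.

```python
# Illustrative sketch of the failover rotation described above.
# Not Knox's actual implementation; names and URLs are hypothetical.

class FailoverUrlList:
    def __init__(self, urls, max_failover_attempts=3):
        self.urls = list(urls)              # active URL ideally first
        self.max_attempts = max_failover_attempts

    def call(self, request):
        """Try the URL at the top of the list; on failure, move it to
        the bottom and fail over to the next, up to max_attempts."""
        attempts = 0
        while attempts < self.max_attempts:
            url = self.urls[0]
            try:
                return request(url)
            except ConnectionError:
                # Rotate the failed URL to the bottom of the list.
                self.urls.append(self.urls.pop(0))
                attempts += 1
        raise ConnectionError("all failover attempts exhausted")


# Example: the first host is down, the second responds.
def fake_request(url):
    if "host1" in url:
        raise ConnectionError(url)
    return "ok from " + url

ha = FailoverUrlList(["http://host1:50070/webhdfs",
                      "http://host2:50070/webhdfs"])
result = ha.call(fake_request)   # fails over from host1 to host2
```

After the failover, host2 sits at the top of the list, so subsequent calls go to the healthy node first.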
 
 
 


