knox-commits mailing list archives

From m...@apache.org
Subject svn commit: r1850181 [13/13] - in /knox: site/books/knox-1-3-0/ site/books/knox-1-3-0/adminui/ trunk/books/1.3.0/ trunk/books/1.3.0/dev-guide/ trunk/books/1.3.0/img/ trunk/books/1.3.0/img/adminui/
Date Wed, 02 Jan 2019 17:31:31 GMT
Added: knox/trunk/books/1.3.0/service_solr.md
URL: http://svn.apache.org/viewvc/knox/trunk/books/1.3.0/service_solr.md?rev=1850181&view=auto
==============================================================================
--- knox/trunk/books/1.3.0/service_solr.md (added)
+++ knox/trunk/books/1.3.0/service_solr.md Wed Jan  2 17:31:29 2019
@@ -0,0 +1,119 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+--->
+
+### Solr ###
+
+Knox provides gateway functionality to Solr with support for versions 5.5+ and 6+. The Solr REST APIs allow the user to view the status 
+of the collections, perform administrative actions and query collections.
+
+See the Solr Quickstart (http://lucene.apache.org/solr/quickstart.html) section of the Solr documentation for examples of the Solr REST API.
+
+Since Knox provides an abstraction over Solr and ZooKeeper, the use of the SolrJ CloudSolrClient is no longer supported.  You should replace 
+instances of CloudSolrClient with HttpSolrClient.
+
+<p>Note: Updates to Solr via Knox require a POST operation, which in turn requires preemptive authentication. Preemptive authentication
+is not directly supported by the SolrJ API at this time.</p>
+
+To enable this functionality, a topology file needs to have the following configuration:
+
+    <service>
+        <role>SOLR</role>
+        <version>6.0.0</version>
+        <url>http://<solr-host>:<solr-port></url>
+    </service>
+
+The default Solr port is 8983. Set the version to either '5.5.0' or '6.0.0'.
+
+For Solr 5.5.0 you also need to change the role name to `SOLRAPI` like this:
+
+    <service>
+        <role>SOLRAPI</role>
+        <version>5.5.0</version>
+        <url>http://<solr-host>:<solr-port></url>
+    </service>
+
+
+#### Solr URL Mapping ####
+
+For Solr URLs, the mapping of Knox Gateway accessible URLs to direct Solr URLs is the following.
+
+| ------- | ------------------------------------------------------------------------------------- |
+| Gateway | `https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/solr` |
+| Cluster | `http://{solr-host}:{solr-port}/solr`                               |
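In effect, the mapping swaps the gateway prefix for the Solr base URL and keeps the rest of the path and the query string unchanged. The Java sketch below shows that idea only. The class and method names are hypothetical, and `solr-host` is a placeholder. Knox performs the real mapping through its own rewrite rules, not through code like this.

```java
/**
 * Hypothetical illustration of the Solr URL mapping table; not Knox's rewrite engine.
 */
public class SolrUrlMapping {

    /** Maps a gateway-facing Solr URL onto the direct Solr URL, keeping sub-path and query. */
    public static String toClusterUrl(String gatewayUrl, String gatewaySolrBase, String clusterSolrBase) {
        if (!gatewayUrl.startsWith(gatewaySolrBase)) {
            throw new IllegalArgumentException("Not a gateway Solr URL: " + gatewayUrl);
        }
        String rest = gatewayUrl.substring(gatewaySolrBase.length());
        // Reject prefix collisions such as ".../solrx".
        if (!rest.isEmpty() && rest.charAt(0) != '/' && rest.charAt(0) != '?') {
            throw new IllegalArgumentException("Not a gateway Solr URL: " + gatewayUrl);
        }
        return clusterSolrBase + rest;
    }

    public static void main(String[] args) {
        // Prints http://solr-host:8983/solr/select?q=*:*&wt=json
        System.out.println(toClusterUrl(
            "https://localhost:8443/gateway/sandbox/solr/select?q=*:*&wt=json",
            "https://localhost:8443/gateway/sandbox/solr",
            "http://solr-host:8983/solr"));
    }
}
```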
+
+
+#### Solr Examples via cURL
+
+Some of the calls that can be made, with examples using cURL, are listed below.
+
+    # 0. Query collection
+    
+    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/solr/select?q=*:*&wt=json'
+
+    # 1. Query cluster status
+    
+    curl -ikv -u guest:guest-password -X POST 'https://localhost:8443/gateway/sandbox/solr/admin/collections?action=CLUSTERSTATUS' 
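Plain Java clients (rather than cURL or SolrJ) can build the query from example 0 with the JDK's `java.net.http` API (Java 11+). The sketch sets the `Authorization` header directly, so the credentials are sent preemptively, as `curl -u` does. The host, the `sandbox` topology name and the credentials are the sample values used above. The sketch only builds the request and does not send it.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class SolrQueryRequest {

    /** Builds an HTTP Basic Authorization header value, to be sent preemptively. */
    public static String basicAuth(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    /** Builds (but does not send) a GET request for /solr/select through the gateway. */
    public static HttpRequest selectRequest(String gatewayBase, String query, String user, String password) {
        // URL-encode the query so that characters such as ':' survive intact.
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        URI uri = URI.create(gatewayBase + "/solr/select?q=" + q + "&wt=json");
        return HttpRequest.newBuilder(uri)
            .header("Authorization", basicAuth(user, password))
            .GET()
            .build();
    }

    public static void main(String[] args) {
        HttpRequest request = selectRequest(
            "https://localhost:8443/gateway/sandbox", "*:*", "guest", "guest-password");
        System.out.println(request.uri());
        // To send it: HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).
        // The JVM must trust the gateway's TLS certificate; there is no simple flag like curl's -k.
    }
}
```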
+
+### Solr HA ###
+
+Knox provides basic failover functionality for calls made to Solr Cloud when more than one Solr instance is
+installed in the cluster and registered with the same ZooKeeper ensemble. The HA functionality in this case fetches the
+Solr URL information from a ZooKeeper ensemble, so the user need only supply the necessary ZooKeeper
+configuration and not the Solr connection URLs.
+
+To enable HA functionality for Solr Cloud in Knox the following configuration has to be added to the topology file.
+
+    <provider>
+        <role>ha</role>
+        <name>HaProvider</name>
+        <enabled>true</enabled>
+        <param>
+            <name>SOLR</name>
+            <value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true;zookeeperEnsemble=machine1:2181,machine2:2181,machine3:2181</value>
+        </param>
+    </provider>
+
+The role and name of the provider above must be as shown. The name in the 'param' section must match that of the service
+role name that is being configured for HA and the value in the 'param' section is the configuration for that particular
+service in HA mode. In this case the name is 'SOLR'.
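The `param` value is a semicolon-separated list of `key=value` pairs, and the `zookeeperEnsemble` value itself contains commas and colons. The Java sketch below shows one way such a value splits into settings. It illustrates the format only; it is not Knox's own parser, and the tolerance for whitespace and a trailing `;` is an assumption of the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HaParamParser {

    /** Splits "k1=v1;k2=v2" into an ordered map; only the first '=' separates key from value. */
    public static Map<String, String> parse(String value) {
        Map<String, String> config = new LinkedHashMap<>();
        for (String pair : value.split(";")) {
            String trimmed = pair.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate a trailing ';'
            }
            int eq = trimmed.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Missing '=' in: " + trimmed);
            }
            config.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
        }
        return config;
    }

    public static void main(String[] args) {
        Map<String, String> c = parse("maxFailoverAttempts=3;failoverSleep=1000;enabled=true;"
            + "zookeeperEnsemble=machine1:2181,machine2:2181,machine3:2181");
        int attempts = Integer.parseInt(c.get("maxFailoverAttempts"));
        long sleepMillis = Long.parseLong(c.get("failoverSleep"));
        boolean enabled = Boolean.parseBoolean(c.get("enabled"));
        String[] zkHosts = c.get("zookeeperEnsemble").split(",");
        System.out.println(attempts + " " + sleepMillis + " " + enabled + " " + zkHosts.length); // 3 1000 true 3
    }
}
```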
+
+The various configuration parameters are described below:
+
+* maxFailoverAttempts -
+This is the maximum number of times a failover will be attempted. The failover strategy at this time is simple:
+the next URL in the list of URLs provided for the service is used, and the one that failed is moved to the bottom
+of the list. If the list is exhausted and the maximum number of attempts has not been reached, the list is fetched
+again from ZooKeeper (refreshing it) and the first URL is tried again.
+
+* failoverSleep -
+The amount of time in milliseconds that the process will wait before attempting to fail over.
+
+* enabled -
+Flag to turn the particular service on or off for HA.
+
+* zookeeperEnsemble -
+A comma-separated list of the ZooKeeper hosts (host names or IP addresses, with ports, e.g. `machine1:2181`) that make up
+the ensemble with which the Solr servers register their information.
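The `maxFailoverAttempts` strategy described above can be modelled as a queue: a failed URL moves to the back, and once every URL has failed the list is fetched again. The Java sketch below simulates this. The `Supplier` stands in for the ZooKeeper lookup and the `Predicate` stands in for "the request succeeded"; both are hypothetical. The `failoverSleep` pause between attempts is left out for brevity. The sketch counts the initial attempt separately from the failovers, which is an assumption about how the limit applies.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

public class FailoverSketch {

    /**
     * Returns the first URL accepted by isUp, or null once maxFailoverAttempts failovers are used up.
     * Every URL tried is appended to 'tried' so that the order can be inspected.
     */
    public static String pickUrl(Supplier<List<String>> fetchUrls, Predicate<String> isUp,
                                 int maxFailoverAttempts, List<String> tried) {
        Deque<String> urls = new ArrayDeque<>(fetchUrls.get());
        int failedSinceRefresh = 0;
        for (int failovers = 0; !urls.isEmpty(); failovers++) {
            String current = urls.peekFirst();
            tried.add(current);
            if (isUp.test(current)) {
                return current;
            }
            if (failovers == maxFailoverAttempts) {
                break; // no failover attempts left
            }
            // Failover: the failed URL goes to the bottom and the next one is tried.
            urls.addLast(urls.pollFirst());
            if (++failedSinceRefresh == urls.size()) {
                // Every URL has failed once: refresh the list (from ZooKeeper in Knox).
                urls = new ArrayDeque<>(fetchUrls.get());
                failedSinceRefresh = 0;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> tried = new ArrayList<>();
        String chosen = pickUrl(() -> List.of("http://solr1:8983/solr", "http://solr2:8983/solr"),
            url -> url.contains("solr2"), 3, tried);
        System.out.println(chosen + " after trying " + tried);
    }
}
```

With three URLs that are all down and `maxFailoverAttempts=3`, the sketch tries the first, second, third and then (after the refresh) the first URL again, and then gives up.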
+
+For the service configuration itself, the URLs need NOT be added to the list. For example:
+
+    <service>
+        <role>SOLR</role>
+        <version>6.0.0</version>
+    </service>
+
+Please note that there is no `<url>` tag specified here as the URLs for the Solr servers are obtained from ZooKeeper.

Added: knox/trunk/books/1.3.0/service_storm.md
URL: http://svn.apache.org/viewvc/knox/trunk/books/1.3.0/service_storm.md?rev=1850181&view=auto
==============================================================================
--- knox/trunk/books/1.3.0/service_storm.md (added)
+++ knox/trunk/books/1.3.0/service_storm.md Wed Jan  2 17:31:29 2019
@@ -0,0 +1,112 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+--->
+
+### Storm ###
+
+Storm is a distributed realtime computation system. Storm exposes REST APIs for UI functionality that can be used for
+retrieving metrics data and configuration information as well as management operations such as starting or stopping topologies.
+
+The documentation for these APIs can be found here:
+
+https://github.com/apache/storm/blob/master/docs/STORM-UI-REST-API.md
+
+To enable this functionality, a topology file needs to have the following configuration:
+
+    <service>
+        <role>STORM</role>
+        <url>http://<hostname>:<port></url>
+    </service>
+
+The default UI daemon port is 8744. If it is configured to some other port, that configuration can be
+found in `storm.yaml` as the value for the property `ui.port`.
+
+In addition to the storm service configuration above, a STORM-LOGVIEWER service must be configured if the
+log files are to be retrieved through Knox. The log viewer port is the value of the property
+`logviewer.port`, also in the file `storm.yaml`.
+
+    <service>
+        <role>STORM-LOGVIEWER</role>
+        <url>http://<hostname>:<port></url>
+    </service>
+
+
+#### Storm URL Mapping ####
+
+For Storm URLs, the mapping of Knox Gateway accessible URLs to direct Storm URLs is the following.
+
+| ------- | ------------------------------------------------------------------------------------- |
+| Gateway | `https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/storm` |
+| Cluster | `http://{storm-host}:{storm-port}`                                      |
+
+For the log viewer, the mapping is as follows:
+
+| ------- | ------------------------------------------------------------------------------------- |
+| Gateway | `https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/storm/logviewer` |
+| Cluster | `http://{storm-logviewer-host}:{storm-logviewer-port}`                                      |
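The log viewer path `/storm/logviewer` sits underneath the Storm UI path `/storm`, so a request has to be matched against the longer prefix first. The Java sketch below shows that longest-prefix routing between the `STORM` and `STORM-LOGVIEWER` roles. The host names are placeholders. Port 8744 is the default UI port mentioned above, and log viewer port 8000 is only a placeholder for the `logviewer.port` value. The request paths are illustrative. Knox does this routing through its own service definitions, not through code like this.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StormRouting {

    /** Gateway path prefixes (relative to /{gateway-path}/{cluster-name}) and their backend base URLs. */
    static final Map<String, String> ROUTES = new LinkedHashMap<>();
    static {
        ROUTES.put("/storm/logviewer", "http://storm-logviewer-host:8000"); // STORM-LOGVIEWER (placeholder port)
        ROUTES.put("/storm", "http://storm-host:8744");                    // STORM UI
    }

    /** Returns the backend URL for a gateway-relative path, preferring the longest matching prefix. */
    public static String route(String path) {
        String best = null;
        for (String prefix : ROUTES.keySet()) {
            boolean matches = path.equals(prefix) || path.startsWith(prefix + "/") || path.startsWith(prefix + "?");
            if (matches && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("No Storm route for " + path);
        }
        // The matched prefix is replaced by the backend base URL; the remainder is kept.
        return ROUTES.get(best) + path.substring(best.length());
    }

    public static void main(String[] args) {
        System.out.println(route("/storm/api/v1/cluster/summary"));
        System.out.println(route("/storm/logviewer/log?file=worker.log"));
    }
}
```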
+
+
+#### Storm Examples
+
+Some of the calls that can be made, with examples using cURL, are listed below.
+
+    # 0. Getting cluster configuration
+    
+    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/storm/api/v1/cluster/configuration'
+    
+    # 1. Getting cluster summary information
+    
+    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/storm/api/v1/cluster/summary'
+
+    # 2. Getting supervisor summary information
+    
+    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/storm/api/v1/supervisor/summary'
+    
+    # 3. Getting topology summary information
+    
+    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/storm/api/v1/topology/summary'
+    
+    # 4. Getting specific topology information. Substitute {id} with the topology id.
+    
+    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/storm/api/v1/topology/{id}'
+
+    # 5. To get component level information. Substitute {id} with the topology id and {component} with the component id e.g. 'spout'
+    
+    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/storm/api/v1/topology/{id}/component/{component}'
+
+
+The following POST operations all require an `x-csrf-token` header along with session cookies that can be stored in a cookie file,
+in particular the `ring-session` and `JSESSIONID` cookies. Note that these examples address the Storm UI directly on port 8744.
+
+    # 6. To activate a topology. Substitute {id} with the topology id and {token-value} with the x-csrf-token value.
+
+    curl -ik -b ~/cookiejar.txt -c ~/cookiejar.txt -u guest:guest-password -H 'x-csrf-token:{token-value}' -X POST \
+     http://localhost:8744/api/v1/topology/{id}/activate
+
+    # 7. To de-activate a topology. Substitute {id} with the topology id and {token-value} with the x-csrf-token value.
+
+    curl -ik -b ~/cookiejar.txt -c ~/cookiejar.txt -u guest:guest-password -H 'x-csrf-token:{token-value}' -X POST \
+     http://localhost:8744/api/v1/topology/{id}/deactivate
+
+    # 8. To rebalance a topology. Substitute {id} with the topology id and {token-value} with the x-csrf-token value.
+
+    curl -ik -b ~/cookiejar.txt -c ~/cookiejar.txt -u guest:guest-password -H 'x-csrf-token:{token-value}' -X POST \
+     http://localhost:8744/api/v1/topology/{id}/rebalance/0
+
+    # 9. To kill a topology. Substitute {id} with the topology id and {token-value} with the x-csrf-token value.
+
+    curl -ik -b ~/cookiejar.txt -c ~/cookiejar.txt -u guest:guest-password -H 'x-csrf-token:{token-value}' -X POST \
+     http://localhost:8744/api/v1/topology/{id}/kill/0
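The same topology actions can be sketched with the JDK HTTP client (Java 11+). A `CookieManager` plays the role of curl's `-b`/`-c` cookie jar: it stores the session cookies received by the client and sends them back on later calls. The CSRF token is added as an explicit header. The base URL and credentials are the sample values above. The topology id and token value are hypothetical placeholders, and this guide does not say how the token is obtained. The sketch builds the request without sending it.

```java
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class StormTopologyAction {

    /** Builds a POST for an action such as "activate", "deactivate", "rebalance/0" or "kill/0". */
    public static HttpRequest actionRequest(String baseUrl, String topologyId, String action,
                                            String csrfToken, String user, String password) {
        String auth = Base64.getEncoder()
            .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/topology/" + topologyId + "/" + action))
            .header("Authorization", "Basic " + auth)
            .header("x-csrf-token", csrfToken)
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
    }

    public static void main(String[] args) {
        // The CookieManager keeps session cookies (e.g. ring-session, JSESSIONID) between calls.
        HttpClient client = HttpClient.newBuilder()
            .cookieHandler(new CookieManager(null, CookiePolicy.ACCEPT_ALL))
            .build();
        // Hypothetical topology id and token value.
        HttpRequest request = actionRequest("http://localhost:8744", "my-topology-1-1234567890",
            "activate", "example-token", "guest", "guest-password");
        System.out.println(request.method() + " " + request.uri());
        // To send it: client.send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```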

Added: knox/trunk/books/1.3.0/service_webhcat.md
URL: http://svn.apache.org/viewvc/knox/trunk/books/1.3.0/service_webhcat.md?rev=1850181&view=auto
==============================================================================
--- knox/trunk/books/1.3.0/service_webhcat.md (added)
+++ knox/trunk/books/1.3.0/service_webhcat.md Wed Jan  2 17:31:29 2019
@@ -0,0 +1,181 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+--->
+
+### WebHCat ###
+
+WebHCat (also called _Templeton_) is a related but separate service from HiveServer2.
+As such it is installed and configured independently.
+The [WebHCat wiki pages](https://cwiki.apache.org/confluence/display/Hive/WebHCat) describe this process.
+In the Sandbox, the configuration file for WebHCat is located at `/etc/hadoop/hcatalog/webhcat-site.xml`.
+Note the properties shown below as they are related to configuration required by the gateway.
+
+    <property>
+        <name>templeton.port</name>
+        <value>50111</value>
+    </property>
+
+Also important is the configuration of the JOBTRACKER RPC endpoint.
+For Hadoop 2 this can be found in the `yarn-site.xml` file.
+In the Sandbox, this file can be found at `/etc/hadoop/conf/yarn-site.xml`.
+The property `yarn.resourcemanager.address` within that file is relevant for the gateway's configuration.
+
+    <property>
+        <name>yarn.resourcemanager.address</name>
+        <value>sandbox.hortonworks.com:8050</value>
+    </property>
+
+See #[WebHDFS] for details about locating the Hadoop configuration for the NAMENODE endpoint.
+
+The gateway by default includes a sample topology descriptor file `{GATEWAY_HOME}/deployments/sandbox.xml`.
+The values in this sample are configured to work with an installed Sandbox VM.
+
+    <service>
+        <role>NAMENODE</role>
+        <url>hdfs://localhost:8020</url>
+    </service>
+    <service>
+        <role>JOBTRACKER</role>
+        <url>rpc://localhost:8050</url>
+    </service>
+    <service>
+        <role>WEBHCAT</role>
+        <url>http://localhost:50111/templeton</url>
+    </service>
+
+The URLs provided for the role NAMENODE and JOBTRACKER do not result in an endpoint being exposed by the gateway.
+This information is only required so that other URLs can be rewritten that reference the appropriate RPC address for Hadoop services.
+This prevents clients from needing to be aware of the internal cluster details.
+Note that for Hadoop 2 the JOBTRACKER RPC endpoint is provided by the Resource Manager component.
+
+By default the gateway is configured to use the HTTP endpoint for WebHCat in the Sandbox.
+This could alternatively be configured to use the HTTPS endpoint by providing the correct address.
+
+#### WebHCat URL Mapping ####
+
+For WebHCat URLs, the mapping of Knox Gateway accessible URLs to direct WebHCat URLs is simple.
+
+| ------- | ------------------------------------------------------------------------------- |
+| Gateway | `https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/templeton` |
+| Cluster | `http://{webhcat-host}:{webhcat-port}/templeton`                                |
+
+
+#### WebHCat via cURL
+
+Users can use cURL to directly invoke the REST APIs via the gateway. For the full list of available REST calls, see the WebHCat documentation. This is a simple curl command to test the connection:
+
+    curl -i -k -u guest:guest-password 'https://localhost:8443/gateway/sandbox/templeton/v1/status'
+
+
+#### WebHCat Example ####
+
+This example will submit the familiar WordCount Java MapReduce job to the Hadoop cluster via the gateway using the KnoxShell DSL.
+There are several ways to do this depending upon your preference.
+
+You can use the "embedded" Groovy interpreter provided with the distribution.
+
+    java -jar bin/shell.jar samples/ExampleWebHCatJob.groovy
+
+You can manually type in the KnoxShell DSL script into the "embedded" Groovy interpreter provided with the distribution.
+
+    java -jar bin/shell.jar
+
+Each line from the file `samples/ExampleWebHCatJob.groovy` would then need to be typed or copied into the interactive shell.
+
+#### WebHCat Client DSL ####
+
+##### submitJava() - Submit a Java MapReduce job.
+
+* Request
+    * jar (String) - The remote file name of the JAR containing the app to execute.
+    * app (String) - The app name to execute, for example _wordcount_. This is not the class name.
+    * input (String) - The remote directory name to use as input for the job.
+    * output (String) - The remote directory name to store output from the job.
+* Response
+    * jobId : String - The job ID of the submitted job.  Consumes body.
+* Example
+
+
+    Job.submitJava(session)
+        .jar(remoteJarName)
+        .app(appName)
+        .input(remoteInputDir)
+        .output(remoteOutputDir)
+        .now()
+        .jobId
+
+##### submitPig() - Submit a Pig job.
+
+* Request
+    * file (String) - The remote file name of the Pig script.
+    * arg (String) - An argument to pass to the script.
+    * statusDir (String) - The remote directory to store status output.
+* Response
+    * jobId : String - The job ID of the submitted job.  Consumes body.
+* Example
+    * `Job.submitPig(session).file(remotePigFileName).arg("-v").statusDir(remoteStatusDir).now()`
+
+##### submitHive() - Submit a Hive job.
+
+* Request
+    * file (String) - The remote file name of the Hive script.
+    * arg (String) - An argument to pass to the script.
+    * statusDir (String) - The remote directory to store status output.
+* Response
+    * jobId : String - The job ID of the submitted job.  Consumes body.
+* Example
+    * `Job.submitHive(session).file(remoteHiveFileName).arg("-v").statusDir(remoteStatusDir).now()`
+
+##### submitSqoop() - Submit a Sqoop job.
+
+Using the Knox DSL, you can submit and monitor [Apache Sqoop](https://sqoop.apache.org) jobs. The WebHCat Job class supports the `submitSqoop` command.
+
+    Job.submitSqoop(session)
+        .command("import --connect jdbc:mysql://hostname:3306/dbname ... ")
+        .statusDir(remoteStatusDir)
+        .now().jobId
+
+The `submitSqoop` command supports the following arguments:
+
+* command (String) - The Sqoop command string to execute.
+* files (String) - Comma-separated files to be copied to the Templeton controller job.
+* optionsfile (String) - The remote file that contains the Sqoop command to run.
+* libdir (String) - The remote directory containing the JDBC JAR to include with the Sqoop lib.
+* statusDir (String) - The remote directory to store status output.
+
+A complete example is available here: https://cwiki.apache.org/confluence/display/KNOX/2016/11/08/Running+SQOOP+job+via+KNOX+Shell+DSL
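WebHCat itself accepts Sqoop submissions as `POST /templeton/v1/sqoop` with form parameters. This sketch assumes that the arguments above map onto those parameters, with `statusDir` sent as `statusdir`; this guide does not state that mapping. The Java sketch below only builds the URL-encoded form body that such a request would carry. The command keeps the elided `...` from the example above, and the status directory is a hypothetical value.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class SqoopForm {

    /** Encodes non-null parameters as application/x-www-form-urlencoded, in insertion order. */
    public static String formBody(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        params.forEach((name, value) -> {
            if (value != null) { // unset optional arguments are simply omitted
                body.add(URLEncoder.encode(name, StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(value, StandardCharsets.UTF_8));
            }
        });
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("command", "import --connect jdbc:mysql://hostname:3306/dbname ...");
        params.put("libdir", null);                            // optional, not set here
        params.put("statusdir", "/user/guest/example/status"); // hypothetical status directory
        System.out.println(formBody(params));
    }
}
```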
+
+
+##### queryQueue() - Return a list of all job IDs registered to the user.
+
+* Request
+    * No request parameters.
+* Response
+    * BasicResponse
+* Example
+    * `Job.queryQueue(session).now().string`
+
+##### queryStatus() - Check the status of a job and get related job information given its job ID.
+
+* Request
+    * jobId (String) - The job ID to check. This is the ID received when the job was created.
+* Response
+    * BasicResponse
+* Example
+    * `Job.queryStatus(session).jobId(jobId).now().string`
+
+### WebHCat HA ###
+
+Please look at #[Default Service HA support]

Added: knox/trunk/books/1.3.0/service_webhdfs.md
URL: http://svn.apache.org/viewvc/knox/trunk/books/1.3.0/service_webhdfs.md?rev=1850181&view=auto
==============================================================================
--- knox/trunk/books/1.3.0/service_webhdfs.md (added)
+++ knox/trunk/books/1.3.0/service_webhdfs.md Wed Jan  2 17:31:29 2019
@@ -0,0 +1,346 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+### WebHDFS ###
+
+REST API access to HDFS in a Hadoop cluster is provided by WebHDFS or HttpFS.
+Both services provide the same API.
+The [WebHDFS REST API](http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/WebHDFS.html) documentation is available online.
+WebHDFS must be enabled in the `hdfs-site.xml` configuration file and exposes the API on each NameNode and DataNode.
+HttpFS, however, is a separate server that must be configured and started on its own.
+In the sandbox this configuration file is located at `/etc/hadoop/conf/hdfs-site.xml`.
+Note the properties shown below as they are related to configuration required by the gateway.
+Some of these represent the default values and may not actually be present in `hdfs-site.xml`.
+
+    <property>
+        <name>dfs.webhdfs.enabled</name>
+        <value>true</value>
+    </property>
+    <property>
+        <name>dfs.namenode.rpc-address</name>
+        <value>sandbox.hortonworks.com:8020</value>
+    </property>
+    <property>
+        <name>dfs.namenode.http-address</name>
+        <value>sandbox.hortonworks.com:50070</value>
+    </property>
+    <property>
+        <name>dfs.namenode.https-address</name>
+        <value>sandbox.hortonworks.com:50470</value>
+    </property>
+
+The values above need to be reflected in each topology descriptor file deployed to the gateway.
+The gateway by default includes a sample topology descriptor file `{GATEWAY_HOME}/deployments/sandbox.xml`.
+The values in this sample are configured to work with an installed Sandbox VM.
+
+Note that the default NameNode HTTP port changed from 50070 to 9870 in Hadoop 3.0.
+
+    <service>
+        <role>NAMENODE</role>
+        <url>hdfs://localhost:8020</url>
+    </service>
+    <service>
+        <role>WEBHDFS</role>
+        <url>http://localhost:50070/webhdfs</url>
+    </service>
+
+The URL provided for the role NAMENODE does not result in an endpoint being exposed by the gateway.
+This information is only required so that other URLs can be rewritten that reference the Name Node's RPC address.
+This prevents clients from needing to be aware of the internal cluster details.
+
+By default the gateway is configured to use the HTTP endpoint for WebHDFS in the Sandbox.
+This could alternatively be configured to use the HTTPS endpoint by providing the correct address.
+
+##### HDFS NameNode Federation
+
+NameNode federation introduces some additional complexity when determining to which URL(s) Knox should proxy HDFS-related requests.
+
+The HDFS *hdfs-site.xml* configuration includes additional properties that describe the available NameNode endpoints.
+
+| Property Name             | Description                    | Example Value |
+| ------------------------- | ------------------------------ | ------------- |
+| dfs.internal.nameservices | The list of defined namespaces | ns1,ns2       |
+
+For each value enumerated by *dfs.internal.nameservices*, another property specifies the associated NameNode identifiers.
+
+| Property Name        | Description                                                | Example Value |
+| -------------------- | ---------------------------------------------------------- | ------------- |
+| dfs.ha.namenodes.ns1 | The NameNode identifiers associated with the ns1 namespace | nn1,nn2       |
+| dfs.ha.namenodes.ns2 | The NameNode identifiers associated with the ns2 namespace | nn3,nn4       |
+
+For each NameNode identifier enumerated by these properties, further properties specify the associated host addresses.
+
+| Property Name                      | Description                                                 | Example Value |
+| ---------------------------------- | ----------------------------------------------------------- | ------------- |
+| dfs.namenode.http-address.ns1.nn1  | The HTTP host address of nn1 NameNode in the ns1 namespace  | host1:50070   |
+| dfs.namenode.https-address.ns1.nn1 | The HTTPS host address of nn1 NameNode in the ns1 namespace | host1:50470   |
+| dfs.namenode.http-address.ns1.nn2  | The HTTP host address of nn2 NameNode in the ns1 namespace  | host2:50070   |
+| dfs.namenode.https-address.ns1.nn2 | The HTTPS host address of nn2 NameNode in the ns1 namespace | host2:50470   |
+| dfs.namenode.http-address.ns2.nn3  | The HTTP host address of nn3 NameNode in the ns2 namespace  | host3:50070   |
+| dfs.namenode.https-address.ns2.nn3 | The HTTPS host address of nn3 NameNode in the ns2 namespace | host3:50470   |
+| dfs.namenode.http-address.ns2.nn4  | The HTTP host address of nn4 NameNode in the ns2 namespace  | host4:50070   |
+| dfs.namenode.https-address.ns2.nn4 | The HTTPS host address of nn4 NameNode in the ns2 namespace | host4:50470   |
+
+So, if Knox should proxy the NameNodes associated with *ns1*, and the configuration does not dictate HTTPS, then the WEBHDFS service must
+contain URLs based on the values of *dfs.namenode.http-address.ns1.nn1* and *dfs.namenode.http-address.ns1.nn2*. Likewise, if Knox should
+proxy the NameNodes associated with *ns2*, the WEBHDFS service must contain URLs based on the values of *dfs.namenode.http-address.ns2.nn3*
+and *dfs.namenode.http-address.ns2.nn4*.
+
+Fortunately, for Ambari-managed clusters, [descriptors](#Simplified+Descriptor+Files) and service discovery can handle this complexity for administrators.
+In the descriptor, the service can be declared without any endpoints, and the desired namespace can be specified to disambiguate which endpoint(s)
+should be proxied by way of a parameter named *discovery-nameservice*.
+
+    "services": [
+      {
+        "name": "WEBHDFS",
+        "params": {
+          "discovery-nameservice": "ns2"
+        }
+      }
+    ]
+
+If no namespace is specified, then the default namespace will be applied. This default namespace is derived from the value of the
+property named *fs.defaultFS* defined in the HDFS *core-site.xml* configuration.
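The lookup chain described above goes from the nameservice (given explicitly or taken from *fs.defaultFS*) to its NameNode identifiers and then to their HTTP addresses. The Java sketch below walks that chain over a plain property map. The map holds the example values from the tables above plus an assumed `fs.defaultFS` of `hdfs://ns1`. The `/webhdfs` suffix follows the WEBHDFS service URL format shown earlier. Only the HTTP case is handled. This is not Knox's service-discovery code.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FederatedWebHdfsUrls {

    /** Example values from the tables above; fs.defaultFS = hdfs://ns1 is an assumption. */
    public static final Map<String, String> SAMPLE_CONF = Map.of(
        "fs.defaultFS", "hdfs://ns1",
        "dfs.internal.nameservices", "ns1,ns2",
        "dfs.ha.namenodes.ns1", "nn1,nn2",
        "dfs.ha.namenodes.ns2", "nn3,nn4",
        "dfs.namenode.http-address.ns1.nn1", "host1:50070",
        "dfs.namenode.http-address.ns1.nn2", "host2:50070",
        "dfs.namenode.http-address.ns2.nn3", "host3:50070",
        "dfs.namenode.http-address.ns2.nn4", "host4:50070");

    /** Returns WEBHDFS URLs for a nameservice; null means "use the nameservice from fs.defaultFS". */
    public static List<String> webHdfsUrls(Map<String, String> conf, String nameservice) {
        // e.g. fs.defaultFS = hdfs://ns1  ->  ns1
        String ns = nameservice != null ? nameservice : URI.create(conf.get("fs.defaultFS")).getHost();
        String ids = conf.get("dfs.ha.namenodes." + ns);
        if (ids == null) {
            throw new IllegalArgumentException("No NameNodes defined for nameservice " + ns);
        }
        List<String> urls = new ArrayList<>();
        for (String id : ids.split(",")) {
            // Only HTTP addresses are read; an HTTPS setup would use dfs.namenode.https-address.*
            String address = conf.get("dfs.namenode.http-address." + ns + "." + id.trim());
            if (address == null) {
                throw new IllegalArgumentException("No HTTP address for " + ns + "." + id.trim());
            }
            urls.add("http://" + address + "/webhdfs");
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(webHdfsUrls(SAMPLE_CONF, "ns2")); // [http://host3:50070/webhdfs, http://host4:50070/webhdfs]
        System.out.println(webHdfsUrls(SAMPLE_CONF, null));  // [http://host1:50070/webhdfs, http://host2:50070/webhdfs]
    }
}
```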
+
+<br>
+
+#### WebHDFS URL Mapping ####
+
+For NameNode URLs, the mapping of Knox Gateway accessible WebHDFS URLs to direct WebHDFS URLs is simple.
+
+| ------- | ----------------------------------------------------------------------------- |
+| Gateway | `https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/webhdfs` |
+| Cluster | `http://{webhdfs-host}:50070/webhdfs`                                         |
+
+However, the URLs that WebHDFS returns in the Location header of many responses need special handling.
+Direct WebHDFS requests may return Location headers that contain the address of a particular DataNode.
+The gateway will rewrite these URLs to ensure subsequent requests come back through the gateway and internal cluster details are protected.
+
+A WebHDFS request to the NameNode to retrieve a file will return a URL of the form below in the Location header.
+
+    http://{datanode-host}:{datanode-port}/webhdfs/v1/{path}?...
+
+Note that this URL contains the network location of a DataNode.
+The gateway will rewrite this URL to look like the URL below.
+
+    https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/webhdfs/data/v1/{path}?_={encrypted-query-parameters}
+
+The `{encrypted-query-parameters}` will contain the `{datanode-host}` and `{datanode-port}` information.
+This information, along with the original query parameters, is encrypted so that the internal Hadoop details are protected.
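The Java sketch below models the shape of the rewrite: the DataNode host, port and original query are moved into a single opaque `_` parameter, and the reverse step recovers them. Knox encrypts that parameter. This sketch uses plain URL-safe Base64 so that the round trip is visible, which gives no protection and is not Knox's format. The gateway prefix and DataNode address are placeholders, and the reverse step assumes the DataNode uses plain HTTP.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Illustrative only: Knox encrypts the "_" parameter, whereas this sketch merely Base64-encodes it. */
public class DataNodeLocationRewrite {

    static final String DIRECT_PREFIX = "/webhdfs/v1";
    static final String GATEWAY_PREFIX = "/webhdfs/data/v1";

    /** Rewrites a DataNode Location header so that the client comes back through the gateway. */
    public static String toGateway(String location, String gatewayBase) {
        URI dn = URI.create(location);
        String path = dn.getRawPath();
        if (!path.startsWith(DIRECT_PREFIX)) {
            throw new IllegalArgumentException("Not a WebHDFS URL: " + location);
        }
        // Hide the DataNode address and the original query inside one opaque parameter.
        String query = dn.getRawQuery() == null ? "" : dn.getRawQuery();
        String payload = dn.getRawAuthority() + "?" + query;
        String opaque = Base64.getUrlEncoder().withoutPadding()
            .encodeToString(payload.getBytes(StandardCharsets.UTF_8));
        return gatewayBase + GATEWAY_PREFIX + path.substring(DIRECT_PREFIX.length()) + "?_=" + opaque;
    }

    /** Reverses toGateway, recovering the direct DataNode URL. */
    public static String toDataNode(String gatewayUrl) {
        URI gw = URI.create(gatewayUrl);
        String path = gw.getRawPath();
        String hdfsPath = path.substring(path.indexOf(GATEWAY_PREFIX) + GATEWAY_PREFIX.length());
        String opaque = gw.getRawQuery().substring("_=".length());
        String payload = new String(Base64.getUrlDecoder().decode(opaque), StandardCharsets.UTF_8);
        int q = payload.indexOf('?'); // an authority never contains '?'
        String query = payload.substring(q + 1);
        return "http://" + payload.substring(0, q) + DIRECT_PREFIX + hdfsPath
            + (query.isEmpty() ? "" : "?" + query);
    }

    public static void main(String[] args) {
        String location = "http://datanode-host:50075/webhdfs/v1/user/guest/example/README?op=OPEN&offset=0";
        String rewritten = toGateway(location, "https://localhost:8443/gateway/sandbox");
        System.out.println(rewritten);
        System.out.println(toDataNode(rewritten).equals(location)); // true
    }
}
```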
+
+#### WebHDFS Examples ####
+
+The examples below upload a file, download the file and list the contents of the directory.
+
+##### WebHDFS via client DSL
+
+You can use the Groovy example scripts and interpreter provided with the distribution.
+
+    java -jar bin/shell.jar samples/ExampleWebHdfsPutGet.groovy
+    java -jar bin/shell.jar samples/ExampleWebHdfsLs.groovy
+
+You can manually type the client DSL script into the KnoxShell interactive Groovy interpreter provided with the distribution.
+The command below starts the KnoxShell in interactive mode.
+
+    java -jar bin/shell.jar
+
+Each line below could be typed or copied into the interactive shell and executed.
+This is provided as an example to illustrate the use of the client DSL.
+
+    // Import the client DSL and a useful utility for working with JSON.
+    import org.apache.knox.gateway.shell.Hadoop
+    import org.apache.knox.gateway.shell.hdfs.Hdfs
+    import groovy.json.JsonSlurper
+
+    // Setup some basic config.
+    gateway = "https://localhost:8443/gateway/sandbox"
+    username = "guest"
+    password = "guest-password"
+
+    // Start the session.
+    session = Hadoop.login( gateway, username, password )
+
+    // Cleanup anything leftover from a previous run.
+    Hdfs.rm( session ).file( "/user/guest/example" ).recursive().now()
+
+    // Upload the README to HDFS.
+    Hdfs.put( session ).file( "README" ).to( "/user/guest/example/README" ).now()
+
+    // Download the README from HDFS.
+    text = Hdfs.get( session ).from( "/user/guest/example/README" ).now().string
+    println text
+
+    // List the contents of the directory.
+    text = Hdfs.ls( session ).dir( "/user/guest/example" ).now().string
+    json = (new JsonSlurper()).parseText( text )
+    println json.FileStatuses.FileStatus.pathSuffix
+
+    // Cleanup the directory.
+    Hdfs.rm( session ).file( "/user/guest/example" ).recursive().now()
+
+    // Clean the session.
+    session.shutdown()
+
+
+##### WebHDFS via cURL
+
+Users can use cURL to directly invoke the REST APIs via the gateway.
+
+###### Optionally clean up the sample directory in case a previous example was run without cleaning up.
+
+    curl -i -k -u guest:guest-password -X DELETE \
+        'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example?op=DELETE&recursive=true'
+
+###### Register the name for a sample file README in /user/guest/example.
+
+    curl -i -k -u guest:guest-password -X PUT \
+        'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example/README?op=CREATE'
+
+###### Upload README to /user/guest/example.  Use the README in {GATEWAY_HOME}.
+
+    curl -i -k -u guest:guest-password -T README -X PUT \
+        '{Value of Location header from command above}'
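The two-step upload above can be sketched with the JDK HTTP client (Java 11+). First, a `PUT` with no body returns a redirect. Then a second `PUT` sends the file content to the URL in the `Location` header. Redirects are disabled so that the `Location` header can be read explicitly. The gateway URL, path and credentials are the sample values used above. The JVM must trust the gateway's TLS certificate, because there is no simple counterpart to curl's `-k`. Only `createUri` can run without a live gateway. The `201` check follows the WebHDFS documentation for a successful CREATE.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Base64;

public class WebHdfsTwoStepCreate {

    /** The first-step URL: the gateway WebHDFS path plus op=CREATE. */
    public static URI createUri(String gatewayBase, String hdfsPath) {
        return URI.create(gatewayBase + "/webhdfs/v1" + hdfsPath + "?op=CREATE");
    }

    public static void upload(String gatewayBase, String hdfsPath, Path localFile, String user, String password)
            throws IOException, InterruptedException {
        String auth = "Basic " + Base64.getEncoder()
            .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        // Do not follow redirects automatically: the Location header must be read explicitly.
        HttpClient client = HttpClient.newBuilder().followRedirects(HttpClient.Redirect.NEVER).build();

        // Step 1: register the name; the redirect's Location points back through the gateway.
        HttpRequest register = HttpRequest.newBuilder(createUri(gatewayBase, hdfsPath))
            .header("Authorization", auth)
            .PUT(HttpRequest.BodyPublishers.noBody())
            .build();
        HttpResponse<Void> first = client.send(register, HttpResponse.BodyHandlers.discarding());
        String location = first.headers().firstValue("Location")
            .orElseThrow(() -> new IOException("No Location header; status " + first.statusCode()));

        // Step 2: send the file content to the rewritten DataNode URL.
        HttpRequest content = HttpRequest.newBuilder(URI.create(location))
            .header("Authorization", auth)
            .PUT(HttpRequest.BodyPublishers.ofFile(localFile))
            .build();
        HttpResponse<Void> second = client.send(content, HttpResponse.BodyHandlers.discarding());
        if (second.statusCode() != 201) {
            throw new IOException("Upload failed with status " + second.statusCode());
        }
    }

    public static void main(String[] args) throws Exception {
        upload("https://localhost:8443/gateway/sandbox", "/user/guest/example/README",
            Path.of("README"), "guest", "guest-password");
    }
}
```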
+
+###### List the contents of the directory /user/guest/example.
+
+    curl -i -k -u guest:guest-password -X GET \
+        'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example?op=LISTSTATUS'
+
+###### Request the content of the README file in /user/guest/example.
+
+    curl -i -k -u guest:guest-password -X GET \
+        'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example/README?op=OPEN'
+
+###### Read the content of the file.
+
+    curl -i -k -u guest:guest-password -X GET \
+        '{Value of Location header from command above}'
+
+###### Optionally clean up the example directory.
+
+    curl -i -k -u guest:guest-password -X DELETE \
+        'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example?op=DELETE&recursive=true'
+
+
+##### WebHDFS client DSL
+
+###### get() - Get a file from HDFS (OPEN).
+
+* Request
+    * from( String name ) - The full name of the file in HDFS.
+    * file( String name ) - The name of a local file to create with the content.
+    If this isn't specified the file content must be read from the response.
+* Response
+    * BasicResponse
+    * If the file parameter is specified, the content is streamed to that file.
+* Example
+    * `Hdfs.get( session ).from( "/user/guest/example/README" ).now().string`
+
+###### ls() - Query the contents of a directory (LISTSTATUS)
+
+* Request
+    * dir( String name ) - The full name of the directory in HDFS.
+* Response
+    * BasicResponse
+* Example
+    * `Hdfs.ls( session ).dir( "/user/guest/example" ).now().string`
+
+###### mkdir() - Create a directory in HDFS (MKDIRS)
+
+* Request
+    * dir( String name ) - The full name of the directory to create in HDFS.
+    * perm( String perm ) - The permissions for the directory (e.g. 644).  Optional: default="777"
+* Response
+    * EmptyResponse - Implicit close().
+* Example
+    * `Hdfs.mkdir( session ).dir( "/user/guest/example" ).now()`
+
+###### put() - Write a file into HDFS (CREATE)
+
+* Request
+    * text( String text ) - Text to upload to HDFS.  Takes precedence over file if both present.
+    * file( String name ) - The name of a local file to upload to HDFS.
+    * to( String name ) - The fully qualified name to create in HDFS.
+* Response
+    * EmptyResponse - Implicit close().
+* Example
+    * `Hdfs.put( session ).file( README ).to( "/user/guest/example/README" ).now()`
+
+###### rm() - Delete a file or directory (DELETE)
+
+* Request
+    * file( String name ) - The fully qualified file or directory name in HDFS.
+    * recursive( Boolean recursive ) - Delete the directory and all of its contents if true.  Optional: default=false
+* Response
+    * BasicResponse - Implicit close().
+* Example
+    * `Hdfs.rm( session ).file( "/user/guest/example" ).recursive().now()`
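Each DSL command above corresponds to a single WebHDFS operation, named in parentheses in its heading. For readers who would rather script the same calls with cURL, here is a small shell sketch of that correspondence. The HTTP methods are the ones the WebHDFS REST API uses for each operation. The helper names `dsl_request` and `webhdfs_url` and the `GATEWAY` default are hypothetical.

```shell
#!/bin/sh
# Sketch: map each DSL command to the HTTP method and WebHDFS operation it invokes,
# and build the matching gateway URL for use with cURL.
GATEWAY="${GATEWAY:-https://localhost:8443/gateway/sandbox}"

# Print "<HTTP method> <WebHDFS op>" for a DSL command name.
dsl_request() {
  case "$1" in
    get)   echo "GET OPEN" ;;
    ls)    echo "GET LISTSTATUS" ;;
    mkdir) echo "PUT MKDIRS" ;;
    put)   echo "PUT CREATE" ;;
    rm)    echo "DELETE DELETE" ;;
    *)     echo "unknown DSL command: $1" >&2; return 1 ;;
  esac
}

# Build the gateway URL for an HDFS path and operation; optional extra query
# parameters (e.g. "recursive=true") are appended after "&".
webhdfs_url() {
  local path="$1" op="$2" extra="${3:-}"
  printf '%s/webhdfs/v1%s?op=%s%s\n' "$GATEWAY" "$path" "$op" "${extra:+&$extra}"
}
```

For instance, a rough cURL counterpart of `Hdfs.rm( session ).file( "/user/guest/example" ).recursive().now()` would be `curl -i -k -u guest:guest-password -X DELETE "$(webhdfs_url /user/guest/example DELETE recursive=true)"`. That command is the same as the cleanup request shown earlier.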
+
+
+### WebHDFS HA ###
+
+Knox provides basic failover and retry functionality for REST API calls made to WebHDFS when HDFS HA has been 
+configured and enabled.
+
+To enable HA functionality for WebHDFS in Knox, add the following configuration to the topology file.
+
+    <provider>
+       <role>ha</role>
+       <name>HaProvider</name>
+       <enabled>true</enabled>
+       <param>
+           <name>WEBHDFS</name>
+           <value>maxFailoverAttempts=3;failoverSleep=1000;maxRetryAttempts=300;retrySleep=1000;enabled=true</value>
+       </param>
+    </provider>
+    
+The role and name of the provider above must be as shown. The name in the 'param' section must match that of the service 
+role name that is being configured for HA and the value in the 'param' section is the configuration for that particular
+service in HA mode. In this case the name is 'WEBHDFS'.
+
+The various configuration parameters are described below:
+     
+* maxFailoverAttempts - 
+This is the maximum number of times a failover will be attempted. The failover strategy at this time is very simplistic
+in that the next URL in the list of URLs provided for the service is used and the one that failed is put at the bottom 
+of the list. If the list is exhausted and the maximum number of attempts is not reached then the first URL that failed 
+will be tried again (the list will start again from the original top entry).
+
+* failoverSleep - 
+The amount of time in milliseconds that the process will wait or sleep before attempting to failover.
+
+* maxRetryAttempts - 
+This is the maximum number of times that a retry request will be attempted. Unlike failover, the retry is done on the 
+same URL that failed. This is a special case in HDFS when the node is in safe mode. The expectation is that the node will
+come out of safe mode, so a retry is desirable here as opposed to a failover.
+
+* retrySleep - 
+The amount of time in milliseconds that the process will wait or sleep before a retry is issued.
+
+* enabled - 
+Flag to turn the particular service on or off for HA.
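To make the failover ordering described under `maxFailoverAttempts` concrete, here is a small shell simulation. It is an illustration only, not Knox's implementation. It assumes every URL fails, and it ignores both `failoverSleep` and the safe-mode retry path. The function name `failover_order` is hypothetical.

```shell
#!/bin/sh
# Illustration only: print the URL tried on the initial request and on each
# subsequent failover, assuming every request fails.
# Usage: failover_order MAX_FAILOVER_ATTEMPTS URL [URL...]
failover_order() {
  local max="$1" attempt=1 first
  shift
  printf '%s\n' "$1"                # the initial request goes to the top URL
  while [ "$attempt" -le "$max" ]; do
    first="$1"
    shift
    set -- "$@" "$first"            # move the URL that failed to the bottom of the list
    printf '%s\n' "$1"              # fail over to the new top URL
    attempt=$((attempt + 1))
  done
}
```

With the two-host service definition below and `maxFailoverAttempts=3`, the call `failover_order 3 host1 host2` prints host1, host2, host1, host2. Once the list is exhausted, it starts again from the original top entry.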
+
+In the service definition itself, add the additional URLs to the list. Ideally, the URL that is active at the time
+of configuration should be at the top of the list.
+
+
+    <service>
+        <role>WEBHDFS</role>
+        <url>http://{host1}:50070/webhdfs</url>
+        <url>http://{host2}:50070/webhdfs</url>
+    </service>
+
+

Added: knox/trunk/books/1.3.0/service_yarn.md
URL: http://svn.apache.org/viewvc/knox/trunk/books/1.3.0/service_yarn.md?rev=1850181&view=auto
==============================================================================
--- knox/trunk/books/1.3.0/service_yarn.md (added)
+++ knox/trunk/books/1.3.0/service_yarn.md Wed Jan  2 17:31:29 2019
@@ -0,0 +1,124 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+--->
+
+### Yarn ###
+
+Knox provides gateway functionality for the REST APIs of the ResourceManager. The ResourceManager REST APIs allow the
+user to get information about the cluster - status on the cluster, metrics on the cluster, scheduler information,
+information about nodes in the cluster, and information about applications on the cluster. Also, as of Hadoop version
+2.5.0, the user can submit a new application, get its state, and kill it using the 'Writable' APIs.
+
+The documentation for these APIs can be found here:
+
+http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
+
+To enable this functionality, a topology file needs to have the following configuration:
+
+    <service>
+        <role>RESOURCEMANAGER</role>
+        <url>http://<hostname>:<port>/ws</url>
+    </service>
+
+The default ResourceManager HTTP port is 8088. If it is configured to use a different port, that setting can be
+found in `yarn-site.xml` under the property `yarn.resourcemanager.webapp.address`.
+
+#### Yarn URL Mapping ####
+
+For Yarn URLs, the mapping of Knox Gateway accessible URLs to direct Yarn URLs is the following.
+
+| ------- | ------------------------------------------------------------------------------------- |
+| Gateway | `https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/resourcemanager` |
+| Cluster | `http://{yarn-host}:{yarn-port}/ws`                                                   |
+
+
+#### Yarn Examples via cURL
+
+Some of the calls that can be made, with examples using cURL, are listed below.
+
+    # 0. Getting cluster info
+    
+    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster'
+    
+    # 1. Getting cluster metrics
+    
+    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster/metrics'
+    
+    # To get the same information in XML format
+    
+    curl -ikv -u guest:guest-password -H Accept:application/xml -X GET 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster/metrics'
+    
+    # 2. Getting scheduler information
+    
+    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster/scheduler'
+    
+    # 3. Getting all the applications listed and their information
+    
+    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster/apps'
+    
+    # 4. Getting applications statistics
+    
+    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster/appstatistics'
+    
+    # Query parameters can also be used, as below, to filter the results
+    
+    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster/appstatistics?states=accepted,running,finished&applicationTypes=mapreduce'
+    
+    # 5. To get a specific application (replace {application_id} with a real application id)
+    
+    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster/apps/{application_id}'
+    
+    # 6. To get the attempts made for a particular application
+    
+    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster/apps/{application_id}/appattempts'
+    
+    # 7. To get information about the various nodes
+    
+    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster/nodes'
+    
+    # To get a specific node, use a node id obtained from the response above (the node id is scrambled) and issue the following:
+    
+    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster/nodes/{node_id}'
+    
+    # 8. To create a new application
+    
+    curl -ikv -u guest:guest-password -X POST 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster/apps/new-application'
+    
+    # The request above returns an application id, which can be used to submit an application.
+    
+    # 9. To submit an application, put together a request containing the application id received in the response above
+    # (please refer to the Yarn REST API documentation).
+    
+    curl -ikv -u guest:guest-password -T request.json -H Content-Type:application/json -X POST 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster/apps'
+    
+    # Here the request is saved in a file called request.json.
+    
+    # 10. To get the application state
+    
+    curl -ikv -u guest:guest-password -X GET 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster/apps/{application_id}/state'
+    
+    # 11. To kill a running application, issue the command below with the id of the application to be killed.
+    # The contents of the state-killed.json file are:
+    #
+    # {
+    #   "state":"KILLED"
+    # }
+    
+    curl -ikv -u guest:guest-password -H Content-Type:application/json -X PUT -T state-killed.json 'https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster/apps/{application_id}/state'
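Steps 8 to 11 can be chained in a script. The shell sketch below has two parts. The first extracts the id returned by the new-application call. The second kills an application by id, sending the same `{"state":"KILLED"}` body inline instead of reading it from `state-killed.json`. The sketch assumes the sandbox topology and guest credentials used above, and it assumes the new-application response carries an `application-id` field as described in the Yarn REST API documentation. The function names are hypothetical. The sed-based extraction is only a convenience; a real JSON parser such as `jq` would be more robust.

```shell
#!/bin/sh
# Sketch: script the new-application and kill steps against the ResourceManager via Knox.
# Assumes the sandbox topology and guest credentials from the examples above.
RM="${RM:-https://localhost:8443/gateway/sandbox/resourcemanager/v1/cluster}"
CREDS="${CREDS:-guest:guest-password}"

# Print the "application-id" value from a new-application JSON response on stdin.
application_id() {
  sed -n 's/.*"application-id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Ask the ResourceManager for a new application id (step 8).
new_application() {
  curl -s -k -u "$CREDS" -X POST "$RM/apps/new-application" | application_id
}

# Kill an application by PUTting {"state":"KILLED"} to its state resource (step 11).
kill_application() {
  curl -s -k -u "$CREDS" -H 'Content-Type: application/json' -X PUT \
    -d '{"state":"KILLED"}' "$RM/apps/$1/state"
}
```

For example, `app_id=$(new_application)` yields an id to place in request.json for step 9, and `kill_application "$app_id"` later kills the submitted application.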
+

Added: knox/trunk/books/1.3.0/websocket-support.md
URL: http://svn.apache.org/viewvc/knox/trunk/books/1.3.0/websocket-support.md?rev=1850181&view=auto
==============================================================================
--- knox/trunk/books/1.3.0/websocket-support.md (added)
+++ knox/trunk/books/1.3.0/websocket-support.md Wed Jan  2 17:31:29 2019
@@ -0,0 +1,76 @@
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+## WebSocket Support ##
+
+### Introduction
+
+WebSocket is a communication protocol that allows full duplex communication over a single TCP connection.
+Knox provides out-of-the-box support for the WebSocket protocol; currently, only text messages are supported.
+
+### Configuration ###
+
+By default, WebSocket functionality is disabled. It can be enabled by setting the `gateway.websocket.feature.enabled` property to `true` in the `<KNOX-HOME>/conf/gateway-site.xml` file:
+
+      <property>
+          <name>gateway.websocket.feature.enabled</name>
+          <value>true</value>
+          <description>Enable/Disable websocket feature.</description>
+      </property>
+
+Service and rewrite rules need to be changed accordingly to match the appropriate WebSocket context.
+
+### Example ###
+
+In the following sample configuration, we assume that the backend WebSocket URL is ws://myhost:9999/ws and that the `gateway.websocket.feature.enabled` property is set to `true` as shown above.
+
+#### rewrite ####
+
+Example code snippet from `<KNOX-HOME>/data/services/{myservice}/{version}/rewrite.xml` where myservice = websocket and version = 0.6.0
+
+      <rules>
+        <rule dir="IN" name="WEBSOCKET/ws/inbound" pattern="*://*:*/**/ws">
+          <rewrite template="{$serviceUrl[WEBSOCKET]}/ws"/>
+        </rule>
+      </rules>
+
+#### service ####
+
+Example code snippet from `<KNOX-HOME>/data/services/{myservice}/{version}/service.xml` where myservice = websocket and version = 0.6.0
+
+      <service role="WEBSOCKET" name="websocket" version="0.6.0">
+        <policies>
+              <policy role="webappsec"/>
+              <policy role="authentication" name="Anonymous"/>
+              <policy role="rewrite"/>
+              <policy role="authorization"/>
+        </policies>
+        <routes>
+          <route path="/ws">
+              <rewrite apply="WEBSOCKET/ws/inbound" to="request.url"/>
+          </route>
+        </routes>
+      </service>
+
+#### topology ####
+
+Finally, update the topology file at `<KNOX-HOME>/conf/{topology}.xml` with the backend service URL:
+
+      <service>
+          <role>WEBSOCKET</role>
+          <url>ws://myhost:9999/ws</url>
+      </service>
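Once the gateway is running with this configuration, one way to check that the WebSocket route is wired up is to send a WebSocket upgrade request through Knox and look for a `101 Switching Protocols` status. The shell sketch below does this with cURL. It assumes the gateway listens on `https://localhost:8443` with a topology named `sandbox`, so that the `/ws` route above is reachable at `https://localhost:8443/gateway/sandbox/ws`. The function names and the `WS_URL` variable are hypothetical. The fixed `Sec-WebSocket-Key` is the sample value from RFC 6455; it is acceptable for a handshake probe but not for a real client.

```shell
#!/bin/sh
# Sketch: probe the WebSocket handshake through Knox with plain cURL.
# Assumes the sandbox topology and the /ws route configured above.
WS_URL="${WS_URL:-https://localhost:8443/gateway/sandbox/ws}"

# Succeed if the first line of an HTTP response on stdin carries a 101 status.
is_switching_protocols() {
  head -n 1 | tr -d '\r' | grep -q '^HTTP/[0-9.]* 101'
}

# Send an HTTP/1.1 upgrade request; --max-time stops cURL, which would otherwise
# keep the upgraded connection open after the handshake.
ws_probe() {
  curl -s -k -i -N --http1.1 --max-time 5 \
    -H 'Connection: Upgrade' \
    -H 'Upgrade: websocket' \
    -H 'Sec-WebSocket-Version: 13' \
    -H 'Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==' \
    "$WS_URL" | is_switching_protocols
}
```

A quick check is `ws_probe && echo 'WebSocket handshake OK'`. A failure may mean that the feature flag is off, that the route or rewrite rule does not match, or that the backend at ws://myhost:9999/ws is unreachable.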

Added: knox/trunk/books/1.3.0/x-forwarded-headers.md
URL: http://svn.apache.org/viewvc/knox/trunk/books/1.3.0/x-forwarded-headers.md?rev=1850181&view=auto
==============================================================================
--- knox/trunk/books/1.3.0/x-forwarded-headers.md (added)
+++ knox/trunk/books/1.3.0/x-forwarded-headers.md Wed Jan  2 17:31:29 2019
@@ -0,0 +1,76 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+--->
+
+### X-Forwarded-* Headers Support ###
+Out-of-the-box Knox provides support for some `X-Forwarded-*` headers through the use of a Servlet Filter. Specifically the
+headers handled/populated by Knox are:
+
+* X-Forwarded-For
+* X-Forwarded-Proto
+* X-Forwarded-Port
+* X-Forwarded-Host
+* X-Forwarded-Server
+* X-Forwarded-Context
+
+This functionality can be turned off with a configuration setting in the gateway-site.xml file, followed by redeploying
+the affected topology or topologies.
+
+The setting (under the 'configuration' tag) is:
+
+    <property>
+        <name>gateway.xforwarded.enabled</name>
+        <value>false</value>
+    </property>
+
+If this setting is absent, `X-Forwarded-*` header support is on; in other words,
+`gateway.xforwarded.enabled` is `true` by default.
+
+
+#### Header population ####
+
+The following are the various rules for population of these headers:
+
+##### X-Forwarded-For #####
+
+This header represents a list of client IP addresses. If the header is already present, Knox appends a comma-separated
+value to the list: the client's IP address as Knox sees it, added at the end of the list.
+
+##### X-Forwarded-Proto #####
+
+The protocol used in the client request. If this header is passed into Knox, its value is maintained; otherwise, Knox
+populates the header with the value 'https' if the request is a secure one or 'http' otherwise.
+
+##### X-Forwarded-Port #####
+
+The port used in the client request. If this header is passed into Knox, its value is maintained; otherwise, Knox
+populates the header with the port on which the request came into Knox.
+
+##### X-Forwarded-Host #####
+
+Represents the original host requested by the client in the Host HTTP request header. If a value is passed into Knox, it
+is maintained; otherwise, Knox populates the header with the value of the HTTP Host header.
+
+##### X-Forwarded-Server #####
+
+The hostname of the server Knox is running on.
+
+##### X-Forwarded-Context #####
+
+This header value contains the context path of the request to Knox.
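The population rules above come down to two patterns. X-Forwarded-For uses "append", while Proto, Port and Host use "keep the incoming value, otherwise fill in a default". The shell sketch below restates these rules as small functions so they can be checked quickly. It illustrates the documented rules and is not Knox's servlet filter. The function names are hypothetical, and the separator Knox writes between X-Forwarded-For entries (`,` here) is an assumption.

```shell
#!/bin/sh
# Illustration of the documented X-Forwarded-* population rules (not Knox code).

# X-Forwarded-For: append the client IP as seen by Knox to any existing list.
# The "," separator is an assumption of this sketch.
xff_for() {
  local existing="$1" client_ip="$2"
  if [ -n "$existing" ]; then
    printf '%s,%s\n' "$existing" "$client_ip"
  else
    printf '%s\n' "$client_ip"
  fi
}

# X-Forwarded-Proto: keep the incoming value, else https for secure requests, http otherwise.
xff_proto() {
  local incoming="$1" secure="$2"
  if [ -n "$incoming" ]; then
    printf '%s\n' "$incoming"
  elif [ "$secure" = true ]; then
    printf 'https\n'
  else
    printf 'http\n'
  fi
}

# X-Forwarded-Port and X-Forwarded-Host: keep the incoming value, else use the fallback
# (the port the request arrived on, or the value of the HTTP Host header).
xff_keep_or() {
  local incoming="$1" fallback="$2"
  printf '%s\n' "${incoming:-$fallback}"
}
```

For example, under these rules a TLS request that arrives at Knox on port 8443 from 10.0.0.5 with no forwarded headers gets `X-Forwarded-For: 10.0.0.5`, `X-Forwarded-Proto: https` and `X-Forwarded-Port: 8443`.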
+
+
+


