knox-commits mailing list archives

From kmin...@apache.org
Subject svn commit: r1530346 [2/3] - in /incubator/knox: site/ site/books/knox-incubating-0-3-0/ trunk/books/0.3.0/ trunk/books/static/
Date Tue, 08 Oct 2013 16:45:25 GMT

Modified: incubator/knox/site/books/knox-incubating-0-3-0/knox-incubating-0-3-0.html
URL: http://svn.apache.org/viewvc/incubator/knox/site/books/knox-incubating-0-3-0/knox-incubating-0-3-0.html?rev=1530346&r1=1530345&r2=1530346&view=diff
==============================================================================
--- incubator/knox/site/books/knox-incubating-0-3-0/knox-incubating-0-3-0.html (original)
+++ incubator/knox/site/books/knox-incubating-0-3-0/knox-incubating-0-3-0.html Tue Oct  8 16:45:25 2013
@@ -352,8 +352,14 @@ ip-10-39-107-209.ec2.internal
             <role>hostmap</role>
             <name>static</name>
             <enabled>true</enabled>
-            <param><name>ec2-23-22-31-165.compute-1.amazonaws.com</name><value>ip-10-118-99-172.ec2.internal</value></param>
-            <param><name>ec2-23-23-25-10.compute-1.amazonaws.com</name><value>ip-10-39-107-209.ec2.internal</value></param>
+            <param>
+                <name>ec2-23-22-31-165.compute-1.amazonaws.com</name>
+                <value>ip-10-118-99-172.ec2.internal</value>
+            </param>
+            <param>
+                <name>ec2-23-23-25-10.compute-1.amazonaws.com</name>
+                <value>ip-10-39-107-209.ec2.internal</value>
+            </param>
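+            <!-- Each param maps an external (client visible) host name, given as the
+                 param name, to the corresponding internal cluster host name (value),
+                 as in the two EC2 entries above. -->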
         </provider>
         ...
     </gateway>
@@ -408,9 +414,11 @@ ip-10-39-107-209.ec2.internal
   <li>the name of the expected identity keystore for the gateway MUST be gateway.jks</li>
   <li>the passwords for the keystore and the imported key MUST both be the master secret for the gateway install</li>
 </ol><p>NOTE: The password for the keystore as well as that of the imported key must be the master secret for the gateway instance.</p><h1><a id="----END+NEEDS+TESTING"></a>&mdash;-END NEEDS TESTING</h1><h5><a id="Generating+a+self-signed+cert+for+use+in+testing+or+development+environments"></a>Generating a self-signed cert for use in testing or development environments</h5>
-<pre><code>keytool -genkey -keyalg RSA -alias gateway-identity -keystore gateway.jks -storepass {master-secret} -validity 360 -keysize 2048 
-</code></pre><p>Keytool will prompt you for a number of elements used that will comprise this distiniguished name (DN) within your certificate. </p><p><b>NOTE:</b> When it prompts you for your First and Last name be sure to type in the hostname of the machine that your gateway instance will be running on. This is used by clients during hostname verification to ensure that the presented certificate matches the hostname that was used in the URL for the connection - so they need to match.</p><p><b>NOTE:</b> When it prompts for the key password just press enter to ensure that it is the same as the keystore password. Which as was described earlier must match the master secret for the gateway instance.</p><h5><a id="Credential+Store"></a>Credential Store</h5><p>Whenever you provide your own keystore with either a self-signed cert or a real certificate signed by a trusted authority, you will need to create an empty credential store. This is necessary for the current release in order for th
 e system to utilize the same password for the keystore and the key.</p><p>The credential stores in Knox use the JCEKS keystore type as it allows for the storage of general secrets in addition to certificates.</p>
-<pre><code>keytool -genkey -alias {anything} -keystore __gateway-credentials.jceks -storepass {master-secret} -validity 360 -keysize 1024 -storetype JCEKS
+<pre><code>keytool -genkey -keyalg RSA -alias gateway-identity -keystore gateway.jks \
+    -storepass {master-secret} -validity 360 -keysize 2048
+</code></pre><p>Keytool will prompt you for a number of elements that will comprise the distinguished name (DN) within your certificate. </p><p><em>NOTE:</em> When it prompts you for your First and Last name be sure to type in the hostname of the machine that your gateway instance will be running on. This is used by clients during hostname verification to ensure that the presented certificate matches the hostname that was used in the URL for the connection - so they need to match.</p><p><em>NOTE:</em> When it prompts for the key password just press enter to ensure that it is the same as the keystore password, which, as described earlier, must match the master secret for the gateway instance.</p><h5><a id="Credential+Store"></a>Credential Store</h5><p>Whenever you provide your own keystore with either a self-signed cert or a real certificate signed by a trusted authority, you will need to create an empty credential store. This is necessary for the current release in order for the system to utilize the same password for the keystore and the key.</p><p>The credential stores in Knox use the JCEKS keystore type as it allows for the storage of general secrets in addition to certificates.</p>
+<pre><code>keytool -genkey -alias {anything} -keystore __gateway-credentials.jceks \
+    -storepass {master-secret} -validity 360 -keysize 1024 -storetype JCEKS
 </code></pre><p>Follow the prompts again for the DN for the cert of the credential store. This certificate isn&rsquo;t really used for anything at the moment but is required to create the credential store.</p><h5><a id="Provisioning+of+Keystores"></a>Provisioning of Keystores</h5><p>Once you have created these keystores you must move them into place for the gateway to discover them and use them to represent its identity for SSL connections. This is done by copying the keystores to the <code>{GATEWAY_HOME}/conf/security/keystores</code> directory for your gateway install.</p><h4><a id="Summary+of+Secrets+to+be+Managed"></a>Summary of Secrets to be Managed</h4>
 <ol>
   <li>Master secret - the same for all gateway instances in a cluster of gateways</li>
@@ -480,18 +488,19 @@ ldapRealm.userDnTemplate=uid={0},ou=peop
 </ol><h4><a id="Session+Configuration"></a>Session Configuration</h4><p>Knox maps each cluster topology to a web application and leverages standard JavaEE session management.</p><p>To configure the session idle timeout for the topology, specify a value for the sessionTimeout parameter of the ShiroProvider in your topology file. If you do not specify a value for this parameter, it defaults to 30 minutes.</p><p>The definition would look like the following in the topology file:</p>
 <pre><code>...
 &lt;provider&gt;
-            &lt;role&gt;authentication&lt;/role&gt;
-            &lt;name&gt;ShiroProvider&lt;/name&gt;
-            &lt;enabled&gt;true&lt;/enabled&gt;
-            &lt;param&gt;
-                &lt;!-- 
-                session timeout in minutes,  this is really idle timeout,
-                defaults to 30mins, if the property value is not defined,, 
-                current client authentication would expire if client idles contiuosly for more than this value
-                --&gt;
-                &lt;name&gt;sessionTimeout&lt;/name&gt;
-                &lt;value&gt;30&lt;/value&gt;
-            &lt;/param&gt;
+    &lt;role&gt;authentication&lt;/role&gt;
+    &lt;name&gt;ShiroProvider&lt;/name&gt;
+    &lt;enabled&gt;true&lt;/enabled&gt;
+    &lt;param&gt;
+        &lt;!--
+        Session timeout in minutes. This is really idle timeout.
+        Defaults to 30 minutes, if the property value is not defined.
+        Current client authentication will expire if client idles
+        continuously for more than this value
+        --&gt;
+        &lt;name&gt;sessionTimeout&lt;/name&gt;
+        &lt;value&gt;30&lt;/value&gt;
+    &lt;/param&gt;
 &lt;/provider&gt;
 ...
 </code></pre><p>At present, the ShiroProvider in Knox leverages the JavaEE session to maintain authentication state for a user across requests using the JSESSIONID cookie. So, a client that has authenticated with Knox can pass the JSESSIONID cookie with repeated requests, as long as the session has not timed out, instead of submitting userid/password with every request. Presenting a valid session cookie in place of userid/password also performs better as additional credential store lookups are avoided.</p><h3><a id="Identity+Assertion"></a>Identity Assertion</h3><p>The identity assertion provider within Knox plays the critical role of communicating the identity principal to be used within the Hadoop cluster to represent the identity that has been authenticated at the gateway.</p><p>The general responsibility of the identity assertion provider is to interrogate the current Java Subject that has been established by the authentication or federation provider and:</p>
@@ -836,39 +845,39 @@ chmod 400 knox.service.keytab
   <li>The Apache Knox Gateway is installed and functional.</li>
   <li>The example commands are executed within the context of the GATEWAY_HOME current directory. The GATEWAY_HOME directory is the directory within the Apache Knox Gateway installation that contains the README file and the bin, conf and deployments directories.</li>
   <li>A few examples require the use of commands from a standard Groovy installation. These examples are optional but to try them you will need Groovy <a href="http://groovy.codehaus.org/Installing+Groovy">installed</a>.</li>
-</ul><h3><a id="Assumptions"></a>Assumptions</h3><p>The DSL requires a shell to interpret the Groovy script. The shell can either be used interactively or to execute a script file. To simplify use, the distribution contains an embedded version of the Groovy shell.</p><p>The shell can be run interactively. Use the command <code>exit</code> to exit.</p>
+</ul><h3><a id="Basics"></a>Basics</h3><p>The DSL requires a shell to interpret the Groovy script. The shell can either be used interactively or to execute a script file. To simplify use, the distribution contains an embedded version of the Groovy shell.</p><p>The shell can be run interactively. Use the command <code>exit</code> to exit.</p>
 <pre><code>java -jar bin/shell.jar
 </code></pre><p>When running interactively it may be helpful to reduce some of the output generated by the shell console. Use the following commands in the interactive shell to reduce that output. This only needs to be done once as these preferences are persisted.</p>
 <pre><code>set verbosity QUIET
 set show-last-result false
 </code></pre><p>Also when running interactively use the <code>exit</code> command to terminate the shell. Using <code>^C</code> to exit can sometimes leave the parent shell in a problematic state.</p><p>The shell can also be used to execute a script by passing a single filename argument.</p>
-<pre><code>java -jar bin/shell.jar samples/ExamplePutFile.groovy
+<pre><code>java -jar bin/shell.jar samples/ExampleWebHdfsPutGetFile.groovy
 </code></pre><h3><a id="Examples"></a>Examples</h3><p>Once the shell has been launched, the DSL can be used to interact with the gateway and Hadoop. Below is a very simple example of an interactive shell session to upload a file to HDFS.</p>
 <pre><code>java -jar bin/shell.jar
-knox:000&gt; hadoop = Hadoop.login( &quot;https://localhost:8443/gateway/sandbox&quot;, &quot;guest&quot;, &quot;guest-password&quot; )
-knox:000&gt; Hdfs.put( hadoop ).file( &quot;README&quot; ).to( &quot;/tmp/example/README&quot; ).now()
+knox:000&gt; session = Hadoop.login( &quot;https://localhost:8443/gateway/sandbox&quot;, &quot;guest&quot;, &quot;guest-password&quot; )
+knox:000&gt; Hdfs.put( session ).file( &quot;README&quot; ).to( &quot;/tmp/example/README&quot; ).now()
 </code></pre><p>The <code>knox:000&gt;</code> in the example above is the prompt from the embedded Groovy console. If your output doesn&rsquo;t look like this you may need to set the verbosity and show-last-result preferences as described above in the Usage section.</p><p>If you receive an error <code>HTTP/1.1 403 Forbidden</code> it may be because that file already exists. Try deleting it with the following command and then try again.</p>
-<pre><code>knox:000&gt; Hdfs.rm(hadoop).file(&quot;/tmp/example/README&quot;).now()
+<pre><code>knox:000&gt; Hdfs.rm(session).file(&quot;/tmp/example/README&quot;).now()
 </code></pre><p>Without using some other tool to browse HDFS it is hard to tell that this command did anything. Execute this to get a bit more feedback.</p>
-<pre><code>knox:000&gt; println &quot;Status=&quot; + Hdfs.put( hadoop ).file( &quot;README&quot; ).to( &quot;/tmp/example/README2&quot; ).now().statusCode
+<pre><code>knox:000&gt; println &quot;Status=&quot; + Hdfs.put( session ).file( &quot;README&quot; ).to( &quot;/tmp/example/README2&quot; ).now().statusCode
 Status=201
 </code></pre><p>Notice that a different filename is used for the destination. Without this an error would have resulted. Of course the DSL also provides a command to list the contents of a directory.</p>
-<pre><code>knox:000&gt; println Hdfs.ls( hadoop ).dir( &quot;/tmp/example&quot; ).now().string
+<pre><code>knox:000&gt; println Hdfs.ls( session ).dir( &quot;/tmp/example&quot; ).now().string
 {&quot;FileStatuses&quot;:{&quot;FileStatus&quot;:[{&quot;accessTime&quot;:1363711366977,&quot;blockSize&quot;:134217728,&quot;group&quot;:&quot;hdfs&quot;,&quot;length&quot;:19395,&quot;modificationTime&quot;:1363711366977,&quot;owner&quot;:&quot;guest&quot;,&quot;pathSuffix&quot;:&quot;README&quot;,&quot;permission&quot;:&quot;644&quot;,&quot;replication&quot;:1,&quot;type&quot;:&quot;FILE&quot;},{&quot;accessTime&quot;:1363711375617,&quot;blockSize&quot;:134217728,&quot;group&quot;:&quot;hdfs&quot;,&quot;length&quot;:19395,&quot;modificationTime&quot;:1363711375617,&quot;owner&quot;:&quot;guest&quot;,&quot;pathSuffix&quot;:&quot;README2&quot;,&quot;permission&quot;:&quot;644&quot;,&quot;replication&quot;:1,&quot;type&quot;:&quot;FILE&quot;}]}}
 </code></pre><p>It is a design decision of the DSL to not provide type safe classes for various request and response payloads. Doing so would provide an undesirable coupling between the DSL and the service implementation. It also would make adding new commands much more difficult. See the Groovy section below for a variety of capabilities and tools for working with JSON and XML to make this easy. The example below shows the use of JsonSlurper and GPath to extract content from a JSON response.</p>
 <pre><code>knox:000&gt; import groovy.json.JsonSlurper
-knox:000&gt; text = Hdfs.ls( hadoop ).dir( &quot;/tmp/example&quot; ).now().string
+knox:000&gt; text = Hdfs.ls( session ).dir( &quot;/tmp/example&quot; ).now().string
 knox:000&gt; json = (new JsonSlurper()).parseText( text )
 knox:000&gt; println json.FileStatuses.FileStatus.pathSuffix
 [README, README2]
 </code></pre><p><em>In the future, &ldquo;built-in&rdquo; methods to slurp JSON and XML may be added to make this a bit easier.</em> <em>This would allow for this type of single line interaction.</em></p>
-<pre><code>println Hdfs.ls(hadoop).dir(&quot;/tmp&quot;).now().json().FileStatuses.FileStatus.pathSuffix
+<pre><code>println Hdfs.ls(session).dir(&quot;/tmp&quot;).now().json().FileStatuses.FileStatus.pathSuffix
 </code></pre><p>A shell session should always be ended by shutting down the session. The examples above do not touch on it but the DSL supports the simple execution of commands asynchronously. The shutdown command attempts to ensure that all asynchronous commands have completed before exiting the shell.</p>
-<pre><code>knox:000&gt; hadoop.shutdown()
+<pre><code>knox:000&gt; session.shutdown()
 knox:000&gt; exit
 </code></pre><p>All of the commands above could have been combined into a script file and executed as a single line.</p>
-<pre><code>java -jar bin/shell.jar samples/ExamplePutFile.groovy
-</code></pre><p>This script file is available in the distribution but for convenience, this is the content.</p>
+<pre><code>java -jar bin/shell.jar samples/ExampleWebHdfsPutGet.groovy
+</code></pre><p>This would be the content of that script.</p>
 <pre><code>import org.apache.hadoop.gateway.shell.Hadoop
 import org.apache.hadoop.gateway.shell.hdfs.Hdfs
 import groovy.json.JsonSlurper
@@ -878,30 +887,30 @@ username = &quot;guest&quot;
 password = &quot;guest-password&quot;
 dataFile = &quot;README&quot;
 
-hadoop = Hadoop.login( gateway, username, password )
-Hdfs.rm( hadoop ).file( &quot;/tmp/example&quot; ).recursive().now()
-Hdfs.put( hadoop ).file( dataFile ).to( &quot;/tmp/example/README&quot; ).now()
-text = Hdfs.ls( hadoop ).dir( &quot;/tmp/example&quot; ).now().string
+session = Hadoop.login( gateway, username, password )
+Hdfs.rm( session ).file( &quot;/tmp/example&quot; ).recursive().now()
+Hdfs.put( session ).file( dataFile ).to( &quot;/tmp/example/README&quot; ).now()
+text = Hdfs.ls( session ).dir( &quot;/tmp/example&quot; ).now().string
 json = (new JsonSlurper()).parseText( text )
 println json.FileStatuses.FileStatus.pathSuffix
-hadoop.shutdown()
+session.shutdown()
 exit
 </code></pre><p>Notice the <code>Hdfs.rm</code> command. This is included simply to ensure that the script can be rerun. Without this an error would result the second time it is run.</p><h3><a id="Futures"></a>Futures</h3><p>The DSL supports the ability to invoke commands asynchronously via the later() invocation method. The object returned from the later() method is a java.util.concurrent.Future parametrized with the response type of the command. This is an example of how to asynchronously put a file to HDFS.</p>
-<pre><code>future = Hdfs.put(hadoop).file(&quot;README&quot;).to(&quot;tmp/example/README&quot;).later()
+<pre><code>future = Hdfs.put(session).file(&quot;README&quot;).to(&quot;tmp/example/README&quot;).later()
 println future.get().statusCode
 </code></pre><p>The future.get() method will block until the asynchronous command is complete. To illustrate the usefulness of this, however, multiple concurrent commands are required.</p>
-<pre><code>readmeFuture = Hdfs.put(hadoop).file(&quot;README&quot;).to(&quot;tmp/example/README&quot;).later()
-licenseFuture = Hdfs.put(hadoop).file(&quot;LICENSE&quot;).to(&quot;tmp/example/LICENSE&quot;).later()
-hadoop.waitFor( readmeFuture, licenseFuture )
+<pre><code>readmeFuture = Hdfs.put(session).file(&quot;README&quot;).to(&quot;tmp/example/README&quot;).later()
+licenseFuture = Hdfs.put(session).file(&quot;LICENSE&quot;).to(&quot;tmp/example/LICENSE&quot;).later()
+session.waitFor( readmeFuture, licenseFuture )
 println readmeFuture.get().statusCode
 println licenseFuture.get().statusCode
-</code></pre><p>The hadoop.waitFor() method will wait for one or more asynchronous commands to complete.</p><h3><a id="Closures"></a>Closures</h3><p>Futures alone only provide asynchronous invocation of the command. What if some processing should also occur asynchronously once the command is complete. Support for this is provided by closures. Closures are blocks of code that are passed into the later() invocation method. In Groovy these are contained within {} immediately after a method. These blocks of code are executed once the asynchronous command is complete.</p>
-<pre><code>Hdfs.put(hadoop).file(&quot;README&quot;).to(&quot;tmp/example/README&quot;).later(){ println it.statusCode }
+</code></pre><p>The session.waitFor() method will wait for one or more asynchronous commands to complete.</p><h3><a id="Closures"></a>Closures</h3><p>Futures alone only provide asynchronous invocation of the command. What if some processing should also occur asynchronously once the command is complete? Support for this is provided by closures. Closures are blocks of code that are passed into the later() invocation method. In Groovy these are contained within {} immediately after a method. These blocks of code are executed once the asynchronous command is complete.</p>
+<pre><code>Hdfs.put(session).file(&quot;README&quot;).to(&quot;tmp/example/README&quot;).later(){ println it.statusCode }
 </code></pre><p>In this example the put() command is executed on a separate thread and once complete the <code>println it.statusCode</code> block is executed on that thread. The it variable is automatically populated by Groovy and is a reference to the result that is returned from the future or now() method. The future example above can be rewritten to illustrate the use of closures.</p>
-<pre><code>readmeFuture = Hdfs.put(hadoop).file(&quot;README&quot;).to(&quot;tmp/example/README&quot;).later() { println it.statusCode }
-licenseFuture = Hdfs.put(hadoop).file(&quot;LICENSE&quot;).to(&quot;tmp/example/LICENSE&quot;).later() { println it.statusCode }
-hadoop.waitFor( readmeFuture, licenseFuture )
-</code></pre><p>Again, the hadoop.waitFor() method will wait for one or more asynchronous commands to complete.</p><h3><a id="Constructs"></a>Constructs</h3><p>In order to understand the DSL there are three primary constructs that need to be understood.</p><h3><a id="Hadoop"></a>Hadoop</h3><p>This construct encapsulates the client side session state that will be shared between all command invocations. In particular it will simplify the management of any tokens that need to be presented with each command invocation. It also manages a thread pool that is used by all asynchronous commands which is why it is important to call one of the shutdown methods.</p><p>The syntax associated with this is expected to change we expect that credentials will not need to be provided to the gateway. Rather it is expected that some form of access token will be used to initialize the session.</p><h3><a id="Services"></a>Services</h3><p>Services are the primary extension point for adding new suites of com
 mands. The built in examples are: Hdfs, Job and Workflow. The desire for extensibility is the reason for the slightly awkward Hdfs.ls(hadoop) syntax. Certainly something more like <code>hadoop.hdfs().ls()</code> would have been preferred but this would prevent adding new commands easily. At a minimum it would result in extension commands with a different syntax from the &ldquo;built-in&rdquo; commands.</p><p>The service objects essentially function as a factory for a suite of commands.</p><h3><a id="Commands"></a>Commands</h3><p>Commands provide the behavior of the DSL. They typically follow a Fluent interface style in order to allow for single line commands. There are really three parts to each command: Request, Invocation, Response</p><h3><a id="Request"></a>Request</h3><p>The request is populated by all of the methods following the &ldquo;verb&rdquo; method and the &ldquo;invoke&rdquo; method. For example in <code>Hdfs.rm(hadoop).ls(dir).now()</code> the request is populated betw
 een the &ldquo;verb&rdquo; method <code>rm()</code> and the &ldquo;invoke&rdquo; method <code>now()</code>.</p><h3><a id="Invocation"></a>Invocation</h3><p>The invocation method controls how the request is invoked. Currently supported synchronous and asynchronous invocation. The now() method executes the request and returns the result immediately. The later() method submits the request to be executed later and returns a future from which the result can be retrieved. In addition later() invocation method can optionally be provided a closure to execute when the request is complete. See the Futures and Closures sections below for additional detail and examples.</p><h3><a id="Response"></a>Response</h3><p>The response contains the results of the invocation of the request. In most cases the response is a thin wrapper over the HTTP response. In fact many commands will share a single BasicResponse type that only provides a few simple methods.</p>
+<pre><code>readmeFuture = Hdfs.put(session).file(&quot;README&quot;).to(&quot;tmp/example/README&quot;).later() { println it.statusCode }
+licenseFuture = Hdfs.put(session).file(&quot;LICENSE&quot;).to(&quot;tmp/example/LICENSE&quot;).later() { println it.statusCode }
+session.waitFor( readmeFuture, licenseFuture )
+</code></pre><p>Again, the session.waitFor() method will wait for one or more asynchronous commands to complete.</p><h3><a id="Constructs"></a>Constructs</h3><p>In order to understand the DSL there are three primary constructs that need to be understood.</p><h4><a id="Session"></a>Session</h4><p>This construct encapsulates the client side session state that will be shared between all command invocations. In particular it will simplify the management of any tokens that need to be presented with each command invocation. It also manages a thread pool that is used by all asynchronous commands, which is why it is important to call one of the shutdown methods.</p><p>The syntax associated with this is expected to change; we expect that credentials will not need to be provided to the gateway. Rather it is expected that some form of access token will be used to initialize the session.</p><h4><a id="Services"></a>Services</h4><p>Services are the primary extension point for adding new suites of commands. The current built in examples are: Hdfs, Job and Workflow. The desire for extensibility is the reason for the slightly awkward Hdfs.ls(session) syntax. Certainly something more like <code>session.hdfs().ls()</code> would have been preferred but this would prevent adding new commands easily. At a minimum it would result in extension commands with a different syntax from the &ldquo;built-in&rdquo; commands.</p><p>The service objects essentially function as a factory for a suite of commands.</p><h4><a id="Commands"></a>Commands</h4><p>Commands provide the behavior of the DSL. They typically follow a Fluent interface style in order to allow for single line commands. There are really three parts to each command: Request, Invocation and Response.</p><h4><a id="Request"></a>Request</h4><p>The request is populated by all of the methods following the &ldquo;verb&rdquo; method and the &ldquo;invoke&rdquo; method. For example in <code>Hdfs.rm(session).ls(dir).now()</code> the request is populated between the &ldquo;verb&rdquo; method <code>rm()</code> and the &ldquo;invoke&rdquo; method <code>now()</code>.</p><h4><a id="Invocation"></a>Invocation</h4><p>The invocation method controls how the request is invoked. Synchronous and asynchronous invocation are currently supported. The now() method executes the request and returns the result immediately. The later() method submits the request to be executed later and returns a future from which the result can be retrieved. In addition, the later() invocation method can optionally be provided a closure to execute when the request is complete. See the Futures and Closures sections below for additional detail and examples.</p><h4><a id="Response"></a>Response</h4><p>The response contains the results of the invocation of the request. In most cases the response is a thin wrapper over the HTTP response. In fact many commands will share a single BasicResponse type that only provides a few simple methods.</p>
 <pre><code>public int getStatusCode()
 public long getContentLength()
 public String getContentType()
@@ -911,210 +920,29 @@ public String getString()
 public byte[] getBytes()
 public void close();
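 // Note: read the body via exactly one of getStream(), getBytes() or getString(),
 // or call close() to discard it (see the discussion below).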
 </code></pre><p>Thanks to Groovy these methods can be accessed as attributes. In some of the examples the statusCode attribute was retrieved, for example.</p>
-<pre><code>println Hdfs.put(hadoop).rm(dir).now().statusCode
-</code></pre><p>Groovy will invoke the getStatusCode method to retrieve the statusCode attribute.</p><p>The three methods getStream(), getBytes() and getString deserve special attention. Care must be taken that the HTTP body is read only once. Therefore one of these methods (and only one) must be called once and only once. Calling one of these more than once will cause an error. Failing to call one of these methods once will result in lingering open HTTP connections. The close() method may be used if the caller is not interested in reading the result body. Most commands that do not expect a response body will call close implicitly. If the body is retrieved via getBytes() or getString(), the close() method need not be called. When using getStream(), care must be taken to consume the entire body otherwise lingering open HTTP connections will result. The close() method may be called after reading the body partially to discard the remainder of the body.</p><h3><a id="Services"></a>Servi
 ces</h3><p>There are three basic DSL services and commands bundled with the shell.</p><h4><a id="HDFS"></a>HDFS</h4><p>Provides basic HDFS commands. <em>Using these DSL commands requires that WebHDFS be running in the Hadoop cluster.</em></p><h4><a id="Jobs+(Templeton/WebHCat)"></a>Jobs (Templeton/WebHCat)</h4><p>Provides basic job submission and status commands. <em>Using these DSL commands requires that Templeton/WebHCat be running in the Hadoop cluster.</em></p><h4><a id="Workflow+(Oozie)"></a>Workflow (Oozie)</h4><p>Provides basic workflow submission and status commands. <em>Using these DSL commands requires that Oozie be running in the Hadoop cluster.</em></p><h3><a id="HDFS+Commands+(WebHDFS)"></a>HDFS Commands (WebHDFS)</h3><h4><a id="ls()+-+List+the+contents+of+a+HDFS+directory."></a>ls() - List the contents of a HDFS directory.</h4>
-<ul>
-  <li>Request
-  <ul>
-    <li>dir (String) - The HDFS directory to list.</li>
-  </ul></li>
-  <li>Response
-  <ul>
-    <li>BasicResponse</li>
-  </ul></li>
-  <li>Example
-  <ul>
-    <li><code>Hdfs.ls(hadoop).ls().dir(&quot;/&quot;).now()</code></li>
-  </ul></li>
-</ul><h4><a id="rm()+-+Remove+a+HDFS+file+or+directory."></a>rm() - Remove a HDFS file or directory.</h4>
-<ul>
-  <li>Request
-  <ul>
-    <li>file (String) - The HDFS file or directory to remove.</li>
-    <li>recursive (Boolean) - If the file is a directory also remove any contained files and directories. Optional: default=false</li>
-  </ul></li>
-  <li>Response
-  <ul>
-    <li>EmptyResponse - Implicit close().</li>
-  </ul></li>
-  <li>Example
-  <ul>
-    <li><code>Hdfs.rm(hadoop).file(&quot;/tmp/example&quot;).recursive().now()</code></li>
-  </ul></li>
-</ul><h4><a id="put()+-+Copy+a+file+from+the+local+file+system+to+HDFS."></a>put() - Copy a file from the local file system to HDFS.</h4>
-<ul>
-  <li>Request
-  <ul>
-    <li>text (String) - The text to copy to the remote file.</li>
-    <li>file (String) - The name of a local file to copy to the remote file.</li>
-    <li>to (String) - The name of the remote file create.</li>
-  </ul></li>
-  <li>Response
-  <ul>
-    <li>EmptyResponse - Implicit close().</li>
-  </ul></li>
-  <li>Example
-  <ul>
-    <li><code>Hdfs.put(hadoop).file(&quot;localFile&quot;).to(&quot;/tmp/example/remoteFile&quot;).now()</code></li>
-  </ul></li>
-</ul><h4><a id="get()+-+Copy+a+file+from+HDFS+to+the+local+file+system."></a>get() - Copy a file from HDFS to the local file system.</h4>
-<ul>
-  <li>Request
-  <ul>
-    <li>file (String) - The name of the local file to create from the remote file. If this isn&rsquo;t specified the file content must be read from the response.</li>
-    <li>from (String) - The name of the remote file to copy.</li>
-  </ul></li>
-  <li>Response
-  <ul>
-    <li>BasicResponse</li>
-  </ul></li>
-  <li>Example
-  <ul>
-    <li><code>Hdfs.get(hadoop).file(&quot;localFile&quot;).from(&quot;/tmp/example/remoteFile&quot;).now()</code></li>
-  </ul></li>
-</ul><h4><a id="mkdir()+-+Create+a+directory+in+HDFS."></a>mkdir() - Create a directory in HDFS.</h4>
-<ul>
-  <li>Request
-  <ul>
-    <li>dir (String) - The name of the remote directory to create.</li>
-    <li>perm (String) - The permissions to create the remote directory with. Optional: default=&ldquo;777&rdquo;</li>
-  </ul></li>
-  <li>Response
-  <ul>
-    <li>EmptyResponse - Implicit close().</li>
-  </ul></li>
-  <li>Example
-  <ul>
-    <li><code>Hdfs.mkdir(hadoop).dir(&quot;/tmp/example&quot;).perm(&quot;777&quot;).now()</code></li>
-  </ul></li>
-</ul><h3><a id="Job+Commands+(WebHCat/Templeton)"></a>Job Commands (WebHCat/Templeton)</h3><h4><a id="submitJava()+-+Submit+a+Java+MapReduce+job."></a>submitJava() - Submit a Java MapReduce job.</h4>
-<ul>
-  <li>Request
-  <ul>
-    <li>jar (String) - The remote file name of the JAR containing the app to execute.</li>
-    <li>app (String) - The app name to execute. This is wordcount for example not the class name.</li>
-    <li>input (String) - The remote directory name to use as input for the job.</li>
-    <li>output (String) - The remote directory name to store output from the job.</li>
-  </ul></li>
-  <li>Response
-  <ul>
-    <li>jobId : String - The job ID of the submitted job. Consumes body.</li>
-  </ul></li>
-  <li>Example
-  <ul>
-    <li><code>Job.submitJava(hadoop).jar(remoteJarName).app(appName).input(remoteInputDir).output(remoteOutputDir).now().jobId</code></li>
-  </ul></li>
-</ul><h4><a id="submitPig()+-+Submit+a+Pig+job."></a>submitPig() - Submit a Pig job.</h4>
-<ul>
-  <li>Request
-  <ul>
-    <li>file (String) - The remote file name of the pig script.</li>
-    <li>arg (String) - An argument to pass to the script.</li>
-    <li>statusDir (String) - The remote directory to store status output.</li>
-  </ul></li>
-  <li>Response
-  <ul>
-    <li>jobId : String - The job ID of the submitted job. Consumes body.</li>
-  </ul></li>
-  <li>Example
-  <ul>
-    <li><code>Job.submitPig(hadoop).file(remotePigFileName).arg(&quot;-v&quot;).statusDir(remoteStatusDir).now()</code></li>
-  </ul></li>
-</ul><h4><a id="submitHive()+-+Submit+a+Hive+job."></a>submitHive() - Submit a Hive job.</h4>
-<ul>
-  <li>Request
-  <ul>
-    <li>file (String) - The remote file name of the hive script.</li>
-    <li>arg (String) - An argument to pass to the script.</li>
-    <li>statusDir (String) - The remote directory to store status output.</li>
-  </ul></li>
-  <li>Response
-  <ul>
-    <li>jobId : String - The job ID of the submitted job. Consumes body.</li>
-  </ul></li>
-  <li>Example
-  <ul>
-    <li><code>Job.submitHive(hadoop).file(remoteHiveFileName).arg(&quot;-v&quot;).statusDir(remoteStatusDir).now()</code></li>
-  </ul></li>
-</ul><h4><a id="queryQueue()+-+Return+a+list+of+all+job+IDs+registered+to+the+user."></a>queryQueue() - Return a list of all job IDs registered to the user.</h4>
-<ul>
-  <li>Request
-  <ul>
-    <li>No request parameters.</li>
-  </ul></li>
-  <li>Response
-  <ul>
-    <li>BasicResponse</li>
-  </ul></li>
-  <li>Example
-  <ul>
-    <li><code>Job.queryQueue(hadoop).now().string</code></li>
-  </ul></li>
-</ul><h4><a id="queryStatus()+-+Check+the+status+of+a+job+and+get+related+job+information+given+its+job+ID."></a>queryStatus() - Check the status of a job and get related job information given its job ID.</h4>
-<ul>
-  <li>Request
-  <ul>
-    <li>jobId (String) - The job ID to check. This is the ID received when the job was created.</li>
-  </ul></li>
-  <li>Response
-  <ul>
-    <li>BasicResponse</li>
-  </ul></li>
-  <li>Example
-  <ul>
-    <li><code>Job.queryStatus(hadoop).jobId(jobId).now().string</code></li>
-  </ul></li>
-</ul><h3><a id="Workflow+Commands+(Oozie)"></a>Workflow Commands (Oozie)</h3><h4><a id="submit()+-+Submit+a+workflow+job."></a>submit() - Submit a workflow job.</h4>
-<ul>
-  <li>Request
-  <ul>
-    <li>text (String) - XML formatted workflow configuration string.</li>
-    <li>file (String) - A filename containing XML formatted workflow configuration.</li>
-    <li>action (String) - The initial action to take on the job. Optional: Default is &ldquo;start&rdquo;.</li>
-  </ul></li>
-  <li>Response
-  <ul>
-    <li>BasicResponse</li>
-  </ul></li>
-  <li>Example
-  <ul>
-    <li><code>Workflow.submit(hadoop).file(localFile).action(&quot;start&quot;).now()</code></li>
-  </ul></li>
-</ul><h4><a id="status()+-+Query+the+status+of+a+workflow+job."></a>status() - Query the status of a workflow job.</h4>
-<ul>
-  <li>Request
-  <ul>
-    <li>jobId (String) - The job ID to check. This is the ID received when the job was created.</li>
-  </ul></li>
-  <li>Response
-  <ul>
-    <li>BasicResponse</li>
-  </ul></li>
-  <li>Example
-  <ul>
-    <li><code>Workflow.status(hadoop).jobId(jobId).now().string</code></li>
-  </ul></li>
-</ul><h3><a id="Extension"></a>Extension</h3><p>Extensibility is a key design goal of the KnoxShell and DSL. There are two ways to provide extended functionality for use with the shell. The first is to simply create Groovy scripts that use the DSL to perform a useful task. The second is to add new services and commands. In order to add new service and commands new classes must be written in either Groovy or Java and added to the classpath of the shell. Fortunately there is a very simple way to add classes and JARs to the shell classpath. The first time the shell is executed it will create a configuration file in the same directory as the JAR with the same base name and a <code>.cfg</code> extension.</p>
+<pre><code>println Hdfs.put(session).rm(dir).now().statusCode
+</code></pre><p>Groovy will invoke the getStatusCode method to retrieve the statusCode attribute.</p><p>The three methods getStream(), getBytes() and getString() deserve special attention. Care must be taken that the HTTP body is fully read once and only once. Therefore one of these methods (and only one) must be called once and only once. Calling one of these more than once will cause an error. Failing to call one of these methods once will result in lingering open HTTP connections. The close() method may be used if the caller is not interested in reading the result body. Most commands that do not expect a response body will call close implicitly. If the body is retrieved via getBytes() or getString(), the close() method need not be called. When using getStream(), care must be taken to consume the entire body otherwise lingering open HTTP connections will result. The close() method may be called after reading the body partially to discard the remainder of the body.</p><h3><a id="Services"></a>Services</h3><p>The built-in supported client DSL for each Hadoop service can be found in the <a href="#Service+Details">Service Details</a> section.</p><h3><a id="Extension"></a>Extension</h3><p>Extensibility is a key design goal of the KnoxShell and client DSL. There are two ways to provide extended functionality for use with the shell. The first is to simply create Groovy scripts that use the DSL to perform a useful task. The second is to add new services and commands. In order to add new services and commands, new classes must be written in either Groovy or Java and added to the classpath of the shell. Fortunately there is a very simple way to add classes and JARs to the shell classpath. The first time the shell is executed it will create a configuration file in the same directory as the JAR with the same base name and a <code>.cfg</code> extension.</p>
 <pre><code>bin/shell.jar
 bin/shell.cfg
 </code></pre><p>That file contains both the main class for the shell as well as a definition of the classpath. Currently that file will by default contain the following.</p>
 <pre><code>main.class=org.apache.hadoop.gateway.shell.Shell
 class.path=../lib; ../lib/*.jar; ../ext; ../ext/*.jar
 </code></pre><p>Therefore to extend the shell you should copy any new service and command class either to the <code>ext</code> directory or if they are packaged within a JAR copy the JAR to the <code>ext</code> directory. The <code>lib</code> directory is reserved for JARs that may be delivered with the product.</p><p>Below are samples for the service and command classes that would need to be written to add new commands to the shell. These happen to be Groovy source files but could with very minor changes be Java files. The easiest way to add these to the shell is to compile them directly into the <code>ext</code> directory. <em>Note: This command depends upon having the Groovy compiler installed and available on the execution path.</em></p>
-<pre><code>groovy -d ext -cp bin/shell.jar samples/SampleService.groovy samples/SampleSimpleCommand.groovy samples/SampleComplexCommand.groovy
+<pre><code>groovy -d ext -cp bin/shell.jar samples/SampleService.groovy \
+    samples/SampleSimpleCommand.groovy samples/SampleComplexCommand.groovy
 </code></pre><p>These source files are available in the samples directory of the distribution but they are included here for convenience.</p><h4><a id="Sample+Service+(Groovy)"></a>Sample Service (Groovy)</h4>
 <pre><code>import org.apache.hadoop.gateway.shell.Hadoop
 
 class SampleService {
 
-    static String PATH = &quot;/namenode/api/v1&quot;
+    static String PATH = &quot;/webhdfs/v1&quot;
 
-    static SimpleCommand simple( Hadoop hadoop ) {
-        return new SimpleCommand( hadoop )
+    static SimpleCommand simple( Hadoop session ) {
+        return new SimpleCommand( session )
     }
 
-    static ComplexCommand.Request complex( Hadoop hadoop ) {
-        return new ComplexCommand.Request( hadoop )
+    static ComplexCommand.Request complex( Hadoop session ) {
+        return new ComplexCommand.Request( session )
     }
 
 }
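 
 // Hypothetical usage sketch (not part of the original sample): once compiled into the
 // ext directory, this service can be invoked from the shell like the built-in ones, e.g.
 //   session = Hadoop.login( gateway, username, password )
 //   println SampleService.simple( session ).now().string
 // The exact fluent setters available depend on the command classes shown below.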
@@ -1129,8 +957,8 @@ import java.util.concurrent.Callable
 
 class SimpleCommand extends AbstractRequest&lt;BasicResponse&gt; {
 
-    SimpleCommand( Hadoop hadoop ) {
-        super( hadoop )
+    SimpleCommand( Hadoop session ) {
+        super( session )
     }
 
     private String param
@@ -1168,8 +996,8 @@ class ComplexCommand {
 
     static class Request extends AbstractRequest&lt;Response&gt; {
 
-        Request( Hadoop hadoop ) {
-            super( hadoop )
+        Request( Hadoop session ) {
+            super( session )
         }
 
         private String param;
@@ -1207,9 +1035,9 @@ class ComplexCommand {
 
 }
 </code></pre><h3><a id="Groovy"></a>Groovy</h3><p>The shell included in the distribution is basically an unmodified packaging of the Groovy shell. The distribution does however provide a wrapper that makes it very easy to set up the class path for the shell. In fact the JARs required to execute the DSL are included on the class path by default. Therefore these commands are functionally equivalent if you have Groovy installed. See below for a description of the JARs required by the DSL from the <code>lib</code> and <code>dep</code> directories.</p>
-<pre><code>java -jar bin/shell.jar samples/ExamplePutFile.groovy
-groovy -classpath {JARs required by the DSL from lib and dep} samples/ExamplePutFile.groovy
-</code></pre><p>The interactive shell isn&rsquo;t exactly equivalent. However the only difference is that the shell.jar automatically executes some additional imports that are useful for the KnoxShell DSL. So these two sets of commands should be functionality equivalent. <em>However there is currently a class loading issue that prevents the groovysh command from working propertly.</em></p>
+<pre><code>java -jar bin/shell.jar samples/ExampleWebHdfsPutGet.groovy
+groovy -classpath {JARs required by the DSL from lib and dep} samples/ExampleWebHdfsPutGet.groovy
+</code></pre><p>The interactive shell isn&rsquo;t exactly equivalent. However the only difference is that the shell.jar automatically executes some additional imports that are useful for the KnoxShell client DSL. So these two sets of commands should be functionally equivalent. <em>However there is currently a class loading issue that prevents the groovysh command from working properly.</em></p>
 <pre><code>java -jar bin/shell.jar
 
 groovysh -classpath {JARs required by the DSL from lib and dep}
@@ -1226,17 +1054,17 @@ import org.apache.hadoop.gateway.shell.h
 import org.apache.hadoop.gateway.shell.job.Job
 import org.apache.hadoop.gateway.shell.workflow.Workflow
 import java.util.concurrent.TimeUnit
-</code></pre><p>The list of JARs currently required by the DSL is</p>
+</code></pre><p>The JARs currently required by the client DSL are</p>
 <pre><code>lib/gateway-shell-${gateway-version}.jar
 dep/httpclient-4.2.3.jar
 dep/httpcore-4.2.2.jar
 dep/commons-lang3-3.1.jar
 dep/commons-codec-1.7.jar
 </code></pre><p>So on Linux/MacOS you would need this command</p>
-<pre><code>groovy -cp lib/gateway-shell-0.2.0-SNAPSHOT.jar:dep/httpclient-4.2.3.jar:dep/httpcore-4.2.2.jar:dep/commons-lang3-3.1.jar:dep/commons-codec-1.7.jar samples/ExamplePutFile.groovy
+<pre><code>groovy -cp lib/gateway-shell-0.2.0-SNAPSHOT.jar:dep/httpclient-4.2.3.jar:dep/httpcore-4.2.2.jar:dep/commons-lang3-3.1.jar:dep/commons-codec-1.7.jar samples/ExampleWebHdfsPutGet.groovy
 </code></pre><p>and on Windows you would need this command</p>
-<pre><code>groovy -cp lib/gateway-shell-0.2.0-SNAPSHOT.jar;dep/httpclient-4.2.3.jar;dep/httpcore-4.2.2.jar;dep/commons-lang3-3.1.jar;dep/commons-codec-1.7.jar samples/ExamplePutFile.groovy
-</code></pre><p>The exact list of required JARs is likely to change from release to release so it is recommended that you utilize the wrapper <code>bin/shell.jar</code>.</p><p>In addition because the DSL can be used via standard Groovy, the Groovy integrations in many popular IDEs (e.g. IntelliJ , Eclipse) can also be used. This makes it particularly nice to develop and execute scripts to interact with Hadoop. The code-completion feature in particular provides immense value. All that is required is to add the shell-0.2.0.jar to the projects class path.</p><p>There are a variety of Groovy tools that make it very easy to work with the standard interchange formats (i.e. JSON and XML). In Groovy the creation of XML or JSON is typically done via a &ldquo;builder&rdquo; and parsing done via a &ldquo;slurper&rdquo;. In addition once JSON or XML is &ldquo;slurped&rdquo; the GPath, an XPath like feature build into Groovy can be used to access data.</p>
+<pre><code>groovy -cp lib/gateway-shell-0.2.0-SNAPSHOT.jar;dep/httpclient-4.2.3.jar;dep/httpcore-4.2.2.jar;dep/commons-lang3-3.1.jar;dep/commons-codec-1.7.jar samples/ExampleWebHdfsPutGet.groovy
+</code></pre><p>The exact list of required JARs is likely to change from release to release so it is recommended that you utilize the wrapper <code>bin/shell.jar</code>.</p><p>In addition, because the DSL can be used via standard Groovy, the Groovy integrations in many popular IDEs (e.g. IntelliJ, Eclipse) can also be used. This makes it particularly nice to develop and execute scripts to interact with Hadoop. The code-completion features in modern IDEs in particular provide immense value. All that is required is to add the shell-0.2.0.jar to the project&rsquo;s class path.</p><p>There are a variety of Groovy tools that make it very easy to work with the standard interchange formats (i.e. JSON and XML). In Groovy the creation of XML or JSON is typically done via a &ldquo;builder&rdquo; and parsing done via a &ldquo;slurper&rdquo;. In addition, once JSON or XML is &ldquo;slurped&rdquo;, GPath, an XPath-like feature built into Groovy, can be used to access data.</p>
 <ul>
   <li>XML
   <ul>
@@ -1251,7 +1079,14 @@ dep/commons-codec-1.7.jar
     <li>JSON Path <a href="https://code.google.com/p/json-path/">API</a></li>
     <li>GPath <a href="http://groovy.codehaus.org/GPath">Overview</a></li>
   </ul></li>
-</ul><h2><a id="Service+Details"></a>Service Details</h2><p>In the sections that follow the integrations currently available out of the box with the gateway will be described. In general these sections will include examples that demonstrate how to access each of these services via the gateway. In many cases this will include both the use of <a href="http://curl.haxx.se/">cURL</a> as a REST API client as well as the use of the Knox Client DSL. You may notice that there are some minor differences between using the REST API of a given service via the gateway. In general this is necessary in order to achieve the goal of leaking internal Hadoop cluster details to the client.</p><p>Keep in mind that the gateway uses a plugin model for supporting Hadoop services. Check back with a the <a href="http://knox.incubator.apache.org">Apache Knox</a> site for the latest news on plugin availability. You can also create your own custom plugin to extend the capabilities of the gateway.</p><h3><a id="
 Assumptions"></a>Assumptions</h3><p>This document assumes a few things about your environment in order to simplify the examples.</p>
+</ul><h2><a id="Service+Details"></a>Service Details</h2><p>In the sections that follow, the integrations currently available out of the box with the gateway will be described. In general these sections will include examples that demonstrate how to access each of these services via the gateway. In many cases this will include both the use of <a href="http://curl.haxx.se/">cURL</a> as a REST API client as well as the use of the Knox Client DSL. You may notice that there are some minor differences between using the REST API of a given service directly and via the gateway. In general this is necessary in order to achieve the goal of not leaking internal Hadoop cluster details to the client.</p><p>Keep in mind that the gateway uses a plugin model for supporting Hadoop services. Check back with the <a href="http://knox.incubator.apache.org">Apache Knox</a> site for the latest news on plugin availability. You can also create your own custom plugin to extend the capabilities of the gateway.</p><p>These are the current Hadoop services with built-in support.</p>
+<ul>
+  <li><a href="#WebHDFS">WebHDFS</a></li>
+  <li><a href="#WebHCat">WebHCat</a></li>
+  <li><a href="#Oozie">Oozie</a></li>
+  <li><a href="#HBase">HBase</a></li>
+  <li><a href="#Hive">Hive</a></li>
+</ul><h3><a id="Assumptions"></a>Assumptions</h3><p>This document assumes a few things about your environment in order to simplify the examples.</p>
 <ul>
   <li>The JVM is executable as simply java.</li>
   <li>The Apache Knox Gateway is installed and functional.</li>
@@ -1304,43 +1139,43 @@ dep/commons-codec-1.7.jar
 </code></pre><p>Note that this URL contains the network location of a Data Node. The gateway will rewrite this URL to look like the URL below.</p>
 <pre><code>https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/webhdfs/data/v1/{path}?_={encrypted-query-parameters}
 </code></pre><p>The <code>{encrypted-query-parameters}</code> will contain the <code>{datanode-host}</code> and <code>{datanode-port}</code> information. This information, along with the original query parameters, is encrypted so that the internal Hadoop details are protected.</p><h4><a id="WebHDFS+Examples"></a>WebHDFS Examples</h4><p>The examples below upload a file, download the file and list the contents of the directory.</p><h5><a id="WebHDFS+via+client+DSL"></a>WebHDFS via client DSL</h5><p>You can use the Groovy example scripts and interpreter provided with the distribution.</p>
-<pre><code>java -jar bin/shell.jar samples/ExampleWebHfsPutGet.groovy
-java -jar bin/shell.jar samples/ExampleWebHfsLs.groovy
+<pre><code>java -jar bin/shell.jar samples/ExampleWebHdfsPutGet.groovy
+java -jar bin/shell.jar samples/ExampleWebHdfsLs.groovy
 </code></pre><p>You can manually type the client DSL script into the KnoxShell interactive Groovy interpreter provided with the distribution. The command below starts the KnoxShell in interactive mode.</p>
 <pre><code>java -jar bin/shell.jar
 </code></pre><p>Each line below could be typed or copied into the interactive shell and executed. This is provided as an example to illustrate the use of the client DSL.</p>
-<pre><code># Import the client DSL and a useful utilities for working with JSON.
+<pre><code>// Import the client DSL and a useful utility for working with JSON.
 import org.apache.hadoop.gateway.shell.Hadoop
 import org.apache.hadoop.gateway.shell.hdfs.Hdfs
 import groovy.json.JsonSlurper
 
-# Setup some basic config.
+// Setup some basic config.
 gateway = &quot;https://localhost:8443/gateway/sandbox&quot;
 username = &quot;guest&quot;
 password = &quot;guest-password&quot;
 
-# Start the session.
+// Start the session.
 session = Hadoop.login( gateway, username, password )
 
-# Cleanup anything leftover from a previous run.
+// Cleanup anything leftover from a previous run.
 Hdfs.rm( session ).file( &quot;/user/guest/example&quot; ).recursive().now()
 
-# Upload the README to HDFS.
-Hdfs.put( session ).file( README ).to( &quot;/user/guest/example/README&quot; ).now()
+// Upload the README to HDFS.
+Hdfs.put( session ).file( &quot;README&quot; ).to( &quot;/user/guest/example/README&quot; ).now()
 
-# Download the README from HDFS.
+// Download the README from HDFS.
 text = Hdfs.get( session ).from( &quot;/user/guest/example/README&quot; ).now().string
 println text
 
-# List the contents of the directory.
+// List the contents of the directory.
 text = Hdfs.ls( session ).dir( &quot;/user/guest/example&quot; ).now().string
 json = (new JsonSlurper()).parseText( text )
 println json.FileStatuses.FileStatus.pathSuffix
 
-# Cleanup the directory.
+// Cleanup the directory.
 Hdfs.rm( session ).file( &quot;/user/guest/example&quot; ).recursive().now()
 
-# Clean the session.
+// Clean the session.
 session.shutdown()
 </code></pre><h5><a id="WebHDFS+via+cURL"></a>WebHDFS via cURL</h5><p>You can use cURL to directly invoke the REST APIs via the gateway.</p><h6><a id="Optionally+cleanup+the+sample+directory+in+case+a+previous+example+was+run+without+cleaning+up."></a>Optionally cleanup the sample directory in case a previous example was run without cleaning up.</h6>
 <pre><code>curl -i -k -u guest:guest-password -X DELETE \
@@ -1363,12 +1198,12 @@ session.shutdown()
 </code></pre><h6><a id="Optionally+cleanup+the+example+directory."></a>Optionally cleanup the example directory.</h6>
 <pre><code>curl -i -k -u guest:guest-password -X DELETE \
     &#39;https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example?op=DELETE&amp;recursive=true&#39;
-</code></pre><h5><a id="WebHDFS+client+DSL"></a>WebHDFS client DSL</h5><h6><a id="get+-+Get+a+file+from+HDFS+(OPEN)."></a>get - Get a file from HDFS (OPEN).</h6>
+</code></pre><h5><a id="WebHDFS+client+DSL"></a>WebHDFS client DSL</h5><h6><a id="get()+-+Get+a+file+from+HDFS+(OPEN)."></a>get() - Get a file from HDFS (OPEN).</h6>
 <ul>
   <li>Request
   <ul>
     <li>from( String name ) - The full name of the file in HDFS.</li>
-    <li>file( String name ) - The name name of a local file to create with the content.</li>
+    <li>file( String name ) - The name of a local file to create with the content. If this isn&rsquo;t specified the file content must be read from the response.</li>
   </ul></li>
   <li>Response
   <ul>
@@ -1379,7 +1214,7 @@ session.shutdown()
   <ul>
     <li><code>Hdfs.get( session ).from( &quot;/user/guest/example/README&quot; ).now().string</code></li>
   </ul></li>
-</ul><h6><a id="ls+-+Query+the+contents+of+a+directory+(LISTSTATUS)"></a>ls - Query the contents of a directory (LISTSTATUS)</h6>
+</ul><h6><a id="ls()+-+Query+the+contents+of+a+directory+(LISTSTATUS)"></a>ls() - Query the contents of a directory (LISTSTATUS)</h6>
 <ul>
   <li>Request
   <ul>
@@ -1393,22 +1228,22 @@ session.shutdown()
   <ul>
     <li><code>Hdfs.ls( session ).dir( &quot;/user/guest/example&quot; ).now().string</code></li>
   </ul></li>
-</ul><h6><a id="mkdir+-+Create+a+directory+in+HDFS+(MKDIRS)"></a>mkdir - Create a directory in HDFS (MKDIRS)</h6>
+</ul><h6><a id="mkdir()+-+Create+a+directory+in+HDFS+(MKDIRS)"></a>mkdir() - Create a directory in HDFS (MKDIRS)</h6>
 <ul>
   <li>Request
   <ul>
     <li>dir( String name ) - The full name of the directory to create in HDFS.</li>
-    <li>perm( String perm ) - The permissions for the directory (e.g. 644).</li>
+    <li>perm( String perm ) - The permissions for the directory (e.g. 644). Optional: default=&ldquo;777&rdquo;</li>
   </ul></li>
   <li>Response
   <ul>
-    <li>BasicResponse</li>
+    <li>EmptyResponse - Implicit close().</li>
   </ul></li>
   <li>Example
   <ul>
     <li><code>Hdfs.mkdir( session ).dir( &quot;/user/guest/example&quot; ).now()</code></li>
   </ul></li>
-</ul><h6><a id="put+-+Write+a+file+into+HDFS+(CREATE)"></a>put - Write a file into HDFS (CREATE)</h6>
+</ul><h6><a id="put()+-+Write+a+file+into+HDFS+(CREATE)"></a>put() - Write a file into HDFS (CREATE)</h6>
 <ul>
   <li>Request
   <ul>
@@ -1418,263 +1253,269 @@ session.shutdown()
   </ul></li>
   <li>Response
   <ul>
-    <li>BasicResponse</li>
+    <li>EmptyResponse - Implicit close().</li>
   </ul></li>
   <li>Example
   <ul>
     <li><code>Hdfs.put( session ).file( README ).to( &quot;/user/guest/example/README&quot; ).now()</code></li>
   </ul></li>
-</ul><h6><a id="rm+-+Delete+a+file+or+directory+(DELETE)"></a>rm - Delete a file or directory (DELETE)</h6>
+</ul><h6><a id="rm()+-+Delete+a+file+or+directory+(DELETE)"></a>rm() - Delete a file or directory (DELETE)</h6>
 <ul>
   <li>Request
   <ul>
     <li>file( String name ) - The fully qualified file or directory name in HDFS.</li>
-    <li>recursive( Boolean recursive ) - Delete directory and all of its contents if True.</li>
+    <li>recursive( Boolean recursive ) - Delete directory and all of its contents if True. Optional: default=False</li>
   </ul></li>
   <li>Response
   <ul>
-    <li>BasicResponse</li>
+    <li>BasicResponse - Implicit close().</li>
   </ul></li>
   <li>Example
   <ul>
     <li><code>Hdfs.rm( session ).file( &quot;/user/guest/example&quot; ).recursive().now()</code></li>
   </ul></li>
-</ul><h3><a id="WebHCat"></a>WebHCat</h3><p>TODO</p><h4><a id="WebHCat+URL+Mapping"></a>WebHCat URL Mapping</h4><p>TODO</p><h4><a id="WebHCat+Examples"></a>WebHCat Examples</h4><p>TODO</p><h4><a id="Example+#1:+WebHDFS+&+Templeton/WebHCat+via+KnoxShell+DSL"></a>Example #1: WebHDFS &amp; Templeton/WebHCat via KnoxShell DSL</h4><p>This example will submit the familiar WordCount Java MapReduce job to the Hadoop cluster via the gateway using the KnoxShell DSL. There are several ways to do this depending upon your preference.</p><p>You can use the &ldquo;embedded&rdquo; Groovy interpreter provided with the distribution.</p>
-<pre><code>java -jar bin/shell.jar samples/ExampleSubmitJob.groovy
+</ul><h3><a id="WebHCat"></a>WebHCat</h3><p>WebHCat is a related but separate service from Hive. As such it is installed and configured independently. The <a href="https://cwiki.apache.org/confluence/display/Hive/WebHCat">WebHCat wiki pages</a> describe this process. In the Sandbox the configuration file for WebHCat is located at /etc/hadoop/hcatalog/webhcat-site.xml. Note the properties shown below, as they relate to configuration required by the gateway.</p>
+<pre><code>&lt;property&gt;
+    &lt;name&gt;templeton.port&lt;/name&gt;
+    &lt;value&gt;50111&lt;/value&gt;
+&lt;/property&gt;
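+&lt;!-- The WEBHCAT service url in the gateway topology descriptor (shown
+     further below) should point at this port. --&gt;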
+</code></pre><p>Also important is the configuration of the JOBTRACKER RPC endpoint. For Hadoop 2 this is configured in the yarn-site.xml file. In the Sandbox this file can be found at /etc/hadoop/conf/yarn-site.xml. The property yarn.resourcemanager.address within that file is relevant for the gateway&rsquo;s configuration.</p>
+<pre><code>&lt;property&gt;
+    &lt;name&gt;yarn.resourcemanager.address&lt;/name&gt;
+    &lt;value&gt;sandbox.hortonworks.com:8050&lt;/value&gt;
+&lt;/property&gt;
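+&lt;!-- This host:port pair is what the JOBTRACKER service url in the gateway
+     topology descriptor (rpc://host:port) should reference. --&gt;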
+</code></pre><p>See <a href="#WebHDFS">WebHDFS</a> for details about locating the Hadoop configuration for the NAMENODE endpoint.</p><p>The gateway by default includes a sample topology descriptor file <code>{GATEWAY_HOME}/deployments/sandbox.xml</code>. The values in this sample are configured to work with an installed Sandbox VM.</p>
+<pre><code>&lt;service&gt;
+    &lt;role&gt;NAMENODE&lt;/role&gt;
+    &lt;url&gt;hdfs://localhost:8020&lt;/url&gt;
+&lt;/service&gt;
+&lt;service&gt;
+    &lt;role&gt;JOBTRACKER&lt;/role&gt;
+    &lt;url&gt;rpc://localhost:8050&lt;/url&gt;
+&lt;/service&gt;
+&lt;service&gt;
+    &lt;role&gt;WEBHCAT&lt;/role&gt;
+    &lt;url&gt;http://localhost:50111/templeton&lt;/url&gt;
+&lt;/service&gt;
+</code></pre><p>The URLs provided for the roles NAMENODE and JOBTRACKER do not result in an endpoint being exposed by the gateway. This information is only required so that other URLs can be rewritten to reference the appropriate RPC addresses for Hadoop services. This prevents clients from needing to be aware of the internal cluster details. Note that for Hadoop 2 the JOBTRACKER RPC endpoint is provided by the Resource Manager component.</p><p>By default the gateway is configured to use the HTTP endpoint for WebHCat in the Sandbox. This could alternatively be configured to use the HTTPS endpoint by providing the correct address.</p><h4><a id="WebHCat+URL+Mapping"></a>WebHCat URL Mapping</h4><p>For WebHCat URLs, the mapping of Knox Gateway accessible URLs to direct WebHCat URLs is simple.</p>
+<table>
+  <tbody>
+    <tr>
+      <td>Gateway </td>
+      <td><code>https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/templeton</code> </td>
+    </tr>
+    <tr>
+      <td>Cluster </td>
+      <td><code>http://{webhcat-host}:{webhcat-port}/templeton</code> </td>
+    </tr>
+  </tbody>
+</table><h4><a id="WebHCat+Example"></a>WebHCat Example</h4><p>This example will submit the familiar WordCount Java MapReduce job to the Hadoop cluster via the gateway using the KnoxShell DSL. There are several ways to do this depending upon your preference.</p><p>You can use the &ldquo;embedded&rdquo; Groovy interpreter provided with the distribution.</p>
+<pre><code>java -jar bin/shell.jar samples/ExampleWebHCatJob.groovy
 </code></pre><p>You can manually type in the KnoxShell DSL script into the &ldquo;embedded&rdquo; Groovy interpreter provided with the distribution.</p>
 <pre><code>java -jar bin/shell.jar
-</code></pre><p>Each line from the file below will need to be typed or copied into the interactive shell.</p><h5><a id="samples/ExampleSubmitJob"></a>samples/ExampleSubmitJob</h5>
-<pre><code>import com.jayway.jsonpath.JsonPath
-import org.apache.hadoop.gateway.shell.Hadoop
-import org.apache.hadoop.gateway.shell.hdfs.Hdfs
-import org.apache.hadoop.gateway.shell.job.Job
-
-import static java.util.concurrent.TimeUnit.SECONDS
-
-gateway = &quot;https://localhost:8443/gateway/sandbox&quot;
-username = &quot;guest&quot;
-password = &quot;guest-password&quot;
-dataFile = &quot;LICENSE&quot;
-jarFile = &quot;samples/hadoop-examples.jar&quot;
-
-hadoop = Hadoop.login( gateway, username, password )
-
-println &quot;Delete /tmp/test &quot; + Hdfs.rm(hadoop).file( &quot;/tmp/test&quot; ).recursive().now().statusCode
-println &quot;Create /tmp/test &quot; + Hdfs.mkdir(hadoop).dir( &quot;/tmp/test&quot;).now().statusCode
-
-putData = Hdfs.put(hadoop).file( dataFile ).to( &quot;/tmp/test/input/FILE&quot; ).later() {
-    println &quot;Put /tmp/test/input/FILE &quot; + it.statusCode }
-putJar = Hdfs.put(hadoop).file( jarFile ).to( &quot;/tmp/test/hadoop-examples.jar&quot; ).later() {
-     println &quot;Put /tmp/test/hadoop-examples.jar &quot; + it.statusCode }
-hadoop.waitFor( putData, putJar )
-
-jobId = Job.submitJava(hadoop) \
-    .jar( &quot;/tmp/test/hadoop-examples.jar&quot; ) \
-    .app( &quot;wordcount&quot; ) \
-    .input( &quot;/tmp/test/input&quot; ) \
-    .output( &quot;/tmp/test/output&quot; ) \
-    .now().jobId
-println &quot;Submitted job &quot; + jobId
-
-done = false
-count = 0
-while( !done &amp;&amp; count++ &lt; 60 ) {
-    sleep( 1000 )
-    json = Job.queryStatus(hadoop).jobId(jobId).now().string
-    done = JsonPath.read( json, &quot;${SDS}.status.jobComplete&quot; )
-}
-println &quot;Done &quot; + done
-
-println &quot;Shutdown &quot; + hadoop.shutdown( 10, SECONDS )
-
-exit
-</code></pre><h3><a id="Oozie"></a>Oozie</h3><p>TODO</p><h4><a id="Oozie+URL+Mapping"></a>Oozie URL Mapping</h4><p>TODO</p><h4><a id="Oozie+Examples"></a>Oozie Examples</h4><p>TODO</p><h4><a id="Example+#2:+WebHDFS+&+Oozie+via+KnoxShell+DSL"></a>Example #2: WebHDFS &amp; Oozie via KnoxShell DSL</h4><p>This example will also submit the familiar WordCount Java MapReduce job to the Hadoop cluster via the gateway using the KnoxShell DSL. However in this case the job will be submitted via a Oozie workflow. There are several ways to do this depending upon your preference.</p><p>You can use the &ldquo;embedded&rdquo; Groovy interpreter provided with the distribution.</p>
-<pre><code>java -jar bin/shell.jar samples/ExampleSubmitWorkflow.groovy
+</code></pre><p>Each line from the file <code>samples/ExampleWebHCatJob.groovy</code> would then need to be typed or copied into the interactive shell.</p><h4><a id="WebHCat+Client+DSL"></a>WebHCat Client DSL</h4><h5><a id="submitJava()+-+Submit+a+Java+MapReduce+job."></a>submitJava() - Submit a Java MapReduce job.</h5>
+<ul>
+  <li>Request
+  <ul>
+    <li>jar (String) - The remote file name of the JAR containing the app to execute.</li>
+    <li>app (String) - The app name to execute. This is, for example, wordcount rather than the class name.</li>
+    <li>input (String) - The remote directory name to use as input for the job.</li>
+    <li>output (String) - The remote directory name to store output from the job.</li>
+  </ul></li>
+  <li>Response
+  <ul>
+    <li>jobId : String - The job ID of the submitted job. Consumes body.</li>
+  </ul></li>
+  <li>Example</li>
+</ul>
+<pre><code>Job.submitJava(session)
+    .jar(remoteJarName)
+    .app(appName)
+    .input(remoteInputDir)
+    .output(remoteOutputDir)
+    .now()
+    .jobId
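+
+// Sketch: the returned jobId can then be polled with queryStatus(), e.g.
+// println Job.queryStatus(session).jobId(jobId).now().string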
+</code></pre><h5><a id="submitPig()+-+Submit+a+Pig+job."></a>submitPig() - Submit a Pig job.</h5>
+<ul>
+  <li>Request
+  <ul>
+    <li>file (String) - The remote file name of the pig script.</li>
+    <li>arg (String) - An argument to pass to the script.</li>
+    <li>statusDir (String) - The remote directory to store status output.</li>
+  </ul></li>
+  <li>Response
+  <ul>
+    <li>jobId : String - The job ID of the submitted job. Consumes body.</li>
+  </ul></li>
+  <li>Example
+  <ul>
+    <li><code>Job.submitPig(session).file(remotePigFileName).arg(&quot;-v&quot;).statusDir(remoteStatusDir).now()</code></li>
+  </ul></li>
+</ul><h5><a id="submitHive()+-+Submit+a+Hive+job."></a>submitHive() - Submit a Hive job.</h5>
+<ul>
+  <li>Request
+  <ul>
+    <li>file (String) - The remote file name of the hive script.</li>
+    <li>arg (String) - An argument to pass to the script.</li>
+    <li>statusDir (String) - The remote directory to store status output.</li>
+  </ul></li>
+  <li>Response
+  <ul>
+    <li>jobId : String - The job ID of the submitted job. Consumes body.</li>
+  </ul></li>
+  <li>Example
+  <ul>
+    <li><code>Job.submitHive(session).file(remoteHiveFileName).arg(&quot;-v&quot;).statusDir(remoteStatusDir).now()</code></li>
+  </ul></li>
+</ul><h5><a id="queryQueue()+-+Return+a+list+of+all+job+IDs+registered+to+the+user."></a>queryQueue() - Return a list of all job IDs registered to the user.</h5>
+<ul>
+  <li>Request
+  <ul>
+    <li>No request parameters.</li>
+  </ul></li>
+  <li>Response
+  <ul>
+    <li>BasicResponse</li>
+  </ul></li>
+  <li>Example
+  <ul>
+    <li><code>Job.queryQueue(session).now().string</code></li>
+  </ul></li>
+</ul><h5><a id="queryStatus()+-+Check+the+status+of+a+job+and+get+related+job+information+given+its+job+ID."></a>queryStatus() - Check the status of a job and get related job information given its job ID.</h5>
+<ul>
+  <li>Request
+  <ul>
+    <li>jobId (String) - The job ID to check. This is the ID received when the job was created.</li>
+  </ul></li>
+  <li>Response
+  <ul>
+    <li>BasicResponse</li>
+  </ul></li>
+  <li>Example
+  <ul>
+    <li><code>Job.queryStatus(session).jobId(jobId).now().string</code></li>
+  </ul></li>
+</ul><h3><a id="Oozie"></a>Oozie</h3><p>Oozie is a Hadoop component that allows complex job workflows to be submitted and managed. Please refer to the latest <a href="http://oozie.apache.org/docs/4.0.0/">Oozie documentation</a> for details.</p><p>In order to make Oozie accessible via the gateway there are several important Hadoop configuration settings. These all relate to the network endpoints exposed by various Hadoop services.</p><p>The HTTP endpoint at which Oozie is running can be found via the oozie.base.url property in the oozie-site.xml file. In a Sandbox installation this can typically be found in /etc/oozie/conf/oozie-site.xml.</p>
+<pre><code>&lt;property&gt;
+    &lt;name&gt;oozie.base.url&lt;/name&gt;
+    &lt;value&gt;http://sandbox.hortonworks.com:11000/oozie&lt;/value&gt;
+&lt;/property&gt;
+</code></pre><p>The RPC address at which the Resource Manager exposes the JOBTRACKER endpoint can be found via the yarn.resourcemanager.address property in the yarn-site.xml file. In a Sandbox installation this can typically be found in /etc/hadoop/conf/yarn-site.xml.</p>
+<pre><code>&lt;property&gt;
+    &lt;name&gt;yarn.resourcemanager.address&lt;/name&gt;
+    &lt;value&gt;sandbox.hortonworks.com:8050&lt;/value&gt;
+&lt;/property&gt;
+</code></pre><p>The address at which the Name Node exposes its RPC endpoint can be found via the dfs.namenode.rpc-address property in the hdfs-site.xml file. In a Sandbox installation this can typically be found in /etc/hadoop/conf/hdfs-site.xml.</p>
+<pre><code>&lt;property&gt;
+    &lt;name&gt;dfs.namenode.rpc-address&lt;/name&gt;
+    &lt;value&gt;sandbox.hortonworks.com:8020&lt;/value&gt;
+&lt;/property&gt;
+</code></pre><p>The information above must be provided to the gateway via a topology descriptor file. These topology descriptor files are placed in <code>{GATEWAY_HOME}/deployments</code>. An example that is set up for the default configuration of the Sandbox is <code>{GATEWAY_HOME}/deployments/sandbox.xml</code>. These values will need to be changed for a non-default Sandbox or other Hadoop cluster configurations.</p>
+<pre><code>&lt;service&gt;
+    &lt;role&gt;NAMENODE&lt;/role&gt;
+    &lt;url&gt;hdfs://localhost:8020&lt;/url&gt;
+&lt;/service&gt;
+&lt;service&gt;
+    &lt;role&gt;JOBTRACKER&lt;/role&gt;
+    &lt;url&gt;rpc://localhost:8050&lt;/url&gt;
+&lt;/service&gt;
+&lt;service&gt;
+    &lt;role&gt;OOZIE&lt;/role&gt;
+    &lt;url&gt;http://localhost:11000/oozie&lt;/url&gt;
+&lt;/service&gt;
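+&lt;!-- The NAMENODE, JOBTRACKER and OOZIE urls above are derived from
+     dfs.namenode.rpc-address, yarn.resourcemanager.address and
+     oozie.base.url in the Hadoop configuration described earlier. --&gt;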
+</code></pre><h4><a id="Oozie+URL+Mapping"></a>Oozie URL Mapping</h4><p>For Oozie URLs, the mapping of Knox Gateway accessible URLs to direct Oozie URLs is simple.</p>
+<table>
+  <tbody>
+    <tr>
+      <td>Gateway </td>
+      <td><code>https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/oozie</code> </td>
+    </tr>
+    <tr>
+      <td>Cluster </td>
+      <td><code>http://{oozie-host}:{oozie-port}/oozie</code> </td>
+    </tr>
+  </tbody>
+</table><h4><a id="Oozie+Request+Changes"></a>Oozie Request Changes</h4><p>TODO - In some cases the Oozie requests need to be slightly different when made through the gateway. These changes are required in order to protect the client from knowing the internal structure of the Hadoop cluster.</p><h4><a id="Oozie+Example+via+Client+DSL"></a>Oozie Example via Client DSL</h4><p>This example will also submit the familiar WordCount Java MapReduce job to the Hadoop cluster via the gateway using the KnoxShell DSL. However, in this case the job will be submitted via an Oozie workflow. There are several ways to do this depending upon your preference.</p><p>You can use the &ldquo;embedded&rdquo; Groovy interpreter provided with the distribution.</p>
+<pre><code>java -jar bin/shell.jar samples/ExampleOozieWorkflow.groovy
 </code></pre><p>You can manually type in the KnoxShell DSL script into the &ldquo;embedded&rdquo; Groovy interpreter provided with the distribution.</p>
 <pre><code>java -jar bin/shell.jar
-</code></pre><p>Each line from the file below will need to be typed or copied into the interactive shell.</p><h5><a id="samples/ExampleSubmitWorkflow.groovy"></a>samples/ExampleSubmitWorkflow.groovy</h5>
-<pre><code>import com.jayway.jsonpath.JsonPath
-import org.apache.hadoop.gateway.shell.Hadoop
-import org.apache.hadoop.gateway.shell.hdfs.Hdfs
-import org.apache.hadoop.gateway.shell.workflow.Workflow
-
-import static java.util.concurrent.TimeUnit.SECONDS
-
-gateway = &quot;https://localhost:8443/gateway/sandbox&quot;
-jobTracker = &quot;sandbox:50300&quot;;
-nameNode = &quot;sandbox:8020&quot;;
-username = &quot;guest&quot;
-password = &quot;guest-password&quot;
-inputFile = &quot;LICENSE&quot;
-jarFile = &quot;samples/hadoop-examples.jar&quot;
-
-definition = &quot;&quot;&quot;\
-&lt;workflow-app xmlns=&quot;uri:oozie:workflow:0.2&quot; name=&quot;wordcount-workflow&quot;&gt;
-    &lt;start to=&quot;root-node&quot;/&gt;
-    &lt;action name=&quot;root-node&quot;&gt;
-        &lt;java&gt;
-            &lt;job-tracker&gt;$jobTracker&lt;/job-tracker&gt;
-            &lt;name-node&gt;hdfs://$nameNode&lt;/name-node&gt;
-            &lt;main-class&gt;org.apache.hadoop.examples.WordCount&lt;/main-class&gt;
-            &lt;arg&gt;/tmp/test/input&lt;/arg&gt;
-            &lt;arg&gt;/tmp/test/output&lt;/arg&gt;
-        &lt;/java&gt;
-        &lt;ok to=&quot;end&quot;/&gt;
-        &lt;error to=&quot;fail&quot;/&gt;
-    &lt;/action&gt;
-    &lt;kill name=&quot;fail&quot;&gt;
-        &lt;message&gt;Java failed&lt;/message&gt;
-    &lt;/kill&gt;
-    &lt;end name=&quot;end&quot;/&gt;
-&lt;/workflow-app&gt;
-&quot;&quot;&quot;
-
-configuration = &quot;&quot;&quot;\
-&lt;configuration&gt;
-    &lt;property&gt;
-        &lt;name&gt;user.name&lt;/name&gt;
-        &lt;value&gt;$username&lt;/value&gt;
-    &lt;/property&gt;
-    &lt;property&gt;
-        &lt;name&gt;oozie.wf.application.path&lt;/name&gt;
-        &lt;value&gt;hdfs://$nameNode/tmp/test&lt;/value&gt;
-    &lt;/property&gt;
-&lt;/configuration&gt;
-&quot;&quot;&quot;
-
-hadoop = Hadoop.login( gateway, username, password )
-
-println &quot;Delete /tmp/test &quot; + Hdfs.rm(hadoop).file( &quot;/tmp/test&quot; ).recursive().now().statusCode
-println &quot;Mkdir /tmp/test &quot; + Hdfs.mkdir(hadoop).dir( &quot;/tmp/test&quot;).now().statusCode
-putWorkflow = Hdfs.put(hadoop).text( definition ).to( &quot;/tmp/test/workflow.xml&quot; ).later() {
-    println &quot;Put /tmp/test/workflow.xml &quot; + it.statusCode }
-putData = Hdfs.put(hadoop).file( inputFile ).to( &quot;/tmp/test/input/FILE&quot; ).later() {
-    println &quot;Put /tmp/test/input/FILE &quot; + it.statusCode }
-putJar = Hdfs.put(hadoop).file( jarFile ).to( &quot;/tmp/test/lib/hadoop-examples.jar&quot; ).later() {
-    println &quot;Put /tmp/test/lib/hadoop-examples.jar &quot; + it.statusCode }
-hadoop.waitFor( putWorkflow, putData, putJar )
-
-jobId = Workflow.submit(hadoop).text( configuration ).now().jobId
-println &quot;Submitted job &quot; + jobId
-
-status = &quot;UNKNOWN&quot;;
-count = 0;
-while( status != &quot;SUCCEEDED&quot; &amp;&amp; count++ &lt; 60 ) {
-  sleep( 1000 )
-  json = Workflow.status(hadoop).jobId( jobId ).now().string
-  status = JsonPath.read( json, &quot;${SDS}.status&quot; )
-}
-println &quot;Job status &quot; + status;
-
-println &quot;Shutdown &quot; + hadoop.shutdown( 10, SECONDS )
-
-exit
-</code></pre><h4><a id="Example+#3:+WebHDFS+&+Templeton/WebHCat+via+cURL"></a>Example #3: WebHDFS &amp; Templeton/WebHCat via cURL</h4><p>The example below illustrates the sequence of curl commands that could be used to run a &ldquo;word count&rdquo; map reduce job. It utilizes the hadoop-examples.jar from a Hadoop install for running a simple word count job. A copy of that jar has been included in the samples directory for convenience. Take care to follow the instructions below for steps 4/5 and 6/7 where the Location header returned by the call to the NameNode is copied for use with the call to the DataNode that follows it. These replacement values are identified with { } markup.</p>
+</code></pre><p>Each line from the file <code>samples/ExampleOozieWorkflow.groovy</code> will need to be typed or copied into the interactive shell.</p><h4><a id="Oozie+Example+via+cURL"></a>Oozie Example via cURL</h4><p>The example below illustrates the sequence of curl commands that could be used to run a &ldquo;word count&rdquo; map reduce job via an Oozie workflow.</p><p>It utilizes the hadoop-examples.jar from a Hadoop install for running a simple word count job. A copy of that jar has been included in the samples directory for convenience.</p><p>In addition a workflow definition and a configuration file are required. These have not been included but are available for download. Download <a href="workflow-definition.xml">workflow-definition.xml</a> and <a href="workflow-configuration.xml">workflow-configuration.xml</a> and store them in the {GATEWAY_HOME} directory. Review the contents of workflow-configuration.xml to ensure that it matches your environment.</p><p>Take care to follow the instructions below where replacement values are required. These replacement values are identified with { } markup.</p>
 <pre><code># 0. Optionally cleanup the test directory in case a previous example was run without cleaning up.
 curl -i -k -u guest:guest-password -X DELETE \
-    &#39;https://localhost:8443/gateway/sandbox/webhdfs/v1/tmp/test?op=DELETE&amp;recursive=true&#39;
-
-# 1. Create a test input directory /tmp/test/input
-curl -i -k -u guest:guest-password -X PUT \
-    &#39;https://localhost:8443/gateway/sandbox/webhdfs/v1/tmp/test/input?op=MKDIRS&#39;
-
-# 2. Create a test output directory /tmp/test/input
-curl -i -k -u guest:guest-password -X PUT \
-    &#39;https://localhost:8443/gateway/sandbox/webhdfs/v1/tmp/test/output?op=MKDIRS&#39;
-
-# 3. Create the inode for hadoop-examples.jar in /tmp/test
-curl -i -k -u guest:guest-password -X PUT \
-    &#39;https://localhost:8443/gateway/sandbox/webhdfs/v1/tmp/test/hadoop-examples.jar?op=CREATE&#39;
-
-# 4. Upload hadoop-examples.jar to /tmp/test.  Use a hadoop-examples.jar from a Hadoop install.
-curl -i -k -u guest:guest-password -T samples/hadoop-examples.jar -X PUT &#39;{Value Location header from command above}&#39;
-
-# 5. Create the inode for a sample file README in /tmp/test/input
-curl -i -k -u guest:guest-password -X PUT \
-    &#39;https://localhost:8443/gateway/sandbox/webhdfs/v1/tmp/test/input/README?op=CREATE&#39;
-
-# 6. Upload readme.txt to /tmp/test/input.  Use the readme.txt in {GATEWAY_HOME}.
-curl -i -k -u guest:guest-password -T README -X PUT &#39;{Value of Location header from command above}&#39;
-
-# 7. Submit the word count job via WebHCat/Templeton.
-# Take note of the Job ID in the JSON response as this will be used in the next step.
-curl -v -i -k -u guest:guest-password -X POST \
-    -d jar=/tmp/test/hadoop-examples.jar -d class=wordcount \
-    -d arg=/tmp/test/input -d arg=/tmp/test/output \
-    &#39;https://localhost:8443/gateway/sample/templeton/api/v1/mapreduce/jar&#39;
-
-# 8. Look at the status of the job
-curl -i -k -u guest:guest-password -X GET \
-    &#39;https://localhost:8443/gateway/sample/templeton/api/v1/queue/{Job ID returned in JSON body from previous step}&#39;
-
-# 9. Look at the status of the job queue
-curl -i -k -u guest:guest-password -X GET \
-    &#39;https://localhost:8443/gateway/sample/templeton/api/v1/queue&#39;
-
-# 10. List the contents of the output directory /tmp/test/output
-curl -i -k -u guest:guest-password -X GET \
-    &#39;https://localhost:8443/gateway/sandbox/webhdfs/v1/tmp/test/output?op=LISTSTATUS&#39;
-
-# 11. Optionally cleanup the test directory
-curl -i -k -u guest:guest-password -X DELETE \
-    &#39;https://localhost:8443/gateway/sandbox/webhdfs/v1/tmp/test?op=DELETE&amp;recursive=true&#39;
-</code></pre><h4><a id="Example+#4:+WebHDFS+&+Oozie+via+cURL"></a>Example #4: WebHDFS &amp; Oozie via cURL</h4><p>The example below illustrates the sequence of curl commands that could be used to run a &ldquo;word count&rdquo; map reduce job via an Oozie workflow. It utilizes the hadoop-examples.jar from a Hadoop install for running a simple word count job. A copy of that jar has been included in the samples directory for convenience. Take care to follow the instructions below where replacement values are required. These replacement values are identified with { } markup.</p>
-<pre><code># 0. Optionally cleanup the test directory in case a previous example was run without cleaning up.
-curl -i -k -u guest:guest-password -X DELETE \
-    &#39;https://localhost:8443/gateway/sandbox/webhdfs/v1/tmp/test?op=DELETE&amp;recursive=true&#39;
+    &#39;https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example?op=DELETE&amp;recursive=true&#39;
 
-# 1. Create the inode for workflow definition file in /tmp/test
+# 1. Create the inode for the workflow definition file in /user/guest/example
 curl -i -k -u guest:guest-password -X PUT \
-    &#39;https://localhost:8443/gateway/sandbox/webhdfs/v1/tmp/test/workflow.xml?op=CREATE&#39;
+    &#39;https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example/workflow.xml?op=CREATE&#39;
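+# The response to the CREATE call above contains a Location header; copy its
+# value into the upload command below in place of the { } placeholder.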
 
 # 2. Upload the workflow definition file.  Use the workflow-definition.xml downloaded earlier.
-curl -i -k -u guest:guest-password -T templates/workflow-definition.xml -X PUT \
+curl -i -k -u guest:guest-password -T workflow-definition.xml -X PUT \
     &#39;{Value Location header from command above}&#39;
 
-# 3. Create the inode for hadoop-examples.jar in /tmp/test/lib
+# 3. Create the inode for hadoop-examples.jar in /user/guest/example/lib
 curl -i -k -u guest:guest-password -X PUT \
-    &#39;https://localhost:8443/gateway/sandbox/webhdfs/v1/tmp/test/lib/hadoop-examples.jar?op=CREATE&#39;
+    &#39;https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example/lib/hadoop-examples.jar?op=CREATE&#39;
 
-# 4. Upload hadoop-examples.jar to /tmp/test/lib.  Use a hadoop-examples.jar from a Hadoop install.
+# 4. Upload hadoop-examples.jar to /user/guest/example/lib.  Use a hadoop-examples.jar from a Hadoop install.
 curl -i -k -u guest:guest-password -T samples/hadoop-examples.jar -X PUT \
     &#39;{Value Location header from command above}&#39;
 
-# 5. Create the inode for a sample input file readme.txt in /tmp/test/input.
+# 5. Create the inode for a sample input file README in /user/guest/example/input.
 curl -i -k -u guest:guest-password -X PUT \
-    &#39;https://localhost:8443/gateway/sandbox/webhdfs/v1/tmp/test/input/README?op=CREATE&#39;
+    &#39;https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example/input/README?op=CREATE&#39;
 
-# 6. Upload readme.txt to /tmp/test/input.  Use the readme.txt in {GATEWAY_HOME}.
+# 6. Upload README to /user/guest/example/input.
 # The sample below uses this README file found in {GATEWAY_HOME}.
 curl -i -k -u guest:guest-password -T README -X PUT \
     &#39;{Value of Location header from command above}&#39;
 
-# 7. Create the job configuration file by replacing the {NameNode host:port} and {JobTracker host:port}
-# in the command below to values that match your Hadoop configuration.
-# NOTE: The hostnames must be resolvable by the Oozie daemon.  The ports are the RPC ports not the HTTP ports.
-# For example {NameNode host:port} might be sandbox:8020 and {JobTracker host:port} sandbox:50300
-# The source workflow-configuration.xml file can be found in {GATEWAY_HOME}/templates
-# Alternatively, this file can copied and edited manually for environments without the sed utility.
-sed -e s/REPLACE.NAMENODE.RPCHOSTPORT/{NameNode host:port}/ \
-    -e s/REPLACE.JOBTRACKER.RPCHOSTPORT/{JobTracker host:port}/ \
-    &lt;templates/workflow-configuration.xml &gt;workflow-configuration.xml
-
-# 8. Submit the job via Oozie
+# 7. Submit the job via Oozie
 # Take note of the Job ID in the JSON response as this will be used in the next step.
-curl -i -k -u guest:guest-password -T workflow-configuration.xml -H Content-Type:application/xml -X POST \
-    &#39;https://localhost:8443/gateway/sample/oozie/api/v1/jobs?action=start&#39;
+curl -i -k -u guest:guest-password -H Content-Type:application/xml -T workflow-configuration.xml \
+    -X POST &#39;https://localhost:8443/gateway/sandbox/oozie/v1/jobs?action=start&#39;
 
-# 9. Query the job status via Oozie.
+# 8. Query the job status via Oozie.
 curl -i -k -u guest:guest-password -X GET \
-    &#39;https://localhost:8443/gateway/sample/oozie/api/v1/job/{Job ID returned in JSON body from previous step}&#39;
+    &#39;https://localhost:8443/gateway/sandbox/oozie/v1/job/{Job ID from JSON body}&#39;
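+# The workflow takes some time to run; repeat the query above until the
+# returned status is SUCCEEDED before listing the output in the next step.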
 
-# 10. List the contents of the output directory /tmp/test/output
+# 9. List the contents of the output directory /user/guest/example/output
 curl -i -k -u guest:guest-password -X GET \
-    &#39;https://localhost:8443/gateway/sandbox/webhdfs/v1/tmp/test/output?op=LISTSTATUS&#39;
+    &#39;https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example/output?op=LISTSTATUS&#39;
 
-# 11. Optionally cleanup the test directory
+# 10. Optionally cleanup the test directory
 curl -i -k -u guest:guest-password -X DELETE \
-    &#39;https://localhost:8443/gateway/sandbox/webhdfs/v1/tmp/test?op=DELETE&amp;recursive=true&#39;
-</code></pre><h3><a id="HBase"></a>HBase</h3><p>TODO</p><h4><a id="HBase+URL+Mapping"></a>HBase URL Mapping</h4><p>TODO</p><h4><a id="HBase+Examples"></a>HBase Examples</h4><p>TODO</p><p>The examples below illustrate the set of basic operations with HBase instance using Stargate REST API. Use following link to get more more details about HBase/Stargate API: <a href="http://wiki.apache.org/hadoop/Hbase/Stargate">http://wiki.apache.org/hadoop/Hbase/Stargate</a>.</p><h3><a id="HBase+Stargate+Setup"></a>HBase Stargate Setup</h3><h4><a id="Launch+Stargate"></a>Launch Stargate</h4><p>The command below launches the Stargate daemon on port 60080</p>
+    &#39;https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/example?op=DELETE&amp;recursive=true&#39;
+</code></pre><h3><a id="Oozie+Client+DSL"></a>Oozie Client DSL</h3><h4><a id="submit()+-+Submit+a+workflow+job."></a>submit() - Submit a workflow job.</h4>
+<ul>
+  <li>Request
+  <ul>
+    <li>text (String) - XML formatted workflow configuration string.</li>
+    <li>file (String) - A filename containing XML formatted workflow configuration.</li>
+    <li>action (String) - The initial action to take on the job. Optional: Default is &ldquo;start&rdquo;.</li>
+  </ul></li>
+  <li>Response
+  <ul>
+    <li>BasicResponse</li>
+  </ul></li>
+  <li>Example
+  <ul>
+    <li><code>Workflow.submit(session).file(localFile).action(&quot;start&quot;).now()</code></li>
+  </ul></li>
+</ul><h4><a id="status()+-+Query+the+status+of+a+workflow+job."></a>status() - Query the status of a workflow job.</h4>
+<ul>
+  <li>Request
+  <ul>
+    <li>jobId (String) - The job ID to check. This is the ID received when the job was created.</li>
+  </ul></li>
+  <li>Response
+  <ul>
+    <li>BasicResponse</li>
+  </ul></li>
+  <li>Example
+  <ul>
+    <li><code>Workflow.status(session).jobId(jobId).now().string</code></li>
+  </ul></li>
+</ul><h3><a id="HBase"></a>HBase</h3><p>TODO</p><h4><a id="HBase+URL+Mapping"></a>HBase URL Mapping</h4><p>TODO</p><h4><a id="HBase+Examples"></a>HBase Examples</h4><p>TODO</p><p>The examples below illustrate the set of basic operations with an HBase instance using the Stargate REST API. Use the following link to get more details about the HBase/Stargate API: <a href="http://wiki.apache.org/hadoop/Hbase/Stargate">http://wiki.apache.org/hadoop/Hbase/Stargate</a>.</p><h3><a id="HBase+Stargate+Setup"></a>HBase Stargate Setup</h3><h4><a id="Launch+Stargate"></a>Launch Stargate</h4><p>The command below launches the Stargate daemon on port 60080.</p>
 <pre><code>sudo /usr/lib/hbase/bin/hbase-daemon.sh start rest -p 60080
 </code></pre><p>Port 60080 is used because it was specified in the sample Hadoop cluster deployment <code>{GATEWAY_HOME}/deployments/sandbox.xml</code>.</p><h4><a id="Configure+Sandbox+port+mapping+for+VirtualBox"></a>Configure Sandbox port mapping for VirtualBox</h4>
 <ol>
@@ -1767,22 +1608,21 @@ curl -i -k -u guest:guest-password -X DE
   <ul>
     <li>EmptyResponse</li>
   </ul></li>
-  <li>Example
-  <ul>
-    <li><code>HBase.session(session).table(tableName).create()
+  <li>Example</li>
+</ul>
+<pre><code>HBase.session(session).table(tableName).create()
    .attribute(&quot;tb_attr1&quot;, &quot;value1&quot;)
    .attribute(&quot;tb_attr2&quot;, &quot;value2&quot;)
    .family(&quot;family1&quot;)
-   .attribute(&quot;fm_attr1&quot;, &quot;value3&quot;)
-   .attribute(&quot;fm_attr2&quot;, &quot;value4&quot;)
+       .attribute(&quot;fm_attr1&quot;, &quot;value3&quot;)
+       .attribute(&quot;fm_attr2&quot;, &quot;value4&quot;)
    .endFamilyDef()
    .family(&quot;family2&quot;)
    .family(&quot;family3&quot;)
    .endFamilyDef()
    .attribute(&quot;tb_attr3&quot;, &quot;value5&quot;)
-   .now()</code></li>
-  </ul></li>
-</ul><h5><a id="table(String+tableName).update()+-+Update+Table+Schema."></a>table(String tableName).update() - Update Table Schema.</h5>
+   .now()
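+
+// Note: attribute() calls nested between family() and endFamilyDef() apply to
+// that column family; the other attribute() calls apply to the table itself.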
+</code></pre><h5><a id="table(String+tableName).update()+-+Update+Table+Schema."></a>table(String tableName).update() - Update Table Schema.</h5>
 <ul>
   <li>Request
   <ul>
@@ -1794,18 +1634,17 @@ curl -i -k -u guest:guest-password -X DE
   <ul>
     <li>EmptyResponse</li>
   </ul></li>
-  <li>Example
-  <ul>
-    <li><code>HBase.session(session).table(tableName).update()
- .family(&quot;family1&quot;)
-     .attribute(&quot;fm_attr1&quot;, &quot;new_value3&quot;)
- .endFamilyDef()
- .family(&quot;family4&quot;)
-     .attribute(&quot;fm_attr3&quot;, &quot;value6&quot;)
- .endFamilyDef()
- .now()</code></li>
-  </ul></li>
-</ul><h5><a id="table(String+tableName).regions()+-+Query+Table+Metadata."></a>table(String tableName).regions() - Query Table Metadata.</h5>
+  <li>Example</li>
+</ul>
+<pre><code>HBase.session(session).table(tableName).update()
+     .family(&quot;family1&quot;)
+         .attribute(&quot;fm_attr1&quot;, &quot;new_value3&quot;)
+     .endFamilyDef()
+     .family(&quot;family4&quot;)
+         .attribute(&quot;fm_attr3&quot;, &quot;value6&quot;)
+     .endFamilyDef()
+     .now()
+</code></pre><h5><a id="table(String+tableName).regions()+-+Query+Table+Metadata."></a>table(String tableName).regions() - Query Table Metadata.</h5>
 <ul>
   <li>Request
   <ul>
@@ -1843,18 +1682,19 @@ curl -i -k -u guest:guest-password -X DE
   <ul>
     <li>EmptyResponse</li>
   </ul></li>
-  <li>Example
-  <ul>

[... 135 lines stripped ...]

