drill-commits mailing list archives

From bridg...@apache.org
Subject drill-site git commit: doc updates for Drill 1.13
Date Wed, 14 Mar 2018 01:13:18 GMT
Repository: drill-site
Updated Branches:
  refs/heads/asf-site b3daf9a11 -> c5a1214d4


doc updates for Drill 1.13


Project: http://git-wip-us.apache.org/repos/asf/drill-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill-site/commit/c5a1214d
Tree: http://git-wip-us.apache.org/repos/asf/drill-site/tree/c5a1214d
Diff: http://git-wip-us.apache.org/repos/asf/drill-site/diff/c5a1214d

Branch: refs/heads/asf-site
Commit: c5a1214d4cf36ead0946d1966f87e5fd32e7ceac
Parents: b3daf9a
Author: Bridget Bevens <bbevens@maprtech.com>
Authored: Tue Mar 13 18:13:01 2018 -0700
Committer: Bridget Bevens <bbevens@maprtech.com>
Committed: Tue Mar 13 18:13:01 2018 -0700

----------------------------------------------------------------------
 .../index.html                                  |  32 ++++-
 docs/configuring-drill-memory/index.html        |  49 ++++---
 .../index.html                                  | 133 ++++++++++---------
 docs/start-up-options/index.html                |   6 +-
 feed.xml                                        |   4 +-
 team/index.html                                 |   4 +
 6 files changed, 134 insertions(+), 94 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill-site/blob/c5a1214d/docs/configuration-options-introduction/index.html
----------------------------------------------------------------------
diff --git a/docs/configuration-options-introduction/index.html b/docs/configuration-options-introduction/index.html
index 7b4e536..7fa513f 100644
--- a/docs/configuration-options-introduction/index.html
+++ b/docs/configuration-options-introduction/index.html
@@ -1153,7 +1153,7 @@
 
     </div>
 
-     Feb 5, 2018
+     Mar 14, 2018
 
     <link href="/css/docpage.css" rel="stylesheet" type="text/css">
 
@@ -1172,13 +1172,13 @@
 
 <h2 id="system-options">System Options</h2>
 
-<p>The sys.options table lists ptions that you can set at the system or session level,
as described in the section, <a href="/docs/planning-and-execution-options">&quot;Planning
and Execution Options&quot;</a>.  </p>
+<p>The sys.options table lists options that you can set at the system or session level,
as described in the section, <a href="/docs/planning-and-execution-options">&quot;Planning
and Execution Options&quot;</a>.  </p>
 
 <table><thead>
 <tr>
-<th><strong>Name</strong></th>
-<th><strong>Default</strong></th>
-<th><strong>Description</strong></th>
+<th>Name</th>
+<th>Default</th>
+<th>Description</th>
 </tr>
 </thead><tbody>
 <tr>
@@ -1187,6 +1187,11 @@
 <td>Available as of Drill 1.10. Sets the   workspace for temporary tables. The workspace
must be writable, file-based,   and point to a location that already exists. This option requires
the   following format: .&lt;workspace</td>
 </tr>
 <tr>
+<td>drill.exec.memory.operator.output_batch_size</td>
+<td>16777216   (16 MB)</td>
+<td>Available as of Drill 1.13. Limits the   amount of memory that the Flatten, Merge
Join, and External Sort operators   allocate to outgoing batches.</td>
+</tr>
+<tr>
 <td>drill.exec.storage.implicit.filename.column.label</td>
 <td>filename</td>
 <td>Available as of Drill 1.10. Sets the   implicit column name for the filename column.</td>
@@ -1212,6 +1217,16 @@
 <td>In a text file, treat empty fields as NULL   values instead of empty string.</td>
 </tr>
 <tr>
+<td>drill.exec.spill.fs</td>
+<td>&quot;file:///&quot;</td>
+<td>Introduced   in Drill 1.11. The default file system on the local machine into which
the   Sort, Hash Aggregate, and Hash Join operators spill data.</td>
+</tr>
+<tr>
+<td>drill.exec.spill.directories</td>
+<td>[&quot;/tmp/drill/spill&quot;]</td>
+<td>Introduced   in Drill 1.11. The list of directories into which the Sort, Hash Aggregate,
  and Hash Join operators spill data. The list must be an array with   directories separated
by a comma, for example [&quot;/fs1/drill/spill&quot; ,   &quot;/fs2/drill/spill&quot;
, &quot;/fs3/drill/spill&quot;].</td>
+</tr>
+<tr>
 <td>drill.exec.storage.file.partition.column.label</td>
 <td>dir</td>
 <td>The column label for directory levels in   results of queries of files in a directory.
Accepts a string input.</td>
@@ -1239,7 +1254,7 @@
 <tr>
 <td>exec.java.compiler.exp_in_method_size</td>
 <td>50</td>
-<td>Introduced in Drill 1.8. For queries with complex or multiple expressions in the
query logic, this option   limits the number of expressions allowed in each method to prevent
Drill from   generating code that exceeds the Java limit of 64K bytes. If a method   approaches
the 64K limit, the Java compiler returns a message stating that   the code is too large to
compile. If queries return such a message, reduce   the value of this option at the session
level. The default value for this option is 50. The value is the count of   expressions allowed
in a method. Expressions are added to a method until they   hit the Java 64K limit, when a
new inner method is created and called from   the existing method.          <strong>Note:</strong>
This logic has not   been implemented for all operators. If a query uses operators for which
the   logic is not implemented, reducing the setting for this option may not   resolve the
error. Setting this option at the system level impacts all   queries and can degrade query performance.</td>
+<td>Introduced in Drill 1.8. For queries with   complex or multiple expressions in
the query logic, this option limits the   number of expressions allowed in each method to
prevent Drill from generating   code that exceeds the Java limit of 64K bytes. If a method
approaches the 64K   limit, the Java compiler returns a message stating that the code is too
large   to compile. If queries return such a message, reduce the value of this option   at
the session level. The default value for this option is 50. The value is   the count of expressions
allowed in a method. Expressions are added to a   method until they hit the Java 64K limit,
when a new inner method is created   and called from the existing method. Note: This logic
has not been implemented for all operators. If   a query uses operators for which the logic
is not implemented, reducing the   setting for this option may not resolve the error. Setting
this option at the   system level impacts all queries and can degrade query performance.</td>
 </tr>
 <tr>
 <td>exec.java_compiler_janino_maxsize</td>
@@ -1457,6 +1472,11 @@
 <td>Defines the maximum amount of direct memory   allocated to a query for planning.
When multiple queries run concurrently,   each query is allocated the amount of memory set
by this parameter. Increase   the value of this parameter and rerun the query if partition
pruning failed   due to insufficient memory.</td>
 </tr>
 <tr>
+<td>planner.memory.percent_per_query</td>
+<td>0.05</td>
+<td>Sets   the memory as a percentage of the total direct memory.</td>
+</tr>
+<tr>
 <td>planner.nestedloopjoin_factor</td>
 <td>100</td>
 <td>A heuristic value for influencing the nested   loop join.</td>

http://git-wip-us.apache.org/repos/asf/drill-site/blob/c5a1214d/docs/configuring-drill-memory/index.html
----------------------------------------------------------------------
diff --git a/docs/configuring-drill-memory/index.html b/docs/configuring-drill-memory/index.html
index 69e54b7..1083aaf 100644
--- a/docs/configuring-drill-memory/index.html
+++ b/docs/configuring-drill-memory/index.html
@@ -1151,42 +1151,30 @@
 
     </div>
 
-     Jan 30, 2018
+     Mar 14, 2018
 
     <link href="/css/docpage.css" rel="stylesheet" type="text/css">
 
     <div class="int_text" align="left">
       
-        <p>You can configure the amount of direct memory allocated to a Drillbit for
query processing in any Drill cluster, multitenant or not. The default memory for a drillbit
is 8G, but Drill prefers 16G or more depending on the workload. The total amount of direct
memory that a drillbit allocates to query operations cannot exceed the limit set.</p>
+        <p>Drill uses Java direct memory. You can configure the amount of direct memory
allocated to a Drillbit for query processing. The default memory for a Drillbit is 8G, but
Drill prefers 16G or more depending on the workload. The total amount of direct memory that
a Drillbit allocates to query operations cannot exceed the limit set.</p>
 
-<p>Drill uses Java direct memory and performs well when executing operations in memory
instead of storing the operations on disk. Drill does not write to disk unless absolutely
necessary, unlike MapReduce where everything is written to disk during each phase of a job.</p>
+<p>Drill performs well when executing operations in memory instead of storing the operations
on disk. Drill does not write to disk unless absolutely necessary, unlike MapReduce where
everything is written to disk during each phase of a job.</p>
 
-<p>The JVM’s heap memory does not limit the amount of direct memory available in
-a drillbit. The on-heap memory for Drill is typically set at 4-8G (default is 4), which should
-suffice because Drill avoids having data sit in heap memory.</p>
+<p>The JVM heap memory does not limit the amount of direct memory available in a Drillbit.
The on-heap memory for Drill is typically set at 4-8G (default is 4), which should
+suffice because Drill avoids having data sit in heap memory.  </p>
 
-<p>As of Drill 1.5, Drill uses a new allocator that improves an operator’s use of
direct memory and tracks the memory use more accurately. Due to this change, the sort operator
(in queries that ran successfully in previous releases) may not have enough memory, resulting
in a failed query and out of memory error instead of spilling to disk.     </p>
+<p>The following sections describe how to modify the memory allocated to each Drillbit
and queries:  </p>
 
-<h2 id="drillbit-memory">Drillbit Memory</h2>
+<h2 id="modifying-memory-allocated-to-a-drillbit">Modifying Memory Allocated to a Drillbit</h2>
 
-<p>The value set for the <a href="/docs/configuration-options-introduction/#system-options"><code>planner.memory.max_query_memory_per_node</code></a>
system option sets the maximum amount of direct memory allocated to the Sort and Hash Aggreate
operators in each query on a node. If a query plan contains multiple Sort and/or Hash Aggregate
operators, they all share this memory. The default limit is set to 2147483648 bytes (2GB),
which should be increased for queries on large data sets. If you encounter memory issues when
running queries with Sort and/or Hash Aggregate operators, increase the value of this option.
See <a href="https://drill.apache.org/docs/sort-based-and-hash-based-memory-constrained-operators/">Sort-Based
and Hash-Based Memory Constrained Operators</a> for more information.  </p>
+<p>Modify the memory allocated to each Drillbit in a cluster in the Drillbit startup
script, <code>&lt;drill_installation_directory&gt;/conf/drill-env.sh</code>.
You must <a href="/docs/starting-drill-in-distributed-mode">restart Drill</a>
after you modify the script.</p>
 
-<p>If you continue to encounter memory issues after increasing this value, you can
also reduce the value of the <a href="/docs/configuration-options-introduction/"><code>planner.width.max_per_node</code></a>
option to reduce the level of parallelism per node. However, this may increase the amount
of time required for a query to complete. </p>
-
-<h3 id="modifying-drillbit-memory">Modifying Drillbit Memory</h3>
-
-<p>You can modify memory for each drillbit node in your cluster. To modify the memory
for a drillbit, set the DRILL_MAX_DIRECT_MEMORY variable in the drillbit startup script, <code>drill-env.sh</code>,
located in <code>&lt;drill_installation_directory&gt;/conf</code>, as
follows:</p>
-<div class="highlight"><pre><code class="language-text" data-lang="text">export
DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-&quot;&lt;value&gt;&quot;}
-</code></pre></div>
 <div class="admonition note">
   <p class="first admonition-title">Note</p>
-  <p class="last">If DRILL_MAX_DIRECT_MEMORY is not set, the limit depends on the amount
of available system memory.  </p>
+  <p class="last">If DRILL_MAX_DIRECT_MEMORY is not set, the limit depends on the amount
of available direct memory.  </p>
 </div>
 
-<p>After you edit <code>&lt;drill_installation_directory&gt;/conf/drill-env.sh</code>,
<a href="/docs/starting-drill-in-distributed-mode">restart the drillbit</a> on
the node.</p>
-
-<h3 id="about-the-drillbit-startup-script">About the Drillbit Startup Script</h3>
-
 <p>The <code>drill-env.sh</code> file contains the following options:</p>
 <div class="highlight"><pre><code class="language-text" data-lang="text">#export
DRILL_HEAP=${DRILL_HEAP:-&quot;4G&quot;}  
 #export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-&quot;8G&quot;}
@@ -1205,7 +1193,24 @@ DRILL_MAX_DIRECT_MEMORY is the Java direct memory limit per node. 
</p>
 <div class="highlight"><pre><code class="language-text" data-lang="text">export
DRILL_JAVA_OPTS=&quot;$DRILL_JAVA_OPTS -Ddrill.exec.memory.enable_unsafe_bounds_check=true&quot;
 
 </code></pre></div>
 <p>For earlier versions of Drill (prior to 1.13), bounds checking is enabled by default.
To disable bounds checking, set the <code>drill.enable_unsafe_memory_access</code>
parameter to true, as shown:  </p>
-<div class="highlight"><pre><code class="language-text" data-lang="text">export
DRILL_JAVA_OPTS=&quot;$DRILL_JAVA_OPTS -Ddrill.enable_unsafe_memory_access=true&quot;
+<div class="highlight"><pre><code class="language-text" data-lang="text">export
DRILL_JAVA_OPTS=&quot;$DRILL_JAVA_OPTS -Ddrill.enable_unsafe_memory_access=true&quot;
 
+</code></pre></div>
+<h2 id="modifying-memory-allocated-to-queries">Modifying Memory Allocated to Queries</h2>
+
+<p>You can configure the amount of memory that Drill allocates to each query as a hard
limit or a percentage of the total direct memory. The <code>planner.memory.max_query_memory_per_node</code>
and <code>planner.memory.percent_per_query</code> options set the amount of memory
that Drill can allocate to a query on a node. Both options are enabled by default. Of these
two options, Drill picks the setting that provides the most memory. For more information about
these options, see <a href="https://drill.apache.org/docs/sort-based-and-hash-based-memory-constrained-operators/">Sort-Based
and Hash-Based Memory Constrained Operators</a>.  </p>
+
+<p>If you modify the memory allocated per query and continue to experience out-of-memory
errors, you can try reducing the value of the <a href="/docs/configuration-options-introduction/"><code>planner.width.max_per_node</code></a>
option. Reducing the value of this option reduces the level of parallelism per node. However,
this may increase the amount of time required for a query to complete.  </p>
+
+<p>Another option you can modify is the <code>drill.exec.memory.operator.output_batch_size</code>
option, introduced in Drill 1.13. The  <code>drill.exec.memory.operator.output_batch_size</code>
option limits the amount of memory that the Flatten, Merge Join, and External Sort operators
allocate to outgoing batches. Limiting the memory allocated to outgoing batches can improve
concurrency and prevent queries from failing with out-of-memory errors.</p>
+
+<p>The average row size of the outgoing batch (calculated from the incoming batch size)
determines the number of rows that can fit into the available memory for the batch. If your
queries fail with memory errors, reduce the value of the <code>drill.exec.memory.operator.output_batch_size</code>
option to reduce the output batch size. </p>
+
+<p>The default value is 16777216 (16 MB). The maximum allowed value is 536870912 (512
MB). Enter the value in bytes. </p>
+
+<p><strong>Note:</strong> Configuring a batch size less than 1 MB is not
recommended, as it could lead to performance issues. </p>
+
+<p>Use the ALTER SYSTEM SET command to change the settings, as shown:  </p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">
  ALTER SYSTEM SET `drill.exec.memory.operator.output_batch_size` = &lt;value&gt;;
 </code></pre></div>
     
       

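Note on the two sizing rules described in the configuring-drill-memory changes above: the sketch below works through the arithmetic in Python. It is a simplified illustration, not Drill code. The function names are hypothetical, and it assumes that "picks the setting that provides the most memory" means a plain maximum of the hard limit and the percentage of total direct memory, and that the row count per outgoing batch is the batch size divided by the average row size. Drill's internal calculation may differ in detail.

```python
# Hypothetical helpers for illustration only; they are not part of Drill.

GIB = 1024 ** 3
MIB = 1024 ** 2


def effective_query_memory(direct_memory_bytes,
                           max_query_memory_per_node=2 * GIB,
                           percent_per_query=0.05):
    """Return the larger of the hard limit and the percentage-based limit.

    Defaults mirror the documented defaults: 2 GB and 0.05.
    """
    percent_limit = int(direct_memory_bytes * percent_per_query)
    return max(max_query_memory_per_node, percent_limit)


def rows_per_output_batch(avg_row_bytes, output_batch_size=16 * MIB):
    """Approximate how many rows of a given average size fit in one batch.

    The default mirrors drill.exec.memory.operator.output_batch_size (16 MB).
    """
    if avg_row_bytes <= 0:
        raise ValueError("avg_row_bytes must be positive")
    return output_batch_size // avg_row_bytes


if __name__ == "__main__":
    # With the default 8G of direct memory, 5% is well under 2 GB,
    # so the hard limit is the one that applies.
    print(effective_query_memory(8 * GIB))
    # With much more direct memory, the percentage-based limit takes over.
    print(effective_query_memory(64 * GIB))
    # Rows of about 1 KB each in a default-sized outgoing batch.
    print(rows_per_output_batch(1024))
```

Under these assumptions, the percentage-based limit only exceeds the 2 GB default once direct memory is above 40 GB, which is why the hard limit governs on a default 8G Drillbit.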
http://git-wip-us.apache.org/repos/asf/drill-site/blob/c5a1214d/docs/sort-based-and-hash-based-memory-constrained-operators/index.html
----------------------------------------------------------------------
diff --git a/docs/sort-based-and-hash-based-memory-constrained-operators/index.html b/docs/sort-based-and-hash-based-memory-constrained-operators/index.html
index f5743c5..a5df4c1 100644
--- a/docs/sort-based-and-hash-based-memory-constrained-operators/index.html
+++ b/docs/sort-based-and-hash-based-memory-constrained-operators/index.html
@@ -1153,95 +1153,106 @@
 
     </div>
 
-     Aug 18, 2017
+     Mar 14, 2018
 
     <link href="/css/docpage.css" rel="stylesheet" type="text/css">
 
     <div class="int_text" align="left">
       
-        <p>Drill uses hash-based and sort-based operators depending on the query characteristics.
Hash Aggregate and Hash Join are hash-based operators. Sort, Streaming Aggregate, and Merge
Join are sort-based operators. Both hash-based and sort-based operations consume memory, however
the Hash Aggregate and Hash Join operators are the fastest and most memory intensive operators.
</p>
+        <p>Drill uses operators to sort, join, and aggregate data when executing queries.
Drill uses the Sort operator to sort data. Drill can use the Hash Aggregate or Hash Join operators
to aggregate data, or Drill can sort the data and then use the Merge Join or Streaming Aggregate
operators to aggregate the data. </p>
 
-<p>When planning a query with sort- and hash-based operations, Drill evaluates the
available memory multiplied by a configurable reduction constant (for parallelization purposes)
and then limits the operations to the maximum of this amount of memory. Drill spills data
to disk if the sort and hash aggregate operations cannot be performed in memory. Alternatively,
you can disable large hash operations if they do not fit in memory on your system. When disabled,
Drill creates alternative plans. You can also modify the minimum hash table size, increasing
the size for very large aggregations or joins when you have large amounts of memory for Drill
to use. If you have large data sets, you can increase the hash table size to improve performance.
</p>
+<p>The Hash operators typically perform better, however they are more memory intensive
than the Merge Join and Streaming Aggregate operators. The Sort operator may use as much or
even more memory than the Hash operators. If you want to see the difference in memory consumption
between the operators, you can run a query and view the query profile in the Drill Web Console.
Optionally, you can disable the Hash operators to force Drill to use the Merge Join and Streaming
Aggregate operators. </p>
 
-<h2 id="memory-options">Memory Options</h2>
+<p>When a query requires sorting, joining, and aggregation, Drill equally divides the
memory available among each instance of these memory intensive operators in a query. The number
of instances is equivalent to the number of these operators in the query plan, each multiplied
by its degree of parallelism. The degree of parallelism is the number of minor fragments required
to perform the work for each instance of an operator. When an instance of an operator must
process more data than it can hold, the operator temporarily spills some of the data to a
directory on disk to complete its work.  </p>
 
-<p>The <code>planner.memory.max_query_memory_per_node</code> option sets
the maximum amount of direct memory allocated to the Sort and Hash Aggregate operators during
each query on a node. The default limit is set to 2147483648 bytes (2GB), which should be
increased for queries on large data sets. This memory is split between operators. If a query
plan contains multiple Sort and/or Hash Aggregate operators, the memory is divided between
them.</p>
-
-<p>When a query is parallelized, the number of operators is multiplied, which reduces
the amount of memory given to each instance of the Sort and Hash Aggregate operators during
a query. If you encounter memory issues when running queries with Sort and Hash Aggregate
operators, calculate the memory requirements for your queries and the amount of available
memory on each node. Based on the information, increase the value of the <code>planner.memory.max_query_memory_per_node</code>
option using the ALTER SYSTEM|SESSION SET command, as shown:  </p>
-<div class="highlight"><pre><code class="language-text" data-lang="text">ALTER
SYSTEM|SESSION SET `planner.memory.max_query_memory_per_node` = &lt;new_value&gt;
 
-</code></pre></div>
-<p>The <code>planner.memory.enable_memory_estimation</code> option toggles
the state of memory estimation and re-planning of a query. When enabled, Drill conservatively
estimates memory requirements and typically excludes memory-constrained operators from the
query plan, which can negatively impact performance. The default setting is false. If you
want Drill to use very conservative memory estimates, use the ALTER SYSTEM|SESSION SET command
to change the setting, as shown:  </p>
-<div class="highlight"><pre><code class="language-text" data-lang="text">ALTER
SYSTEM|SESSION SET `planner.memory.enable_memory_estimation` = true  
-</code></pre></div>
 <h2 id="spill-to-disk">Spill to Disk</h2>
 
-<p>Spilling data to disk prevents queries that use memory-intensive Sort and Hash Aggregate
operations from failing with out-of-memory errors. Drill automatically writes excess data
to a temporary directory on disk when queries with Sort or Hash Aggregate operations exceed
the set memory limit on a Drill node. When the operators finish processing the in-memory data,
Drill reads the spilled data back from disk, and the operators finish processing the data.
When the operations complete, Drill removes the data from disk.  </p>
+<p>Spilling to disk prevents queries that use memory intensive operations from failing
with out-of-memory errors. The Spill to Disk feature enables the Sort, Hash Aggregate, and
Hash Join operators to automatically write excess data (as files) to a temporary directory
on disk when the memory requirements for the operators exceed the set memory limit. Queries
run uninterrupted while the operators perform the spill operations in the background.</p>
 
-<p>Spilling data to disk enables queries to run uninterrupted while Drill performs
the spill operations in the background. However, there can be performance impact due to the
time required to spill data and then read the data back from disk.  </p>
+<p>When the Sort, Hash Aggregate, and Hash Join operators finish processing the data
in memory, they read the spilled data back from disk and then finish processing the data.
The operators clean up their data (files) from the temporary spill location after they finish
processing the data. </p>
 
-<div class="admonition note">
-  <p class="first admonition-title">Note</p>
-  <p class="last">Drill 1.11 and later supports spilling to disk for the Hash Aggregate
operator in addition to the Sort operator. Previous releases of Drill only supported spilling
to disk for the Sort operator.  </p>
-</div>  
+<p>Ideally, you want to allocate enough memory for Drill to perform all operations
in memory. When data spills to disk, you will not see any difference in terms of how queries
run, however spilling to disk can impact performance due to the additional I/O required to
write data to disk and read the data back. See the Memory Allocation section for more information.
</p>
 
-<h3 id="spill-locations">Spill Locations</h3>
+<p><strong>Note:</strong> Drill 1.13 and later supports spilling to disk
for the Hash Join, Hash Aggregate, and Sort operators. Drill 1.11 and 1.12 support spilling
to disk for the Hash Aggregate and Sort operators. Releases of Drill prior to 1.11 only support
spilling to disk for the Sort operator.  </p>
 
-<p>Drill writes data to a temporary work area on disk. The default location of the
temporary work area is /tmp/drill/spill on the local file system. The /tmp/drill/spill directory
should suffice for small workloads or examples, however it is highly recommended that you
redirect the default spill location to a location with enough disk space to support spilling
for large workloads.  </p>
+<p><strong>Spill Locations</strong> </p>
 
-<div class="admonition note">
-  <p class="first admonition-title">Note</p>
-  <p class="last">Spilled data may require more space than the table referenced in
the query that is spilling the data. For example, if a table is 100 GB per node, the spill
directory should have the capacity to hold more than 100 GB.  </p>
-</div>
- 
+<p>The Sort, Hash Aggregate, and Hash Join operators write data to a temporary work
area on disk when they cannot process all of the data in memory. The default location of the
temporary work area is /tmp/drill/spill on the local file system. </p>
+
+<p>The /tmp/drill/spill directory should suffice for small workloads or examples, however
it is highly recommended that you redirect the default spill location to a location with enough
disk space to support spilling for large workloads.</p>
 
-<p>When you configure the spill location, you can specify a single directory, or a
list of directories into which the sort and hash aggregate operators both spill. Alternatively,
you can set specific spill directories for each type of operator, however this is not recommended
as these options will be deprecated in future releases of Drill. For more information, see
the Spill to Disk Configuration Options section below.  </p>
+<p><strong>Note:</strong> Spilled data may require more space than the
table referenced in the query that is spilling the data. For example, if a table is 100 GB
per node, the spill directory should have the capacity to hold more than 100 GB.</p>
 
-<h3 id="spill-to-disk-configuration-options">Spill to Disk Configuration Options</h3>
+<p>When you configure the spill location, you can specify a single directory or a list
of directories into which the Sort, Hash Aggregate, and Hash Join operators spill data. For
more information, see the Spill to Disk Configuration Options section below.  </p>
 
-<p>The options related to spilling reside in the drill-override.conf file on each Drill
node. An administrator or someone familiar with storage and disks should manage these settings.</p>
+<p><strong>Spill to Disk Configuration Options</strong>  </p>
 
-<div class="admonition note">
-  <p class="first admonition-title">Note</p>
-  <p class="last">You can see examples of these configuration options in the drill-override-example.conf
file located in the <drill_installation>/conf directory.  </p>
-</div> 
+<p>The drill-override.conf file, located in the /conf directory, contains options that
set the spill locations for the Hash and Sort operators. An administrator can change the file
system and directories into which the operators spill data. Refer to the drill-override-example.conf
file for examples. </p>
 
-<p>The following list describes the configuration options for spilling data to disk:
 </p>
+<p>The following list describes the spill to disk configuration options:  </p>
 
 <ul>
-<li><p><strong>drill.exe.spill.fs</strong><br>
-Introduced in Drill 1.11. The default file system on the local machine into which the Sort
and Hash Aggregate operators spill data. This is the recommended option to use for spilling.
You can configure this option so that data spills into a distributed file system, such as
hdfs. For example, &quot;hdfs:///&quot;. The default setting is &quot;file:///&quot;.
 </p></li>
-<li><p><strong>drill.exec.spill.directories</strong><br>
-Introduced in Drill 1.11. The list of directories into which the Sort and Hash Aggregate
operators spill data. The list must be an array with directories separated by a comma, for
example [&quot;/fs1/drill/spill&quot; , &quot;/fs2/drill/spill&quot; , &quot;/fs3/drill/spill&quot;].
This is the recommended option for spilling to multiple directories. The default setting is
[&quot;/tmp/drill/spill&quot;].  </p></li>
-<li><p><strong>drill.exec.sort.external.spill.fs</strong><br>
-Overrides the default location into which the Sort operator spills data. Instead of spilling
into the location set by the <code>drill.exec.spill.fs</code> option, the Sort
operators spill into the location specified by this option.<br>
-<strong>Note:</strong> As of Drill 1.11, this option is supported for backward
compatibility, however in future releases, this option will be deprecated. It is highly recommended
that you use the <code>drill.exec.spill.fs</code> option to set the spill location
instead. The default setting is &quot;file:///&quot;.  </p></li>
-<li><p><strong>drill.exec.sort.external.spill.directories</strong><br>
-Overrides the location into which the Sort operator spills data. Instead of spilling into
the location set by the <code>drill.exec.spill.directories</code> option, the
Sort operators spill into the directories specified by this option. The list must be an array
with directories separated by a comma, for example [&quot;/fs1/drill/spill&quot; ,
&quot;/fs2/drill/spill&quot; , &quot;/fs3/drill/spill&quot;].<br>
-<strong>Note:</strong> As of Drill 1.11, this option is supported for backward
compatibility, however in future releases, this option will be deprecated. It is highly recommended
that you use the <code>drill.exec.spill.directories</code> option to set the spill
location instead. The default setting is [&quot;/tmp/drill/spill&quot;].  </p></li>
-<li><p><strong>drill.exec.hashagg.spill.fs</strong><br>
-Overrides the location into which the Hash Aggregate operator spills data. Instead of spilling
into the location set by the <code>drill.exec.spill.fs</code> option, the Hash
Aggregate operator spills into the location specified by this option. Setting this option
to 1 disables spilling for the Hash Aggregate operator.<br>
-<strong>Note:</strong> As of Drill 1.11, this option is supported for backward
compatibility, however in future releases, this option will be deprecated. It is highly recommended
that you use the <code>drill.exec.spill.fs</code> option to set the spill location
instead. The default setting is &quot;file:///&quot;.  </p></li>
-<li><p><strong>drill.exec.hashagg.spill.directories</strong><br>
-Overrides the location into which the Hash Aggregate operator spills data. Instead of spilling
into the location set by the <code>drill.exec.spill.directories</code> option,
the Hash Aggregate operator spills to the directories specified by this option. The list must
be an array with directories separated by a comma, for example [&quot;/fs1/drill/spill&quot;
, &quot;/fs2/drill/spill&quot; , &quot;/fs3/drill/spill&quot;].<br>
-<strong>Note:</strong> As of Drill 1.11, this option is supported for backward
compatibility, however in future releases, this option will be deprecated. It is highly recommended
that you use the <code>drill.exec.spill.directories option</code> to set the spill
location instead.  </p></li>
+<li><strong>drill.exec.spill.fs</strong><br>
+Introduced in Drill 1.11. The default file system on the local machine into which the Sort,
Hash Aggregate, and Hash Join operators spill data. You can configure this option so that
data spills into a distributed file system, such as hdfs. For example, &quot;hdfs:///&quot;.
The default setting is &quot;file:///&quot;.</li>
+<li><strong>drill.exec.spill.directories</strong><br>
+Introduced in Drill 1.11. The list of directories into which the Sort, Hash Aggregate, and
Hash Join operators spill data. The list must be an array with directories separated by a
comma, for example [&quot;/fs1/drill/spill&quot; , &quot;/fs2/drill/spill&quot;
, &quot;/fs3/drill/spill&quot;]. The default setting is [&quot;/tmp/drill/spill&quot;].<br></li>
 </ul>
 
-<h2 id="hash-based-operator-configuration-settings">Hash-Based Operator Configuration
Settings</h2>
+<p><strong>Note:</strong> The following options were available prior to
Drill 1.11, but have since been deprecated and replaced with the options described above:
 </p>
+
+<ul>
+<li>drill.exec.sort.external.spill.fs (Replaced by drill.exec.spill.fs)</li>
+<li>drill.exec.sort.external.spill.directories (Replaced by drill.exec.spill.directories)</li>
+<li>drill.exec.hashagg.spill.fs (Replaced by drill.exec.spill.fs)<br></li>
+</ul>
+
+<h2 id="memory-allocation">Memory Allocation</h2>
+
+<p>Drill evenly splits the available memory among all instances of the Sort, Hash Aggregate,
and Hash Join operators. When a query is parallelized, the number of operators is multiplied,
which reduces the amount of memory given to each instance of the operators during a query.
 </p>
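+<p>A minimal sketch of the even split described above, assuming a simple model in which the query's memory is divided equally across every operator instance. The function name and its parameters are hypothetical and are not part of Drill.</p>

```python
def memory_per_operator_instance(query_memory_bytes, buffering_operators, parallelization):
    """Estimate the memory each operator instance receives.

    Assumption: the query's memory is split evenly across all instances, and
    parallelization multiplies the number of instances of each operator.
    """
    instances = buffering_operators * parallelization
    if instances <= 0:
        raise ValueError("at least one operator instance is required")
    return query_memory_bytes // instances


two_gib = 2 * 1024**3
# One Sort and one Hash Join, no parallelization.
print(memory_per_operator_instance(two_gib, buffering_operators=2, parallelization=1))
# The same plan parallelized four ways leaves each instance a quarter as much.
print(memory_per_operator_instance(two_gib, buffering_operators=2, parallelization=4))
```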
+
+<p><strong>Memory Allocation Configuration Options</strong>  </p>
+
+<p>The <code>planner.memory.max_query_memory_per_node</code> and <code>planner.memory.percent_per_query</code>
options set the amount of memory that Drill can allocate to a query on a node. Both options
are enabled by default. Of these two options, Drill picks the setting that provides the most
memory.  </p>
+
+<ul>
+<li><strong>planner.memory.max_query_memory_per_node</strong><br>
+The <code>planner.memory.max_query_memory_per_node</code> option, set at 2 GB
by default, is the minimum amount of memory available to Drill per query on a node. The default
of 2 GB typically allows between two and three concurrent queries to run when the JVM is configured
to use 8 GB of direct memory (default). When the memory requirement for Drill increases, the
default of 2 GB is constraining. You must increase the amount of memory for queries to complete,
unless the setting for the planner.memory.percent_per_query option allows Drill to use
more memory.</li>
+<li><strong>planner.memory.percent_per_query</strong><br>
+Alternatively, the <code>planner.memory.percent_per_query</code> option sets
the memory as a percentage of the total direct memory. For example, if the allocation is set
to 10%, and the total direct memory is 128 GB, each query gets approximately 13 GB.<br></li>
+</ul>
+
+<p>The percentage is calculated using the following formula:  </p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">
  (1 - non-managed allowance)/concurrency
+</code></pre></div>
+<p>The non-managed allowance is an assumed amount of system memory that non-managed
operators will use. Non-managed operators do not spill to disk. The default non-managed allowance
assumes 50% of the total system memory. The concurrency is the number of concurrent queries
that may run; the default assumption is 10.</p>
+
+<p>Based on the default assumptions, the default value of 5% is calculated as follows:
 </p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">
  (1 - .50)/10 = 0.05  
+</code></pre></div>
+<p>This value is only used when throttling is disabled. Setting the value to 0 disables
the option. You can increase or decrease the value; however, you should set the percentage
well below the JVM direct memory to account for the cases where Drill does not manage memory,
such as for the less memory-intensive operators.  </p>
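+<p>The arithmetic above can be sketched as follows. This is an illustration only: the function names are hypothetical, and it assumes that Drill simply takes the larger of the two settings, as described earlier in this section.</p>

```python
def default_percent_per_query(non_managed_allowance=0.50, concurrency=10):
    """(1 - non-managed allowance) / concurrency, using the documented defaults."""
    return (1 - non_managed_allowance) / concurrency


def query_memory_per_node(direct_memory_bytes,
                          max_query_memory_per_node=2 * 1024**3,
                          percent_per_query=0.05):
    """Pick whichever option provides the most memory for a query on a node.

    Assumption: the percentage applies to the total direct memory, and a
    percentage of 0 leaves only max_query_memory_per_node in effect.
    """
    percent_based = int(direct_memory_bytes * percent_per_query)
    return max(max_query_memory_per_node, percent_based)


gib = 1024**3
print(default_percent_per_query())
# With 8 GB of direct memory, 5% is less than 2 GB, so the 2 GB setting applies.
print(query_memory_per_node(8 * gib) / gib)
# With 128 GB of direct memory and 10%, the percentage-based value applies.
print(query_memory_per_node(128 * gib, percent_per_query=0.10) / gib)
```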
+
+<p><strong>Increasing the Available Memory</strong>  </p>
+
+<p>You can increase the amount of memory available to Drill using the ALTER SYSTEM|SESSION
SET commands with the <code>planner.memory.max_query_memory_per_node</code> or
<code>planner.memory.percent_per_query</code> options, as shown:  </p>
+<div class="highlight"><pre><code class="language-text" data-lang="text">
  ALTER SYSTEM|SESSION SET `planner.memory.max_query_memory_per_node` = &lt;new_value&gt;
+   //The default value is 2147483648 bytes (2 GB). 
+
+   ALTER SYSTEM|SESSION SET `planner.memory.percent_per_query` = &lt;new_value&gt;
+   //The default value is 0.05.  
+</code></pre></div>
+<h2 id="disabling-the-hash-operators">Disabling the Hash Operators</h2>
+
+<p>You can disable the Hash Aggregate and Hash Join operators. When you disable these
operators, Drill creates alternative query plans that use the Sort operator and the Streaming
Aggregate or the Merge Join operator. </p>
 
-<p>Use the ALTER SYSTEM|SESSION SET commands with the options below to disable the
Hash Aggregate and Hash Join operators, modify the hash table size, or disable memory estimation.
Typically, you set the options at the session level unless you want the setting to persist
across all sessions.</p>
+<p>Use the ALTER SYSTEM|SESSION SET commands with the following options to disable
the Hash Aggregate and Hash Join operators. Typically, you set the options at the session
level unless you want the setting to persist across all sessions. </p>
 
-<p>The following options control the hash-based operators:</p>
+<p>The following options control the hash-based operators:  </p>
 
 <ul>
-<li><p><strong>planner.enable_hashagg</strong><br>
-Enables or disables hash aggregation; otherwise, Drill does a sort-based aggregation. This
option is enabled by default. The default, and recommended, setting is true. 
-The Hash Aggregate operator uses an uncontrolled amount of memory, up to 10 GB, after which
the operator runs out of memory. As of Drill 1.11, the Hash Aggregate operator can write to
disk. </p></li>
-<li><p><strong>planner.enable_hashjoin</strong><br>
-Enables or disables the memory hungry hash join. Drill assumes that a query will have adequate
memory to complete and tries to use the fastest operations possible to complete the planned
inner, left, right, or full outer joins using a hash table. The Hash Join operator uses an
uncontrolled amount of memory, up to 10 GB, after which the operator runs out of memory. Currently,
this operator does not write to disk. Disabling hash join allows Drill to manage arbitrarily
large data in a small memory footprint. This option is enabled by default. The default setting
is true.</p></li>
-<li><p><strong>exec.min_hash_table_size</strong><br>
-Starting size for hash tables. Increase this setting based on the memory available to improve
performance. The default setting for this option is 65536. The setting can range from 0 to
1073741824.</p></li>
-<li><p><strong>exec.max_hash_table_size</strong><br>
-Ending size for hash tables. The default setting for this option is 1073741824. The setting
can range from 0 to 1073741824.</p></li>
+<li><strong>planner.enable_hashagg</strong><br>
+Enables or disables hash aggregation; otherwise, Drill does a sort-based aggregation. This
option is enabled by default. The default, and recommended, setting is true. Prior to Drill
1.11, the Hash Aggregate operator used an uncontrolled amount of memory (up to 10 GB), after
which the operator ran out of memory. As of Drill 1.11, the Hash Aggregate operator can write
to disk.</li>
+<li><strong>planner.enable_hashjoin</strong><br>
+Enables or disables hash joins. This option is enabled by default. Drill assumes that a query
will have adequate memory to complete and tries to use the fastest operations possible. Prior to Drill
1.11, the Hash Join operator used an uncontrolled amount of memory (up to 10 GB), after which
the operator ran out of memory. As of Drill 1.13, this operator can write to disk.</li>
 </ul>
 
     

http://git-wip-us.apache.org/repos/asf/drill-site/blob/c5a1214d/docs/start-up-options/index.html
----------------------------------------------------------------------
diff --git a/docs/start-up-options/index.html b/docs/start-up-options/index.html
index 4115a7a..2f96249 100644
--- a/docs/start-up-options/index.html
+++ b/docs/start-up-options/index.html
@@ -1153,7 +1153,7 @@
 
     </div>
 
-     Aug 17, 2017
+     Mar 14, 2018
 
     <link href="/css/docpage.css" rel="stylesheet" type="text/css">
 
@@ -1206,9 +1206,9 @@ Defines the persistent storage (PStore) provider. The <a href="/docs/persistent-
 <li><p><strong>drill.exec.buffer.size</strong><br>
 Defines the amount of memory available, in terms of record batches, to hold data on the downstream
side of an operation. Drill pushes data downstream as quickly as possible to make data immediately
available. This requires Drill to use memory to hold the data pending operations. When data
on a downstream operation is required, that data is immediately available so Drill does not
have to go over the network to process it. Providing more memory to this option increases
the speed at which Drill completes a query.  </p></li>
 <li><p><strong>drill.exec.spill.fs</strong><br>
-Introduced in Drill 1.11. The default file system on the local machine into which the Sort
and Hash Aggregate operators spill data. This is the recommended option to use for spilling.
You can configure this option so that data spills into a distributed file system, such as
hdfs. For example, &quot;hdfs:///&quot;. The default setting is &quot;file:///&quot;.
See <a href="/docs/sort-based-and-hash-based-memory-constrained-operators/">Sort-Based
and Hash-Based Memory Constrained Operators</a> for more information.   </p></li>
+Introduced in Drill 1.11. The default file system on the local machine into which the Sort,
Hash Aggregate, and Hash Join operators spill data. This is the recommended option to use
for spilling. You can configure this option so that data spills into a distributed file system,
such as hdfs. For example, &quot;hdfs:///&quot;. The default setting is &quot;file:///&quot;.
See <a href="/docs/sort-based-and-hash-based-memory-constrained-operators/">Sort-Based
and Hash-Based Memory Constrained Operators</a> for more information.   </p></li>
 <li><p><strong>drill.exec.spill.directories</strong><br>
-Introduced in Drill 1.11. The list of directories into which the Sort and Hash Aggregate
operators spill data. The list must be an array with directories separated by a comma, for
example [&quot;/fs1/drill/spill&quot; , &quot;/fs2/drill/spill&quot; , &quot;/fs3/drill/spill&quot;].
This is the recommended option for spilling to multiple directories. The default setting is
[&quot;/tmp/drill/spill&quot;]. See <a href="/docs/sort-based-and-hash-based-memory-constrained-operators/">Sort-Based
and Hash-Based Memory Constrained Operators</a> for more information.  </p></li>
+Introduced in Drill 1.11. The list of directories into which the Sort, Hash Aggregate, and
Hash Join operators spill data. The list must be an array with directories separated by a
comma, for example [&quot;/fs1/drill/spill&quot; , &quot;/fs2/drill/spill&quot;
, &quot;/fs3/drill/spill&quot;]. This is the recommended option for spilling to multiple
directories. The default setting is [&quot;/tmp/drill/spill&quot;]. See <a href="/docs/sort-based-and-hash-based-memory-constrained-operators/">Sort-Based
and Hash-Based Memory Constrained Operators</a> for more information.  </p></li>
 <li><p><strong>drill.exec.zk.connect</strong><br>
 Provides Drill with the ZooKeeper quorum to use to connect to data sources. Change this setting
to point to the ZooKeeper quorum that you want Drill to use. You must configure this option
on each Drillbit node.  </p></li>
 <li><p><strong>drill.exec.profiles.store.inmemory</strong><br>

http://git-wip-us.apache.org/repos/asf/drill-site/blob/c5a1214d/feed.xml
----------------------------------------------------------------------
diff --git a/feed.xml b/feed.xml
index eee484b..611b6b4 100644
--- a/feed.xml
+++ b/feed.xml
@@ -6,8 +6,8 @@
 </description>
     <link>/</link>
     <atom:link href="/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Wed, 21 Feb 2018 13:43:37 -0800</pubDate>
-    <lastBuildDate>Wed, 21 Feb 2018 13:43:37 -0800</lastBuildDate>
+    <pubDate>Tue, 13 Mar 2018 18:10:26 -0700</pubDate>
+    <lastBuildDate>Tue, 13 Mar 2018 18:10:26 -0700</lastBuildDate>
     <generator>Jekyll v2.5.2</generator>
     
       <item>

http://git-wip-us.apache.org/repos/asf/drill-site/blob/c5a1214d/team/index.html
----------------------------------------------------------------------
diff --git a/team/index.html b/team/index.html
index d51aa97..19cdaeb 100644
--- a/team/index.html
+++ b/team/index.html
@@ -257,6 +257,10 @@
 <td>Kamesh Bhallamudi</td>
 <td>kameshb</td>
 </tr>
+<tr>
+<td>Kunal Khatua</td>
+<td>kunal</td>
+</tr>
 </tbody></table>
 </div>
 

