drill-commits mailing list archives

From bridg...@apache.org
Subject drill git commit: Additional doc updates related to hash agg spill to disk
Date Thu, 17 Aug 2017 21:22:10 GMT
Repository: drill
Updated Branches:
  refs/heads/gh-pages 9e3290744 -> a8afaf19d


Additional doc updates related to hash agg spill to disk


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/a8afaf19
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/a8afaf19
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/a8afaf19

Branch: refs/heads/gh-pages
Commit: a8afaf19d77a22684a65903f2c5f034246a1579a
Parents: 9e32907
Author: Bridget Bevens <bbevens@maprtech.com>
Authored: Thu Aug 17 14:20:10 2017 -0700
Committer: Bridget Bevens <bbevens@maprtech.com>
Committed: Thu Aug 17 14:20:10 2017 -0700

----------------------------------------------------------------------
 .../020-configuring-drill-memory.md             | 25 +++++----
 .../020-start-up-options.md                     | 42 +++++++++------
 ...d-hash-based-memory-constrained-operators.md | 55 ++++++++++----------
 3 files changed, 65 insertions(+), 57 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/a8afaf19/_docs/configure-drill/020-configuring-drill-memory.md
----------------------------------------------------------------------
diff --git a/_docs/configure-drill/020-configuring-drill-memory.md b/_docs/configure-drill/020-configuring-drill-memory.md
index 4dad806..5949cab 100644
--- a/_docs/configure-drill/020-configuring-drill-memory.md
+++ b/_docs/configure-drill/020-configuring-drill-memory.md
@@ -1,37 +1,36 @@
 ---
 title: "Configuring Drill Memory"
-date: 2016-11-01 21:03:43 UTC
+date: 2017-08-17 21:20:15 UTC
 parent: "Configure Drill"
 ---
 
-You can configure the amount of direct memory allocated to a Drillbit for query processing
in any Drill cluster, multitenant or not. The default memory for a Drillbit is 8G, but Drill
prefers 16G or more depending on the workload. The total amount of direct memory that a Drillbit
allocates to query operations cannot exceed the limit set.
+You can configure the amount of direct memory allocated to a Drillbit for query processing
in any Drill cluster, multitenant or not. The default memory for a drillbit is 8G, but Drill
prefers 16G or more depending on the workload. The total amount of direct memory that a drillbit
allocates to query operations cannot exceed the limit set.
 
-Drill uses Java direct memory and performs well when executing
-operations in memory instead of storing the operations on disk. Drill does not
-write to disk unless absolutely necessary, unlike MapReduce where everything
-is written to disk during each phase of a job.
+Drill uses Java direct memory and performs well when executing operations in memory instead
of storing the operations on disk. Drill does not write to disk unless absolutely necessary,
unlike MapReduce where everything is written to disk during each phase of a job.
 
 The JVM’s heap memory does not limit the amount of direct memory available in
-a Drillbit. The on-heap memory for Drill is typically set at 4-8G (default is 4), which should
+a drillbit. The on-heap memory for Drill is typically set at 4-8G (default is 4), which should
 suffice because Drill avoids having data sit in heap memory.
 
-As of Drill 1.5, Drill uses a new allocator that improves an operator’s use of direct memory
and tracks the memory use more accurately. Due to this change, the sort operator (in queries
that ran successfully in previous releases) may not have enough memory, resulting in a failed
query and out of memory error instead of spilling to disk.
+As of Drill 1.5, Drill uses a new allocator that improves an operator’s use of direct memory
and tracks the memory use more accurately. Due to this change, the sort operator (in queries
that ran successfully in previous releases) may not have enough memory, resulting in a failed
query and out of memory error instead of spilling to disk.     
 
 
-The [`planner.memory.max_query_memory_per_node`]({{site.baseurl}}/docs/configuration-options-introduction/#system-options)
system option value sets the maximum amount of direct memory allocated to the sort operator
in each query on a node. If a query plan contains multiple sort operators, they all share
this memory. If you encounter memory issues when running queries with sort operators, increase
the value of this option. If you continue to encounter memory issues after increasing this
value, you can also reduce the value of the [`planner.width.max_per_node`]({{site.baseurl}}/docs/configuration-options-introduction/)
option to reduce the level of parallelism per node. However, this may increase the amount
of time required for a query to complete.  
+## Drillbit Memory  
+The value set for the [`planner.memory.max_query_memory_per_node`]({{site.baseurl}}/docs/configuration-options-introduction/#system-options)
system option sets the maximum amount of direct memory allocated to the Sort and Hash Aggregate
operators in each query on a node. If a query plan contains multiple Sort and/or Hash Aggregate
operators, they all share this memory. If you encounter memory issues when running queries
with Sort and/or Hash Aggregate operators, increase the value of this option. See [Sort-Based
and Hash-Based Memory Constrained Operators](https://drill.apache.org/docs/sort-based-and-hash-based-memory-constrained-operators/)
for more information.  
 
+If you continue to encounter memory issues after increasing this value, you can also reduce
the value of the [`planner.width.max_per_node`]({{site.baseurl}}/docs/configuration-options-introduction/)
option to reduce the level of parallelism per node. However, this may increase the amount
of time required for a query to complete. 
 
-## Modifying Drillbit Memory
+###Modifying Drillbit Memory
 
-You can modify memory for each Drillbit node in your cluster. To modify the memory for a
Drillbit, set the DRILL_MAX_DIRECT_MEMORY variable in the Drillbit startup script, `drill-env.sh`,
located in `<drill_installation_directory>/conf`, as follows:
+You can modify memory for each drillbit node in your cluster. To modify the memory for a
drillbit, set the DRILL_MAX_DIRECT_MEMORY variable in the drillbit startup script, `drill-env.sh`,
located in `<drill_installation_directory>/conf`, as follows:
 
     export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"<value>"}
 
 {% include startnote.html %}If DRILL_MAX_DIRECT_MEMORY is not set, the limit depends on the
amount of available system memory.{% include endnote.html %}
 
-After you edit `<drill_installation_directory>/conf/drill-env.sh`, [restart the Drillbit]({{
site.baseurl }}/docs/starting-drill-in-distributed-mode) on the node.
+After you edit `<drill_installation_directory>/conf/drill-env.sh`, [restart the drillbit]({{
site.baseurl }}/docs/starting-drill-in-distributed-mode) on the node.
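
To illustrate how the `${DRILL_MAX_DIRECT_MEMORY:-"<value>"}` form in `drill-env.sh` behaves, the following shell sketch runs the same assignment twice in a scratch shell. The `16G` and `24G` values are placeholders chosen for this example, not recommendations from the Drill documentation. The sketch only shows that a value already present in the environment takes precedence over the default written in the script.

```shell
# Case 1: nothing is set in the environment, so the default in the script applies.
unset DRILL_MAX_DIRECT_MEMORY
export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"16G"}   # 16G is a placeholder
from_default="$DRILL_MAX_DIRECT_MEMORY"

# Case 2: the variable is set before drill-env.sh runs, so that value is kept.
DRILL_MAX_DIRECT_MEMORY="24G"                                      # 24G is a placeholder
export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"16G"}
from_environment="$DRILL_MAX_DIRECT_MEMORY"

echo "default applied:  $from_default"
echo "environment kept: $from_environment"
```

In practice this means the limit can be changed per node either by editing the default in `drill-env.sh` or by exporting the variable before the drillbit starts.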
 
-## About the Drillbit startup script
+## About the Drillbit Startup Script
 
 The `drill-env.sh` file contains the following options:
 

http://git-wip-us.apache.org/repos/asf/drill/blob/a8afaf19/_docs/configure-drill/configuration-options/020-start-up-options.md
----------------------------------------------------------------------
diff --git a/_docs/configure-drill/configuration-options/020-start-up-options.md b/_docs/configure-drill/configuration-options/020-start-up-options.md
index 200bb9e..36400d8 100644
--- a/_docs/configure-drill/configuration-options/020-start-up-options.md
+++ b/_docs/configure-drill/configuration-options/020-start-up-options.md
@@ -1,15 +1,15 @@
 ---
 title: "Start-Up Options"
-date: 2017-08-08 02:22:56 UTC
+date: 2017-08-17 21:20:19 UTC
 parent: "Configuration Options"
 ---
-Drill’s start-up options reside in a [HOCON](https://github.com/typesafehub/config/blob/master/HOCON.md)
configuration file format, which is
-a hybrid between a properties file and a JSON file. Drill start-up options
-consist of a group of files with a nested relationship. At the bottom of the file hierarchy
are the default files that Drill provides, starting with `drill-default.conf`. The `drill-default.conf`
file is overridden by one or more `drill-module.conf` files that Drill’s internal modules
provide. The `drill-module.conf` files are overridden by the `drill-override.conf` file that
you define.    
+The start-up options for Drill reside in a [HOCON](https://github.com/typesafehub/config/blob/master/HOCON.md)
configuration file format, which is a hybrid between a properties file and a JSON file. Drill
start-up options consist of a group of files with a nested relationship. At the bottom of
the file hierarchy are the default files that Drill provides, starting with `drill-default.conf`.

 
-You can provide overrides on each Drillbit using system properties of the form `-Dname=value`
passed on the command line: 
+The `drill-default.conf` file is overridden by one or more `drill-module.conf` files that
Drill’s internal modules provide. The `drill-module.conf` files are overridden by the `drill-override.conf`
file that you define.    
+
+You can provide overrides on each drillbit using system properties of the form `-Dname=value`
passed on the command line: 
  
-       ./drillbit.sh start -Dname=value
+    ./drillbit.sh start -Dname=value
 
 
 You can see the following group of files throughout the source repository in
@@ -23,41 +23,49 @@ Drill:
 	exec/java-exec/src/main/resources/drill-module.conf
 	distribution/src/resources/drill-override.conf
 
-These files are listed inside the associated JAR files in the Drill
-distribution tarball.
+These files are listed inside the associated JAR files in the Drill distribution tarball.
 
 Each Drill module has a set of options that Drill incorporates. Drill’s
 modular design enables you to create new storage plugins, set new operators,
 or create UDFs. You can also include additional configuration options that you
-can override as necessary.
+can override as needed.
 
 When you add a JAR file to Drill, you must include a `drill-module.conf` file
 in the root directory of the JAR file that you add. The `drill-module.conf`
 file tells Drill to scan that JAR file or associated object and include it.
 
-## Viewing Startup Options
+## Viewing Start-Up Options
 
-You can run the following query to see a list of Drill’s startup options:
+Run the following query to see a list of the available start-up options:
 
     SELECT * FROM sys.boot;
 
 ## Configuring Start-Up Options
 
-You can configure start-up options for each Drillbit in `<drill_home>/conf/drill-override.conf`
.
+You can configure start-up options for each drillbit in `<drill_home>/conf/drill-override.conf`
.
 
 The summary of start-up options, also known as boot options, lists default values. The following
descriptions provide more detail on key options that are frequently reconfigured:
 
 * **drill.exec.http.ssl_enabled**  
   Available in Drill 1.2. Enables or disables [HTTPS support]({{site.baseurl}}/docs/configuring-web-console-and-rest-api-security/#https-support).
Settings are TRUE and FALSE, respectively. The default is FALSE.  
+  
 * **drill.exec.sys.store.provider.class**  
-  Defines the persistent storage (PStore) provider. The [PStore]({{ site.baseurl }}/docs/persistent-configuration-storage)
holds configuration and profile data.  
+  Defines the persistent storage (PStore) provider. The [PStore]({{site.baseurl}}/docs/persistent-configuration-storage)
holds configuration and profile data.  
+ 
 * **drill.exec.buffer.size**  
   Defines the amount of memory available, in terms of record batches, to hold data on the
downstream side of an operation. Drill pushes data downstream as quickly as possible to make
data immediately available. This requires Drill to use memory to hold the data pending operations.
When data on a downstream operation is required, that data is immediately available so Drill
does not have to go over the network to process it. Providing more memory to this option increases
the speed at which Drill completes a query.  
-* **drill.exec.sort.external.spill.directories**  
-  Tells Drill which directory to use when spooling. Drill uses a spool and sort operation
for beyond memory operations. The sorting operation is designed to spool to a Hadoop file
system. The default Hadoop file system is a local file system in the `/tmp` directory. Spooling
performance (both writing and reading back from it) is constrained by the file system.  
+  
+* **drill.exec.spill.fs**  
+Introduced in Drill 1.11. The default file system on the local machine into which the Sort
and Hash Aggregate operators spill data. This is the recommended option to use for spilling.
You can configure this option so that data spills into a distributed file system, such as
hdfs. For example, "hdfs:///". The default setting is "file:///". See [Sort-Based and Hash-Based
Memory Constrained Operators]({{site.baseurl}}/docs/sort-based-and-hash-based-memory-constrained-operators/)
for more information.   
+  
+* **drill.exec.spill.directories**  
+Introduced in Drill 1.11. The list of directories into which the Sort and Hash Aggregate
operators spill data. The list must be an array with directories separated by a comma, for
example ["/fs1/drill/spill" , "/fs2/drill/spill" , "/fs3/drill/spill"]. This is the recommended
option for spilling to multiple directories. The default setting is ["/tmp/drill/spill"].
See [Sort-Based and Hash-Based Memory Constrained Operators]({{site.baseurl}}/docs/sort-based-and-hash-based-memory-constrained-operators/)
for more information.  
+
 * **drill.exec.zk.connect**  
   Provides Drill with the ZooKeeper quorum to use to connect to data sources. Change this
setting to point to the ZooKeeper quorum that you want Drill to use. You must configure this
option on each Drillbit node.  
+
 * **drill.exec.profiles.store.inmemory**  
-  Available as of Drill 1.11. When set to TRUE, enables Drill to store query profiles in
memory instead of writing the query profiles to disk. When set to FALSE, Drill writes the
profile for each query to disk, which is either the local file system or a distributed file
system, such as HDFS. For sub-second queries, writing the query profile to disk is expensive
due to the interactions with the file system. Enable this option if you want Drill to store
the profiles of sub-second queries in memory instead of writing them to disk. When you enable
this option, Drill stores the profiles in memory for as long as the drillbit runs. When the
drillbit restarts, the profiles no longer exist. You can set the maximum number of most recent
profiles to retain in memory through the drill.exec.profiles.store.capacity option. Settings
are TRUE and FALSE. Default is FALSE.  
+  Available as of Drill 1.11. When set to TRUE, enables Drill to store query profiles in
memory instead of writing the query profiles to disk. When set to FALSE, Drill writes the
profile for each query to disk, which is either the local file system or a distributed file
system, such as HDFS. For sub-second queries, writing the query profile to disk is expensive
due to the interactions with the file system. Enable this option if you want Drill to store
the profiles of sub-second queries in memory instead of writing them to disk. When you enable
this option, Drill stores the profiles in memory for as long as the drillbit runs. When the
drillbit restarts, the profiles no longer exist. You can set the maximum number of most recent
profiles to retain in memory through the `drill.exec.profiles.store.capacity` option. Settings
are TRUE and FALSE. Default is FALSE. See [Persistent Configuration Storage]({{site.baseurl}}/docs/persistent-configuration-storage/)
for more information.  
+ 
 * **drill.exec.profiles.store.capacity**  
-  Available as of Drill 1.11. Sets the maximum number of most recent profiles to retain in
memory when the drill.exec.profiles.store.inmemory option is enabled. Default is 1000.  
\ No newline at end of file
+  Available as of Drill 1.11. Sets the maximum number of most recent profiles to retain in
memory when the `drill.exec.profiles.store.inmemory` option is enabled. Default is 1000. 

\ No newline at end of file
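
As a rough sketch of how the two spill options described above might look in `drill-override.conf`, the following shell snippet writes a HOCON fragment to a temporary file and prints it. The nested layout is an assumption based on HOCON treating `drill.exec.spill.fs` and a nested `drill.exec: { spill: { fs: ... } }` block as the same path. The directory names are copied from the example in the option description, and the temporary file is only a scratch location. A real `drill-override.conf` also carries other settings, so a fragment like this would need to be merged into the existing file by hand.

```shell
# Scratch file for the fragment; this is not a path that Drill reads.
conf_fragment="$(mktemp)"

# Quoted heredoc delimiter so the shell leaves the HOCON text untouched.
cat > "$conf_fragment" <<'EOF'
drill.exec: {
  spill: {
    # File system that the Sort and Hash Aggregate operators spill into.
    fs: "file:///",
    # Array of spill directories, separated by commas.
    directories: ["/fs1/drill/spill", "/fs2/drill/spill", "/fs3/drill/spill"]
  }
}
EOF

cat "$conf_fragment"
```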

http://git-wip-us.apache.org/repos/asf/drill/blob/a8afaf19/_docs/performance-tuning/query-plans-and-tuning/050-sort-based-and-hash-based-memory-constrained-operators.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/query-plans-and-tuning/050-sort-based-and-hash-based-memory-constrained-operators.md
b/_docs/performance-tuning/query-plans-and-tuning/050-sort-based-and-hash-based-memory-constrained-operators.md
index 3d17ae5..1c9de18 100644
--- a/_docs/performance-tuning/query-plans-and-tuning/050-sort-based-and-hash-based-memory-constrained-operators.md
+++ b/_docs/performance-tuning/query-plans-and-tuning/050-sort-based-and-hash-based-memory-constrained-operators.md
@@ -1,30 +1,30 @@
 ---
 title: "Sort-Based and Hash-Based Memory-Constrained Operators"
-date: 2017-08-17 04:49:34 UTC
+date: 2017-08-17 21:20:22 UTC
 parent: "Query Plans and Tuning"
 --- 
 
-Drill uses hash-based and sort-based operators depending on the query characteristics. Hash
aggregation and hash join are hash-based operations. Streaming aggregation and merge join
are sort-based operations. Both hash-based and sort-based operations consume memory, however
the hash aggregate and hash join operators are the fastest and most memory intensive operators.

+Drill uses hash-based and sort-based operators depending on the query characteristics. Hash
Aggregate and Hash Join are hash-based operators. Sort, Streaming Aggregate, and Merge Join
are sort-based operators. Both hash-based and sort-based operations consume memory, however
the Hash Aggregate and Hash Join operators are the fastest and most memory intensive operators.

 
-When planning a query with sort- and hash-based operators, Drill evaluates the available
memory multiplied by a configurable reduction constant (for parallelization purposes) and
then limits the operations to the maximum of this amount of memory. Drill spills data to disk
if the sort and hash aggregate operations cannot be performed in memory. Alternatively, you
can disable large hash operations if they do not fit in memory on your system. When disabled,
Drill creates alternative plans. You can also modify the minimum hash table size, increasing
the size for very large aggregations or joins when you have large amounts of memory for Drill
to use. If you have large data sets, you can increase the hash table size to improve performance.

+When planning a query with sort- and hash-based operations, Drill evaluates the available
memory multiplied by a configurable reduction constant (for parallelization purposes) and
then limits the operations to the maximum of this amount of memory. Drill spills data to disk
if the sort and hash aggregate operations cannot be performed in memory. Alternatively, you
can disable large hash operations if they do not fit in memory on your system. When disabled,
Drill creates alternative plans. You can also modify the minimum hash table size, increasing
the size for very large aggregations or joins when you have large amounts of memory for Drill
to use. If you have large data sets, you can increase the hash table size to improve performance.

 
 ##Memory Options
-The `planner.memory.max_query_memory_per_node` option sets the maximum amount of direct memory
allocated to the sort and hash aggregate operators during each query on a node. The default
limit is 2147483648 bytes (2GB), which is quite conservative. This memory is split between
operators. If a query plan contains multiple sort and/or hash aggregate operators, the memory
is divided between them.
+The `planner.memory.max_query_memory_per_node` option sets the maximum amount of direct memory
allocated to the Sort and Hash Aggregate operators during each query on a node. The default
limit is 2147483648 bytes (2GB), which is quite conservative. This memory is split between
operators. If a query plan contains multiple Sort and/or Hash Aggregate operators, the memory
is divided between them.
 
-When a query is parallelized, the number of operators is multiplied, which reduces the amount
of memory given to each instance of the sort and hash aggregate operators during a query.
If you encounter memory issues when running queries with sort and hash aggregate operators,
calculate the memory requirements for your queries and the amount of available memory on each
node. Based on the information, increase the value for the `planner.memory.max_query_memory_per_node`
option using the ALTER SYSTEM|SESSION SET command, as shown:  
+When a query is parallelized, the number of operators is multiplied, which reduces the amount
of memory given to each instance of the Sort and Hash Aggregate operators during a query.
If you encounter memory issues when running queries with Sort and Hash Aggregate operators,
calculate the memory requirements for your queries and the amount of available memory on each
node. Based on the information, increase the value of the `planner.memory.max_query_memory_per_node`
option using the ALTER SYSTEM|SESSION SET command, as shown:  
 
-    ALTER SYSTEM|SESSION SET `planner.memory.max_query_memory_per_node` = 8147483648  
+    ALTER SYSTEM|SESSION SET `planner.memory.max_query_memory_per_node` = <new_value>
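
The effect of parallelization on the per-operator budget can be estimated with simple arithmetic. The sketch below assumes the memory is split evenly across every instance of every Sort and Hash Aggregate operator on a node. That even split is a simplification for illustration, and the operator count and width are hypothetical values. Only the 2147483648-byte default comes from the text above.

```shell
max_query_memory=2147483648   # default planner.memory.max_query_memory_per_node (2 GB)
buffering_operators=2         # hypothetical: one Sort and one Hash Aggregate in the plan
width_per_node=4              # hypothetical degree of parallelism on the node

# Assumed even split across all operator instances on the node.
per_instance=$(( max_query_memory / (buffering_operators * width_per_node) ))
per_instance_mib=$(( per_instance / 1024 / 1024 ))

echo "bytes per operator instance: $per_instance"
echo "MiB per operator instance:   $per_instance_mib"
```

Under these assumptions each operator instance gets about 256 MiB, which shows why the 2 GB default can be tight for plans with several memory-constrained operators.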
 
   
 
-The `planner.memory.enable_memory_estimation` option toggles the state of memory estimation
and re-planning of the query. When enabled, Drill conservatively estimates memory requirements
and typically excludes memory-constrained operators from the query plan, which can negatively
impact performance. The default setting is false. If you want Drill to use very conservative
memory estimates, use the ALTER SYSTEM|SESSION SET command to change the setting, as shown:
 
+The `planner.memory.enable_memory_estimation` option toggles the state of memory estimation
and re-planning of a query. When enabled, Drill conservatively estimates memory requirements
and typically excludes memory-constrained operators from the query plan, which can negatively
impact performance. The default setting is false. If you want Drill to use very conservative
memory estimates, use the ALTER SYSTEM|SESSION SET command to change the setting, as shown:
 
 
     ALTER SYSTEM|SESSION SET `planner.memory.enable_memory_estimation` = true  
 
  
 ##Spill to Disk  
-The "Spill to Disk" feature prevents queries that use memory-intensive sort and hash aggregate
operations from failing with out-of-memory errors. Drill automatically writes excess data
to a temporary directory on disk when queries with sort or hash aggregate operations exceed
the set memory limit on a Drill node. When the operators finish processing the in-memory data,
Drill reads the spilled data back from disk, and the operators finish processing the data.
When the operations complete, Drill removes the data from disk.  
+Spilling data to disk prevents queries that use memory-intensive Sort and Hash Aggregate
operations from failing with out-of-memory errors. Drill automatically writes excess data
to a temporary directory on disk when queries with Sort or Hash Aggregate operations exceed
the set memory limit on a Drill node. When the operators finish processing the in-memory data,
Drill reads the spilled data back from disk, and the operators finish processing the data.
When the operations complete, Drill removes the data from disk.  
 
-Spilling to disk enables queries to run uninterrupted while Drill performs the spill operations
in the background. However, there can be performance impact due to the time required to spill
data and then read the data back from disk.  
+Spilling data to disk enables queries to run uninterrupted while Drill performs the spill
operations in the background. However, there can be performance impact due to the time required
to spill data and then read the data back from disk.  
 
 {% include startnote.html %}Drill 1.11 and later supports spilling to disk for the Hash Aggregate
operator in addition to the Sort operator. Previous releases of Drill only supported spilling
to disk for the Sort operator.{% include endnote.html %}  
 
@@ -36,51 +36,52 @@ Drill writes data to a temporary work area on disk. The default location
of the
 When you configure the spill location, you can specify a single directory, or a list of directories
into which the sort and hash aggregate operators both spill. Alternatively, you can set specific
spill directories for each type of operator, however this is not recommended as these options
will be deprecated in future releases of Drill. For more information, see the Spill to Disk
Configuration Options section below.  
 
 ###Spill to Disk Configuration Options  
-The spill to disk options reside in the drill-override.conf file on each Drill node. An administrator
or someone familiar with storage and disks should manage these settings.
+The options related to spilling reside in the drill-override.conf file on each Drill node.
An administrator or someone familiar with storage and disks should manage these settings.
 
 {% include startnote.html %}You can see examples of these configuration options in the drill-override-example.conf
file located in the <drill_installation>/conf directory.{% include endnote.html %} 
 
 The following list describes the configuration options for spilling data to disk:  
 
 * **drill.exec.spill.fs**  
-Introduced in Drill 1.11. The default file system on the local machine into which the sort
and hash aggregate operators spill data. This is the recommended option to use for spilling.
You can configure this option so that data spills into a distributed file system, such as
hdfs. For example, "hdfs:///". The default setting is "file:///".  
+Introduced in Drill 1.11. The default file system on the local machine into which the Sort
and Hash Aggregate operators spill data. This is the recommended option to use for spilling.
You can configure this option so that data spills into a distributed file system, such as
hdfs. For example, "hdfs:///". The default setting is "file:///".  
   
 * **drill.exec.spill.directories**  
-Introduced in Drill 1.11. The list of directories into which the sort and hash aggregate
operators spill data. The list must be an array with directories separated by a comma, for
example ["/fs1/drill/spill" , "/fs2/drill/spill" , "/fs3/drill/spill"]. This is the recommended
option for spilling to multiple directories. The default setting is ["/tmp/drill/spill"].
 
+Introduced in Drill 1.11. The list of directories into which the Sort and Hash Aggregate
operators spill data. The list must be an array with directories separated by a comma, for
example ["/fs1/drill/spill" , "/fs2/drill/spill" , "/fs3/drill/spill"]. This is the recommended
option for spilling to multiple directories. The default setting is ["/tmp/drill/spill"].
 
   
 * **drill.exec.sort.external.spill.fs**    
-Overrides the default location into which the sort operator spills data. Instead of spilling
into the location set by the drill.exec.spill.fs option, the sort operators spill into the
location specified by this option.  
-**Note:** As of Drill 1.11, this option is supported for backward compatibility, however
in future releases, this option will be deprecated. It is highly recommended that you   use
the drill.exec.spill.fs option to set the spill location instead. The default setting is "file:///".
+Overrides the default location into which the Sort operator spills data. Instead of spilling
into the location set by the `drill.exec.spill.fs` option, the Sort operators spill into the
location specified by this option.  
+**Note:** As of Drill 1.11, this option is supported for backward compatibility, however
in future releases, this option will be deprecated. It is highly recommended that you use
the `drill.exec.spill.fs` option to set the spill location instead. The default setting is
"file:///".  
+
 * **drill.exec.sort.external.spill.directories**   
-Overrides the location into which the sort operator spills data. Instead of spilling into
the location set by the drill.exec.spill.directories option, the sort operators spill into
the directories specified by this option. The list must be an array with directories separated
by a comma, for example ["/fs1/drill/spill" , "/fs2/drill/spill" , "/fs3/drill/spill"].  
-**Note:** As of Drill 1.11, this option is supported for backward compatibility, however
in future releases, this option will be deprecated. It is highly recommended that you use
the drill.exec.spill.directories option to set the spill location instead. The default setting
is ["/tmp/drill/spill"].  
+Overrides the location into which the Sort operator spills data. Instead of spilling into
the location set by the `drill.exec.spill.directories` option, the Sort operators spill into
the directories specified by this option. The list must be an array with directories separated
by a comma, for example ["/fs1/drill/spill" , "/fs2/drill/spill" , "/fs3/drill/spill"].  
+**Note:** As of Drill 1.11, this option is supported for backward compatibility, however
in future releases, this option will be deprecated. It is highly recommended that you use
the `drill.exec.spill.directories` option to set the spill location instead. The default setting
is ["/tmp/drill/spill"].  
  
 * **drill.exec.hashagg.spill.fs**  
-Overrides the location into which the hash aggregate operator spills data. Instead of spilling
into the location set by the drill.exec.spill.fs option, the hash aggregate operator spills
into the location specified by this option. Setting this option to 1 disables spilling for
the hash aggregate operator.  
-**Note:** As of Drill 1.11, this option is supported for backward compatibility, however
in future releases, this option will be deprecated. It is highly recommended that you use
the drill.exec.spill.fs option to set the spill location instead. The default setting is "file:///".
 
+Overrides the location into which the Hash Aggregate operator spills data. Instead of spilling
into the location set by the `drill.exec.spill.fs` option, the Hash Aggregate operator spills
into the location specified by this option. Setting this option to 1 disables spilling for
the Hash Aggregate operator.  
+**Note:** As of Drill 1.11, this option is supported for backward compatibility, however
in future releases, this option will be deprecated. It is highly recommended that you use
the `drill.exec.spill.fs` option to set the spill location instead. The default setting is
"file:///".  
   
 * **drill.exec.hashagg.spill.directories**    
-Overrides the location into which the hash aggregate operator spills data. Instead of spilling
into the location set by the drill.exec.spill.directories option, the hash aggregate operator
spills to the directories specified by this option. The list must be an array with directories
separated by a comma, for example ["/fs1/drill/spill" , "/fs2/drill/spill" , "/fs3/drill/spill"].
 
-**Note:** As of Drill 1.11, this option is supported for backward compatibility, however
in future releases, this option will be deprecated. It is highly recommended that you use
the drill.exec.spill. directories option to set the spill location instead.  
+Overrides the location into which the Hash Aggregate operator spills data. Instead of spilling
into the location set by the `drill.exec.spill.directories` option, the Hash Aggregate operator
spills to the directories specified by this option. The list must be an array with directories
separated by a comma, for example ["/fs1/drill/spill" , "/fs2/drill/spill" , "/fs3/drill/spill"].
 
+**Note:** As of Drill 1.11, this option is supported for backward compatibility, however
in future releases, this option will be deprecated. It is highly recommended that you use
the `drill.exec.spill.directories` option to set the spill location instead.  
 
 
-##Hash-Based Operator Settings
-Use the ALTER SYSTEM|SESSION SET commands with the options below to disable the hash aggregate
and hash join operators, modify the hash table size, disable memory estimation, or set the
estimated maximum amount of memory for a query. Typically, you set the options at the session
level unless you want the setting to persist across all sessions.
+##Hash-Based Operator Configuration Settings
+Use the ALTER SYSTEM|SESSION SET commands with the options below to disable the Hash Aggregate
and Hash Join operators, modify the hash table size, or disable memory estimation. Typically,
you set the options at the session level unless you want the setting to persist across all
sessions.
 
 The following options control the hash-based operators:
 
 * **planner.enable_hashagg**  
-    Enables or disables hash aggregation; otherwise, Drill does a sort-based aggregation.
This option is enabled by default.   The default setting is true, which is recommended.
+Enables or disables hash aggregation; otherwise, Drill does a sort-based aggregation. This
option is enabled by default. The default, and recommended, setting is true. 
+The Hash Aggregate operator uses an uncontrolled amount of memory, up to 10 GB, after which
the operator runs out of memory. As of Drill 1.11, the Hash Aggregate operator can write to
disk. 
 
 * **planner.enable_hashjoin**  
-    Enables or disables the memory hungry hash join. Drill assumes that a query will have
adequate memory to complete and tries to use the fastest operations possible to complete the
planned inner, left, right, or full outer joins using a hash table. Currently, this operator
does not write to disk. Disabling hash join allows Drill to manage arbitrarily large data
in a small memory footprint. This option is enabled by default. The default setting is true.
+Enables or disables the memory hungry hash join. Drill assumes that a query will have adequate
memory to complete and tries to use the fastest operations possible to complete the planned
inner, left, right, or full outer joins using a hash table. The Hash Join operator uses an
uncontrolled amount of memory, up to 10 GB, after which the operator runs out of memory. Currently,
this operator does not write to disk. Disabling hash join allows Drill to manage arbitrarily
large data in a small memory footprint. This option is enabled by default. The default setting
is true.
 
 * **exec.min_hash_table_size**  
-    Starting size for hash tables. Increase this setting based on the memory available to
improve performance.  
-    The default setting for this option is 65536. The setting can range from 0 to 1073741824.
+Starting size for hash tables. Increase this setting based on the memory available to improve
performance. The default setting for this option is 65536. The setting can range from 0 to
1073741824.
 
 * **exec.max\_hash\_table_size**  
-    Ending size for hash tables. The default setting for this option is 1073741824. The setting
can range from 0 to 1073741824.
+Ending size for hash tables. The default setting for this option is 1073741824. The setting
can range from 0 to 1073741824.
 
 
   

