drill-commits mailing list archives

From bridg...@apache.org
Subject drill git commit: doc updates for Drill 1.13
Date Wed, 14 Mar 2018 01:00:41 GMT
Repository: drill
Updated Branches:
  refs/heads/gh-pages c533e56bf -> ccd89314c


doc updates for Drill 1.13


Project: http://git-wip-us.apache.org/repos/asf/drill/repo
Commit: http://git-wip-us.apache.org/repos/asf/drill/commit/ccd89314
Tree: http://git-wip-us.apache.org/repos/asf/drill/tree/ccd89314
Diff: http://git-wip-us.apache.org/repos/asf/drill/diff/ccd89314

Branch: refs/heads/gh-pages
Commit: ccd89314cc0d70894f44f4687f6a4e1ede1b2aec
Parents: c533e56
Author: Bridget Bevens <bbevens@maprtech.com>
Authored: Tue Mar 13 17:58:04 2018 -0700
Committer: Bridget Bevens <bbevens@maprtech.com>
Committed: Tue Mar 13 17:58:04 2018 -0700

----------------------------------------------------------------------
 .../020-configuring-drill-memory.md             |  53 ++++----
 .../010-configuration-options-introduction.md   |  12 +-
 .../020-start-up-options.md                     |   6 +-
 ...d-hash-based-memory-constrained-operators.md | 134 +++++++++++--------
 team.md                                         |   1 +
 5 files changed, 118 insertions(+), 88 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/drill/blob/ccd89314/_docs/configure-drill/020-configuring-drill-memory.md
----------------------------------------------------------------------
diff --git a/_docs/configure-drill/020-configuring-drill-memory.md b/_docs/configure-drill/020-configuring-drill-memory.md
index e564e5b..f2c4122 100644
--- a/_docs/configure-drill/020-configuring-drill-memory.md
+++ b/_docs/configure-drill/020-configuring-drill-memory.md
@@ -1,36 +1,24 @@
 ---
 title: "Configuring Drill Memory"
-date: 2018-01-30 05:41:06 UTC
+date: 2018-03-14 00:58:05 UTC
 parent: "Configure Drill"
 ---
 
-You can configure the amount of direct memory allocated to a Drillbit for query processing in any Drill cluster, multitenant or not. The default memory for a drillbit is 8G, but Drill prefers 16G or more depending on the workload. The total amount of direct memory that a drillbit allocates to query operations cannot exceed the limit set.
+Drill uses Java direct memory. You can configure the amount of direct memory allocated to a Drillbit for query processing. The default memory for a Drillbit is 8G, but Drill prefers 16G or more depending on the workload. The total amount of direct memory that a Drillbit allocates to query operations cannot exceed the limit set.
 
-Drill uses Java direct memory and performs well when executing operations in memory instead of storing the operations on disk. Drill does not write to disk unless absolutely necessary, unlike MapReduce where everything is written to disk during each phase of a job.
+Drill performs well when executing operations in memory instead of storing the operations on disk. Drill does not write to disk unless absolutely necessary, unlike MapReduce where everything is written to disk during each phase of a job.
 
-The JVM’s heap memory does not limit the amount of direct memory available in
-a drillbit. The on-heap memory for Drill is typically set at 4-8G (default is 4), which should
-suffice because Drill avoids having data sit in heap memory.
+The JVM heap memory does not limit the amount of direct memory available in a Drillbit. The on-heap memory for Drill is typically set at 4-8G (default is 4), which should
+suffice because Drill avoids having data sit in heap memory.  
 
-As of Drill 1.5, Drill uses a new allocator that improves an operator’s use of direct memory and tracks the memory use more accurately. Due to this change, the sort operator (in queries that ran successfully in previous releases) may not have enough memory, resulting in a failed query and out of memory error instead of spilling to disk.     
+The following sections describe how to modify the memory allocated to each Drillbit and queries:  
 
+## Modifying Memory Allocated to a Drillbit  
 
-## Drillbit Memory  
-The value set for the [`planner.memory.max_query_memory_per_node`]({{site.baseurl}}/docs/configuration-options-introduction/#system-options) system option sets the maximum amount of direct memory allocated to the Sort and Hash Aggreate operators in each query on a node. If a query plan contains multiple Sort and/or Hash Aggregate operators, they all share this memory. The default limit is set to 2147483648 bytes (2GB), which should be increased for queries on large data sets. If you encounter memory issues when running queries with Sort and/or Hash Aggregate operators, increase the value of this option. See [Sort-Based and Hash-Based Memory Constrained Operators](https://drill.apache.org/docs/sort-based-and-hash-based-memory-constrained-operators/) for more information.  
+Modify the memory allocated to each Drillbit in a cluster in the Drillbit startup script, `<drill_installation_directory>/conf/drill-env.sh`. You must [restart Drill]({{ site.baseurl }}/docs/starting-drill-in-distributed-mode) after you modify the script.
 
-If you continue to encounter memory issues after increasing this value, you can also reduce the value of the [`planner.width.max_per_node`]({{site.baseurl}}/docs/configuration-options-introduction/) option to reduce the level of parallelism per node. However, this may increase the amount of time required for a query to complete. 
+{% include startnote.html %}If DRILL_MAX_DIRECT_MEMORY is not set, the limit depends on the amount of available direct memory.{% include endnote.html %}
 
-###Modifying Drillbit Memory
-
-You can modify memory for each drillbit node in your cluster. To modify the memory for a drillbit, set the DRILL_MAX_DIRECT_MEMORY variable in the drillbit startup script, `drill-env.sh`, located in `<drill_installation_directory>/conf`, as follows:
-
-    export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"<value>"}
-
-{% include startnote.html %}If DRILL_MAX_DIRECT_MEMORY is not set, the limit depends on the amount of available system memory.{% include endnote.html %}
-
-After you edit `<drill_installation_directory>/conf/drill-env.sh`, [restart the drillbit]({{ site.baseurl }}/docs/starting-drill-in-distributed-mode) on the node.
-
-### About the Drillbit Startup Script
 
 The `drill-env.sh` file contains the following options:
 
@@ -57,8 +45,25 @@ As of Drill 1.13, bounds checking for direct memory is disabled by default. To e
 For earlier versions of Drill (prior to 1.13), bounds checking is enabled by default. To disable bounds checking, set the `drill.enable_unsafe_memory_access` parameter to true, as shown:  
 
 
-    export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS -Ddrill.enable_unsafe_memory_access=true"
-  
-  
+    export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS -Ddrill.enable_unsafe_memory_access=true"  
+
+
+##Modifying Memory Allocated to Queries  
+
+You can configure the amount of memory that Drill allocates to each query as a hard limit or a percentage of the total direct memory. The `planner.memory.max_query_memory_per_node` and `planner.memory.percent_per_query` options set the amount of memory that Drill can allocate to a query on a node. Both options are enabled by default. Of these two options, Drill picks the setting that provides the most memory. For more information about these options, see [Sort-Based and Hash-Based Memory Constrained Operators](https://drill.apache.org/docs/sort-based-and-hash-based-memory-constrained-operators/).  
+
+
+If you modify the memory allocated per query and continue to experience out-of-memory errors, you can try reducing the value of the [`planner.width.max_per_node`]({{site.baseurl}}/docs/configuration-options-introduction/) option. Reducing the value of this option reduces the level of parallelism per node. However, this may increase the amount of time required for a query to complete.  
+
+Another option you can modify is the `drill.exec.memory.operator.output_batch_size` option, introduced in Drill 1.13. The  `drill.exec.memory.operator.output_batch_size` option limits the amount of memory that the Flatten, Merge Join, and External Sort operators allocate to outgoing batches. Limiting the memory allocated to outgoing batches can improve concurrency and prevent queries from failing with out-of-memory errors.
+ 
+The average row size of the outgoing batch (calculated from the incoming batch size) determines the number of rows that can fit into the available memory for the batch. If your queries fail with memory errors, reduce the value of the `drill.exec.memory.operator.output_batch_size` option to reduce the output batch size. 
+
+The default value is 16777216 (16 MB). The maximum allowed value is 536870912 (512 MB). Enter the value in bytes. 
+
+**Note:** Configuring a batch size less than 1 MB is not recommended, as it could lead to performance issues. 
 
+Use the ALTER SYSTEM SET command to change the settings, as shown:  
 
+       ALTER SYSTEM SET `drill.exec.memory.operator.output_batch_size` = <value>;
+  
\ No newline at end of file
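
Reviewer note on the hunk above: the added text says that, of `planner.memory.max_query_memory_per_node` and `planner.memory.percent_per_query`, Drill picks the setting that provides the most memory, and that the average row size determines how many rows fit in an outgoing batch. The following is a minimal sketch of that arithmetic, not Drill's implementation. The function names are hypothetical, the percentage option is assumed to be a fraction of total direct memory (0.05 meaning 5%), and the row estimate ignores any other limits Drill may apply to a batch.

```python
# Hypothetical helpers for illustration only; these names are not part of Drill.

# Defaults quoted in the documentation diff above.
DEFAULT_MAX_QUERY_MEMORY_PER_NODE = 2147483648  # bytes (2 GB)
DEFAULT_PERCENT_PER_QUERY = 0.05                # assumed to mean 5% of direct memory
DEFAULT_OUTPUT_BATCH_SIZE = 16777216            # bytes (16 MB)


def query_memory_per_node(total_direct_memory,
                          max_query_memory_per_node=DEFAULT_MAX_QUERY_MEMORY_PER_NODE,
                          percent_per_query=DEFAULT_PERCENT_PER_QUERY):
    """Return the larger of the hard limit and the percentage-based limit, in bytes."""
    percent_based = int(total_direct_memory * percent_per_query)
    return max(max_query_memory_per_node, percent_based)


def rows_per_output_batch(avg_row_size, output_batch_size=DEFAULT_OUTPUT_BATCH_SIZE):
    """Estimate how many rows of avg_row_size bytes fit in one outgoing batch."""
    if avg_row_size <= 0:
        raise ValueError("avg_row_size must be positive")
    return output_batch_size // avg_row_size


if __name__ == "__main__":
    gib = 1024 ** 3
    # With 8 GiB of direct memory, 5% is smaller than the 2 GB hard limit.
    print(query_memory_per_node(8 * gib))
    # With 64 GiB of direct memory, 5% exceeds the 2 GB hard limit.
    print(query_memory_per_node(64 * gib))
    # Rows of 256 bytes each in a 16 MB outgoing batch.
    print(rows_per_output_batch(256))
```

Under these assumptions, the percentage-based limit only takes effect on a node whose direct memory is large enough that 5% of it exceeds the 2 GB hard limit. Lowering `drill.exec.memory.operator.output_batch_size` reduces the estimated rows per batch proportionally.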

http://git-wip-us.apache.org/repos/asf/drill/blob/ccd89314/_docs/configure-drill/configuration-options/010-configuration-options-introduction.md
----------------------------------------------------------------------
diff --git a/_docs/configure-drill/configuration-options/010-configuration-options-introduction.md b/_docs/configure-drill/configuration-options/010-configuration-options-introduction.md
index 9f28eeb..08352e0 100644
--- a/_docs/configure-drill/configuration-options/010-configuration-options-introduction.md
+++ b/_docs/configure-drill/configuration-options/010-configuration-options-introduction.md
@@ -1,6 +1,6 @@
 ---
 title: "Configuration Options Introduction"
-date: 2018-02-05 23:56:13 UTC
+date: 2018-03-14 00:58:05 UTC
 parent: "Configuration Options"
 ---
 
@@ -13,22 +13,25 @@ See [Configuration and Launch Script Changes]({{site.baseurl}}/docs/apache-drill
 The sys.options table contains information about system and session options. The sys.boot table contains information about Drill start-up options. The section, ["Start-up Options"]({{site.baseurl}}/docs/start-up-options), covers how to configure and view key boot options. The following table lists the system options in alphabetical order and provides a brief description of supported options.
 
 ## System Options
-The sys.options table lists ptions that you can set at the system or session level, as described in the section, ["Planning and Execution Options"]({{site.baseurl}}/docs/planning-and-execution-options).  
+The sys.options table lists options that you can set at the system or session level, as described in the section, ["Planning and Execution Options"]({{site.baseurl}}/docs/planning-and-execution-options).  
 
-| **Name**                                              | **Default**                                           | **Description
                                                                                                                                                                                                                                       |
+| Name                                              | Default                                           | Description
                                                                                                                                                                                                                           |
 |---------------------------------------------------|---------------------------------------------------|
 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | drill.exec.default_temporary_workspace            | dfs.tmp                                           | Available as of Drill 1.10. Sets the   workspace for temporary tables. The workspace must be writable, file-based,   and point to a location that already exists. This option requires the   following format: .<workspace
                                                                                                                                                                                                                           |
+| drill.exec.memory.operator.output_batch_size      | 16777216   (16 MB)                                |       Available as of Drill 1.13. Limits the   amount of memory that the Flatten, Merge Join, and External Sort operators   allocate to outgoing batches
                                                                                                                                                                                                                           |
 | drill.exec.storage.implicit.filename.column.label | filename                                          | Available as of Drill 1.10. Sets the   implicit column name for the filename column
                                                                                                                                                                                                                           |
 | drill.exec.storage.implicit.filepath.column.label | filepath                                          | Available as of Drill 1.10. Sets the   implicit column name for the filepath column
                                                                                                                                                                                                                           |
 | drill.exec.storage.implicit.fqn.column.label      | fqn                                               | Available as of Drill 1.10. Sets the   implicit column name for the fqn column.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                           |
 | drill.exec.storage.implicit.suffix.column.label   | suffix                                            | Available as of Drill 1.10. Sets the   implicit column name for the suffix column
                                                                                                                                                                                                                           |
 | drill.exec.functions.cast_empty_string_to_null    | FALSE                                             | In a text file, treat empty fields as NULL   values instead of empty string
                                                                                                                                                                                                                           |
+| drill.exe.spill.fs                                |  "file:///"                                       | Introduced   in Drill 1.11. The default file system on the local machine into which the   Sort, Hash Aggregate, and Hash Join operators spill data
                                                                                                                                                                                                                           |
+| drill.exec.spill.directories                      | ["/tmp/drill/spill"]                              | Introduced   in Drill 1.11. The list of directories into which the Sort, Hash Aggregate,   and Hash Join operators spill data. The list must be an array with   directories separated by a comma, for example ["/fs1/drill/spill" ,   "/fs2/drill/spill" , "/fs3/drill/spill
                                                                                                                                                                                                                           |
 | drill.exec.storage.file.partition.column.label    | dir                                               | The column label for directory levels in   results of queries of files in a directory. Accepts a string input
                                                                                                                                                                                                                           |
 | exec.enable_union_type                            | FALSE                                             | Enable support for Avro union type
                                                                                                                                                                                                                           |
 | exec.errors.verbose                               | FALSE                                             | Toggles verbose output of executable error   messages
                                                                                                                                                                                                                           |
 | exec.java_compiler                                | DEFAULT                                           | Switches between DEFAULT, JDK, and JANINO   mode for the current session. Uses Janino by default for generated source   code of less than exec.java_compiler_janino_maxsize; otherwise, switches to   the JDK compiler
                                                                                                                                                                                                                           |
 | exec.java_compiler_debug                          | TRUE                                              | Toggles the output of debug-level compiler   error messages in runtime generated code.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                           |
-| exec.java.compiler.exp_in_method_size             | 50                                                | Introduced in Drill 1.8. For queries with complex or multiple expressions in the query logic, this option   limits the number of expressions allowed in each method to prevent Drill from   generating code that exceeds the Java limit of 64K bytes. If a method   approaches the 64K limit, the Java compiler returns a message stating that   the code is too large to compile. If queries return such a message, reduce   the value of this option at the session level. The default value for this option is 50. The value is the count of   expressions allowed in a method. Expressions are added to a method until they   hit the Java 64K limit, when a new inner method is created and called from   the existing method.          **Note:** This logic has not   been implemented for all operators. If a query uses operators for which the   logic is not implemented, reducing the setting for this option may not   resolve the error. Setting this option at the system level impacts all   queries and can degrade query performance. |
+| exec.java.compiler.exp_in_method_size             | 50                                                | Introduced in Drill 1.8. For queries with   complex or multiple expressions in the query logic, this option limits the   number of expressions allowed in each method to prevent Drill from generating   code that exceeds the Java limit of 64K bytes. If a method approaches the 64K   limit, the Java compiler returns a message stating that the code is too large   to compile. If queries return such a message, reduce the value of this option   at the session level. The default value for this option is 50. The value is   the count of expressions allowed in a method. Expressions are added to a   method until they hit the Java 64K limit, when a new inner method is created   and called from the existing method. Note: This logic has not been implemented for all operators. If   a query uses operators for which the logic is not implemented, reducing the   setting for this option may not resolve the error. Setting this option at the   system level impacts all queries and can degrade query performance. |
 | exec.java_compiler_janino_maxsize                 | 262144                                            | See the exec.java_compiler option comment.   Accepts inputs of type
                                                                                                                                                                                                                           |
 | exec.max_hash_table_size                          | 1073741824                                        | Ending size in buckets for hash tables.   Range: 0 - 1073741824.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                           |
 | exec.min_hash_table_size                          | 65536                                             | Starting size in bucketsfor hash tables.   Increase according to available memory to improve performance. Increasing for   very large aggregations or joins when you have large amounts of memory for   Drill to use. Range: 0 - 1073741824.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
                                                                                                                                                                                                                           |
@@ -72,6 +75,7 @@ The sys.options table lists ptions that you can set at the system or session lev
 | planner.memory.max_query_memory_per_node          | 2147483648 bytes                                  | Sets the maximum amount of direct memory   allocated to the Sort and Hash Aggregate operators during each query on a   node. This memory is split between operators. If a query plan contains   multiple Sort and/or Hash Aggregate operators, the memory is divided between   them. The default limit should be increased for queries on large data sets
                                                                                                                                                                                                                           |
 | planner.memory.non_blocking_operators_memory      | 64                                                | Extra query memory per node for non-blocking   operators. This option is currently used only for memory estimation. Range
                                                                                                                                                                                                                           |
 | planner.memory_limit                              | 268435456 bytes                                   | Defines the maximum amount of direct memory   allocated to a query for planning. When multiple queries run concurrently,   each query is allocated the amount of memory set by this parameter.Increase   the value of this parameter and rerun the query if partition pruning failed   due to insufficient memory
                                                                                                                                                                                                                           |
+| planner.memory.percent_per_query                  | 0.05                                              | Sets   the memory as a percentage of the total direct memory
                                                                                                                                                                                                                           |
 | planner.nestedloopjoin_factor                     | 100                                               | A heuristic value for influencing the nested   loop join
                                                                                                                                                                                                                           |
 | planner.partitioner_sender_max_threads            | 8                                                 | Upper limit of threads for outbound queuing
                                                                                                                                                                                                                           |
 | planner.partitioner_sender_set_threads            | -1                                                | Overwrites the number of threads used to   send out batches of records. Set to -1 to disable. Typically not changed
                                                                                                                                                                                                                           |

http://git-wip-us.apache.org/repos/asf/drill/blob/ccd89314/_docs/configure-drill/configuration-options/020-start-up-options.md
----------------------------------------------------------------------
diff --git a/_docs/configure-drill/configuration-options/020-start-up-options.md b/_docs/configure-drill/configuration-options/020-start-up-options.md
index 36400d8..9ab2e4b 100644
--- a/_docs/configure-drill/configuration-options/020-start-up-options.md
+++ b/_docs/configure-drill/configuration-options/020-start-up-options.md
@@ -1,6 +1,6 @@
 ---
 title: "Start-Up Options"
-date: 2017-08-17 21:20:19 UTC
+date: 2018-03-14 00:58:06 UTC
 parent: "Configuration Options"
 ---
 The start-up options for Drill reside in a [HOCON](https://github.com/typesafehub/config/blob/master/HOCON.md) configuration file format, which is a hybrid between a properties file and a JSON file. Drill start-up options consist of a group of files with a nested relationship. At the bottom of the file hierarchy are the default files that Drill provides, starting with `drill-default.conf`. 
@@ -56,10 +56,10 @@ The summary of start-up options, also known as boot options, lists default value
   Defines the amount of memory available, in terms of record batches, to hold data on the downstream side of an operation. Drill pushes data downstream as quickly as possible to make data immediately available. This requires Drill to use memory to hold the data pending operations. When data on a downstream operation is required, that data is immediately available so Drill does not have to go over the network to process it. Providing more memory to this option increases the speed at which Drill completes a query.  
   
 * **drill.exe.spill.fs**  
-Introduced in Drill 1.11. The default file system on the local machine into which the Sort and Hash Aggregate operators spill data. This is the recommended option to use for spilling. You can configure this option so that data spills into a distributed file system, such as hdfs. For example, "hdfs:///". The default setting is "file:///". See [Sort-Based and Hash-Based Memory Constrained Operators]({{site.baseurl}}/docs/sort-based-and-hash-based-memory-constrained-operators/) for more information.   
+Introduced in Drill 1.11. The default file system on the local machine into which the Sort, Hash Aggregate, and Hash Join operators spill data. This is the recommended option to use for spilling. You can configure this option so that data spills into a distributed file system, such as hdfs. For example, "hdfs:///". The default setting is "file:///". See [Sort-Based and Hash-Based Memory Constrained Operators]({{site.baseurl}}/docs/sort-based-and-hash-based-memory-constrained-operators/) for more information.   
   
 * **drill.exec.spill.directories**  
-Introduced in Drill 1.11. The list of directories into which the Sort and Hash Aggregate operators spill data. The list must be an array with directories separated by a comma, for example ["/fs1/drill/spill" , "/fs2/drill/spill" , "/fs3/drill/spill"]. This is the recommended option for spilling to multiple directories. The default setting is ["/tmp/drill/spill"]. See [Sort-Based and Hash-Based Memory Constrained Operators]({{site.baseurl}}/docs/sort-based-and-hash-based-memory-constrained-operators/) for more information.  
+Introduced in Drill 1.11. The list of directories into which the Sort, Hash Aggregate, and Hash Join operators spill data. The list must be an array with directories separated by a comma, for example ["/fs1/drill/spill" , "/fs2/drill/spill" , "/fs3/drill/spill"]. This is the recommended option for spilling to multiple directories. The default setting is ["/tmp/drill/spill"]. See [Sort-Based and Hash-Based Memory Constrained Operators]({{site.baseurl}}/docs/sort-based-and-hash-based-memory-constrained-operators/) for more information.  
 
 * **drill.exec.zk.connect**  
   Provides Drill with the ZooKeeper quorum to use to connect to data sources. Change this setting to point to the ZooKeeper quorum that you want Drill to use. You must configure this option on each Drillbit node.  

http://git-wip-us.apache.org/repos/asf/drill/blob/ccd89314/_docs/performance-tuning/query-plans-and-tuning/050-sort-based-and-hash-based-memory-constrained-operators.md
----------------------------------------------------------------------
diff --git a/_docs/performance-tuning/query-plans-and-tuning/050-sort-based-and-hash-based-memory-constrained-operators.md b/_docs/performance-tuning/query-plans-and-tuning/050-sort-based-and-hash-based-memory-constrained-operators.md
index 999e026..315150c 100644
--- a/_docs/performance-tuning/query-plans-and-tuning/050-sort-based-and-hash-based-memory-constrained-operators.md
+++ b/_docs/performance-tuning/query-plans-and-tuning/050-sort-based-and-hash-based-memory-constrained-operators.md
@@ -1,91 +1,111 @@
 ---
 title: "Sort-Based and Hash-Based Memory-Constrained Operators"
-date: 2017-08-18 17:48:11 UTC
+date: 2018-03-14 00:58:06 UTC
 parent: "Query Plans and Tuning"
 --- 
 
-Drill uses hash-based and sort-based operators depending on the query characteristics. Hash Aggregate and Hash Join are hash-based operators. Sort, Streaming Aggregate, and Merge Join are sort-based operators. Both hash-based and sort-based operations consume memory, however the Hash Aggregate and Hash Join operators are the fastest and most memory intensive operators. 
+Drill uses operators to sort, join, and aggregate data when executing queries. Drill uses the Sort operator to sort data. Drill can use the Hash Aggregate and Hash Join operators to aggregate and join data, or Drill can sort the data and then use the Streaming Aggregate and Merge Join operators to aggregate and join the data. 
 
-When planning a query with sort- and hash-based operations, Drill evaluates the available memory multiplied by a configurable reduction constant (for parallelization purposes) and then limits the operations to the maximum of this amount of memory. Drill spills data to disk if the sort and hash aggregate operations cannot be performed in memory. Alternatively, you can disable large hash operations if they do not fit in memory on your system. When disabled, Drill creates alternative plans. You can also modify the minimum hash table size, increasing the size for very large aggregations or joins when you have large amounts of memory for Drill to use. If you have large data sets, you can increase the hash table size to improve performance. 
+The Hash operators typically perform better; however, they are more memory intensive than the Merge Join and Streaming Aggregate operators. The Sort operator may use as much memory as the Hash operators, or even more. To see the difference in memory consumption between the operators, you can run a query and view the query profile in the Drill Web Console. Optionally, you can disable the Hash operators to force Drill to use the Merge Join and Streaming Aggregate operators. 
 
-##Memory Options
-The `planner.memory.max_query_memory_per_node` option sets the maximum amount of direct memory allocated to the Sort and Hash Aggregate operators during each query on a node. The default limit is set to 2147483648 bytes (2GB), which should be increased for queries on large data sets. This memory is split between operators. If a query plan contains multiple Sort and/or Hash Aggregate operators, the memory is divided between them.
+When a query requires sorting, joining, and aggregation, Drill divides the available memory equally among all instances of these memory intensive operators in the query. The number of instances equals the number of these operators in the query plan, each multiplied by its degree of parallelism. The degree of parallelism is the number of minor fragments required to perform the work for each instance of an operator. When an instance of an operator must process more data than it can hold, the operator temporarily spills some of the data to a directory on disk to complete its work.  
 
-When a query is parallelized, the number of operators is multiplied, which reduces the amount of memory given to each instance of the Sort and Hash Aggregate operators during a query. If you encounter memory issues when running queries with Sort and Hash Aggregate operators, calculate the memory requirements for your queries and the amount of available memory on each node. Based on the information, increase the value of the `planner.memory.max_query_memory_per_node` option using the ALTER SYSTEM|SESSION SET command, as shown:  
 
-    ALTER SYSTEM|SESSION SET `planner.memory.max_query_memory_per_node` = <new_value>  
-  
+##Spill to Disk  
 
-The `planner.memory.enable_memory_estimation` option toggles the state of memory estimation and re-planning of a query. When enabled, Drill conservatively estimates memory requirements and typically excludes memory-constrained operators from the query plan, which can negatively impact performance. The default setting is false. If you want Drill to use very conservative memory estimates, use the ALTER SYSTEM|SESSION SET command to change the setting, as shown:  
+Spilling to disk prevents queries that use memory intensive operations from failing with out-of-memory errors. The Spill to Disk feature enables the Sort, Hash Aggregate, and Hash Join operators to automatically write excess data (as files) to a temporary directory on disk when the memory requirements for the operators exceed the set memory limit. Queries run uninterrupted while the operators perform the spill operations in the background.
 
-    ALTER SYSTEM|SESSION SET `planner.memory.enable_memory_estimation` = true  
+When the Sort, Hash Aggregate, and Hash Join operators finish processing the data in memory, they read the spilled data back from disk and then finish processing the data. The operators clean up their data (files) from the temporary spill location after they finish processing the data. 
 
- 
-##Spill to Disk  
-Spilling data to disk prevents queries that use memory-intensive Sort and Hash Aggregate operations from failing with out-of-memory errors. Drill automatically writes excess data to a temporary directory on disk when queries with Sort or Hash Aggregate operations exceed the set memory limit on a Drill node. When the operators finish processing the in-memory data, Drill reads the spilled data back from disk, and the operators finish processing the data. When the operations complete, Drill removes the data from disk.  
+Ideally, you want to allocate enough memory for Drill to perform all operations in memory. When data spills to disk, you will not see any difference in how queries run; however, spilling to disk can impact performance due to the additional I/O required to write data to disk and read the data back. See the Memory Allocation section below for more information. 
 
-Spilling data to disk enables queries to run uninterrupted while Drill performs the spill operations in the background. However, there can be performance impact due to the time required to spill data and then read the data back from disk.  
+**Note:** Drill 1.13 and later support spilling to disk for the Hash Join, Hash Aggregate, and Sort operators. Drill 1.11 and 1.12 support spilling to disk for the Hash Aggregate and Sort operators. Releases of Drill prior to 1.11 only support spilling to disk for the Sort operator.  
 
-{% include startnote.html %}Drill 1.11 and later supports spilling to disk for the Hash Aggregate operator in addition to the Sort operator. Previous releases of Drill only supported spilling to disk for the Sort operator.{% include endnote.html %}  
+**Spill Locations** 
 
-###Spill Locations  
-Drill writes data to a temporary work area on disk. The default location of the temporary work area is /tmp/drill/spill on the local file system. The /tmp/drill/spill directory should suffice for small workloads or examples, however it is highly recommended that you redirect the default spill location to a location with enough disk space to support spilling for large workloads.  
- 
-{% include startnote.html %}Spilled data may require more space than the table referenced in the query that is spilling the data. For example, if a table is 100 GB per node, the spill directory should have the capacity to hold more than 100 GB.{% include endnote.html %}
- 
-When you configure the spill location, you can specify a single directory, or a list of directories into which the sort and hash aggregate operators both spill. Alternatively, you can set specific spill directories for each type of operator, however this is not recommended as these options will be deprecated in future releases of Drill. For more information, see the Spill to Disk Configuration Options section below.  
+The Sort, Hash Aggregate, and Hash Join operators write data to a temporary work area on disk when they cannot process all of the data in memory. The default location of the temporary work area is /tmp/drill/spill on the local file system. 
 
-###Spill to Disk Configuration Options  
-The options related to spilling reside in the drill-override.conf file on each Drill node. An administrator or someone familiar with storage and disks should manage these settings.
+The /tmp/drill/spill directory should suffice for small workloads or examples; however, it is highly recommended that you redirect the default spill location to a location with enough disk space to support spilling for large workloads.
 
-{% include startnote.html %}You can see examples of these configuration options in the drill-override-example.conf file located in the <drill_installation>/conf directory.{% include endnote.html %} 
+**Note:** Spilled data may require more space than the table referenced in the query that is spilling the data. For example, if a table is 100 GB per node, the spill directory should have the capacity to hold more than 100 GB.
 
-The following list describes the configuration options for spilling data to disk:  
+When you configure the spill location, you can specify a single directory or a list of directories into which the Sort, Hash Aggregate, and Hash Join operators spill data. For more information, see the Spill to Disk Configuration Options section below.  
 
-* **drill.exe.spill.fs**  
-Introduced in Drill 1.11. The default file system on the local machine into which the Sort and Hash Aggregate operators spill data. This is the recommended option to use for spilling. You can configure this option so that data spills into a distributed file system, such as hdfs. For example, "hdfs:///". The default setting is "file:///".  
-  
-* **drill.exec.spill.directories**  
-Introduced in Drill 1.11. The list of directories into which the Sort and Hash Aggregate operators spill data. The list must be an array with directories separated by a comma, for example ["/fs1/drill/spill" , "/fs2/drill/spill" , "/fs3/drill/spill"]. This is the recommended option for spilling to multiple directories. The default setting is ["/tmp/drill/spill"].  
-  
-* **drill.exec.sort.external.spill.fs**    
-Overrides the default location into which the Sort operator spills data. Instead of spilling into the location set by the `drill.exec.spill.fs` option, the Sort operators spill into the location specified by this option.  
-**Note:** As of Drill 1.11, this option is supported for backward compatibility, however in future releases, this option will be deprecated. It is highly recommended that you use the `drill.exec.spill.fs` option to set the spill location instead. The default setting is "file:///".  
+**Spill to Disk Configuration Options**  
 
-* **drill.exec.sort.external.spill.directories**   
-Overrides the location into which the Sort operator spills data. Instead of spilling into the location set by the `drill.exec.spill.directories` option, the Sort operators spill into the directories specified by this option. The list must be an array with directories separated by a comma, for example ["/fs1/drill/spill" , "/fs2/drill/spill" , "/fs3/drill/spill"].  
-**Note:** As of Drill 1.11, this option is supported for backward compatibility, however in future releases, this option will be deprecated. It is highly recommended that you use the `drill.exec.spill.directories` option to set the spill location instead. The default setting is ["/tmp/drill/spill"].  
- 
-* **drill.exec.hashagg.spill.fs**  
-Overrides the location into which the Hash Aggregate operator spills data. Instead of spilling into the location set by the `drill.exec.spill.fs` option, the Hash Aggregate operator spills into the location specified by this option. Setting this option to 1 disables spilling for the Hash Aggregate operator.  
-**Note:** As of Drill 1.11, this option is supported for backward compatibility, however in future releases, this option will be deprecated. It is highly recommended that you use the `drill.exec.spill.fs` option to set the spill location instead. The default setting is "file:///".  
-  
-* **drill.exec.hashagg.spill.directories**    
-Overrides the location into which the Hash Aggregate operator spills data. Instead of spilling into the location set by the `drill.exec.spill.directories` option, the Hash Aggregate operator spills to the directories specified by this option. The list must be an array with directories separated by a comma, for example ["/fs1/drill/spill" , "/fs2/drill/spill" , "/fs3/drill/spill"].  
-**Note:** As of Drill 1.11, this option is supported for backward compatibility, however in future releases, this option will be deprecated. It is highly recommended that you use the `drill.exec.spill.directories option` to set the spill location instead.  
+The drill-override.conf file, located in the <drill_installation>/conf directory, contains options that set the spill locations for the Hash and Sort operators. An administrator can change the file system and directories into which the operators spill data. Refer to the drill-override-example.conf file for examples. 
+
+The following list describes the spill to disk configuration options:  
+
+- **drill.exec.spill.fs**  
+Introduced in Drill 1.11. The default file system on the local machine into which the Sort, Hash Aggregate, and Hash Join operators spill data. You can configure this option so that data spills into a distributed file system, such as HDFS, for example "hdfs:///". The default setting is "file:///".
+- **drill.exec.spill.directories**  
+Introduced in Drill 1.11. The list of directories into which the Sort, Hash Aggregate, and Hash Join operators spill data. The list must be an array with directories separated by a comma, for example ["/fs1/drill/spill" , "/fs2/drill/spill" , "/fs3/drill/spill"]. The default setting is ["/tmp/drill/spill"].  
+
+**Note:** The following options were available prior to Drill 1.11, but have since been deprecated and replaced with the options described above:  
+
+- drill.exec.sort.external.spill.fs (Replaced by drill.exec.spill.fs)
+- drill.exec.sort.external.spill.directories (Replaced by drill.exec.spill.directories)
+- drill.exec.hashagg.spill.fs (Replaced by drill.exec.spill.fs)  
+
+
+##Memory Allocation  
+
+Drill evenly splits the available memory among all instances of the Sort, Hash Aggregate, and Hash Join operators. When a query is parallelized, the number of operators is multiplied, which reduces the amount of memory given to each instance of the operators during a query.  
 
+**Memory Allocation Configuration Options**  
 
-##Hash-Based Operator Configuration Settings
-Use the ALTER SYSTEM|SESSION SET commands with the options below to disable the Hash Aggregate and Hash Join operators, modify the hash table size, or disable memory estimation. Typically, you set the options at the session level unless you want the setting to persist across all sessions.
+The `planner.memory.max_query_memory_per_node` and `planner.memory.percent_per_query` options set the amount of memory that Drill can allocate to a query on a node. Both options are enabled by default. Of these two options, Drill picks the setting that provides the most memory.  
 
-The following options control the hash-based operators:
+- **planner.memory.max_query_memory_per_node**  
+The `planner.memory.max_query_memory_per_node` option, set at 2 GB by default, is the minimum amount of memory available to Drill per query on a node. The default of 2 GB typically allows between two and three concurrent queries to run when the JVM is configured to use 8 GB of direct memory (default). When the memory requirement for Drill increases, the default of 2 GB is constraining. You must increase the amount of memory for queries to complete, unless the setting for the `planner.memory.percent_per_query` option allows Drill to use more memory.
+- **planner.memory.percent_per_query**  
+Alternatively, the `planner.memory.percent_per_query` option sets the memory as a percentage of the total direct memory. For example, if the allocation is set to 10%, and the total direct memory is 128 GB, each query gets approximately 13 GB.  
 
-* **planner.enable_hashagg**  
-Enables or disables hash aggregation; otherwise, Drill does a sort-based aggregation. This option is enabled by default. The default, and recommended, setting is true. 
-The Hash Aggregate operator uses an uncontrolled amount of memory, up to 10 GB, after which the operator runs out of memory. As of Drill 1.11, the Hash Aggregate operator can write to disk. 
+The percentage is calculated using the following formula:  
 
-* **planner.enable_hashjoin**  
-Enables or disables the memory hungry hash join. Drill assumes that a query will have adequate memory to complete and tries to use the fastest operations possible to complete the planned inner, left, right, or full outer joins using a hash table. The Hash Join operator uses an uncontrolled amount of memory, up to 10 GB, after which the operator runs out of memory. Currently, this operator does not write to disk. Disabling hash join allows Drill to manage arbitrarily large data in a small memory footprint. This option is enabled by default. The default setting is true.
+       (1 - non-managed allowance)/concurrency
 
-* **exec.min_hash_table_size**  
-Starting size for hash tables. Increase this setting based on the memory available to improve performance. The default setting for this option is 65536. The setting can range from 0 to 1073741824.
+The non-managed allowance is an assumed amount of system memory that non-managed operators will use. Non-managed operators do not spill to disk. The default non-managed allowance assumes 50% of the total system memory. The concurrency is the number of concurrent queries that may run. The default assumption is 10.
 
-* **exec.max\_hash\_table_size**  
-Ending size for hash tables. The default setting for this option is 1073741824. The setting can range from 0 to 1073741824.
+Based on the default assumptions, the default value of 5% is calculated as follows:  
 
+       (1 - .50)/10 = 0.05  
 
+This value is only used when throttling is disabled. Setting the value to 0 disables the option. You can increase or decrease the value; however, you should set the percentage well below the JVM direct memory to account for the cases where Drill does not manage memory, such as for the less memory intensive operators.  
+
+**Increasing the Available Memory**  
+
+You can increase the amount of memory available to Drill using the ALTER SYSTEM|SESSION SET commands with the `planner.memory.max_query_memory_per_node` or `planner.memory.percent_per_query` options, as shown:  
+
+       ALTER SYSTEM|SESSION SET `planner.memory.max_query_memory_per_node` = <new_value>
+       //The default value is 2147483648 bytes (2 GB). 
+       
+       ALTER SYSTEM|SESSION SET `planner.memory.percent_per_query` = <new_value>
+       //The default value is 0.05.  
+
+##Disabling the Hash Operators  
+
+You can disable the Hash Aggregate and Hash Join operators. When you disable these operators, Drill creates alternative query plans that use the Sort operator and the Streaming Aggregate or the Merge Join operator. 
+
+Use the ALTER SYSTEM|SESSION SET commands with the following options to disable the Hash Aggregate and Hash Join operators. Typically, you set the options at the session level unless you want the setting to persist across all sessions. 
+
+The following options control the hash-based operators:  
+
+- **planner.enable_hashagg**  
+Enables or disables hash aggregation; otherwise, Drill does a sort-based aggregation. This option is enabled by default. The default, and recommended, setting is true. Prior to Drill 1.11, the Hash Aggregate operator used an uncontrolled amount of memory (up to 10 GB), after which the operator ran out of memory. As of Drill 1.11, the Hash Aggregate operator can write to disk.
+- **planner.enable_hashjoin**  
+Enables or disables hash joins. This option is enabled by default. Drill assumes that a query will have adequate memory to complete and tries to use the fastest operations possible to complete the planned inner, left, right, or full outer joins using a hash table. Prior to Drill 1.13, the Hash Join operator used an uncontrolled amount of memory (up to 10 GB), after which the operator ran out of memory. As of Drill 1.13, this operator can write to disk.
+
+
+
+
+
+
+ 
   
 
 
 
 
+
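
The memory arithmetic described in the revised page above can be sketched in a few lines of Python. This is a minimal illustration, not Drill code: the function names are hypothetical, and the model assumes only what the page states (Drill takes the larger of the two `planner.memory.*` settings, then splits that amount evenly across operator instances).

```python
GIB = 1024 ** 3  # bytes per GiB; 2 * GIB == 2147483648, the documented default


def default_percent_per_query(non_managed_allowance=0.50, concurrency=10):
    """(1 - non-managed allowance) / concurrency, per the documented formula."""
    return (1 - non_managed_allowance) / concurrency


def query_memory_per_node(direct_memory_bytes,
                          max_query_memory_per_node=2 * GIB,
                          percent_per_query=0.05):
    """Per-query memory on a node: the larger of the fixed limit and the
    percentage of direct memory (assumed model of 'picks the most memory')."""
    by_percent = int(direct_memory_bytes * percent_per_query)
    return max(max_query_memory_per_node, by_percent)


def memory_per_operator_instance(query_memory_bytes, degrees_of_parallelism):
    """Even split among instances. degrees_of_parallelism holds one entry per
    Sort, Hash Aggregate, or Hash Join operator in the plan (hypothetical input)."""
    instances = sum(degrees_of_parallelism)
    return query_memory_bytes // instances


if __name__ == "__main__":
    small = query_memory_per_node(8 * GIB)           # fixed 2 GB limit wins
    large = query_memory_per_node(128 * GIB, percent_per_query=0.10)
    print(small / GIB, large / GIB)
    # Two buffered operators, each running in 4 minor fragments.
    print(memory_per_operator_instance(small, [4, 4]))
```

With 8 GB of direct memory the fixed 2 GB limit applies, while with 128 GB and a 10% setting the percentage-based value applies, which matches the "approximately 13 GB" example in the text.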

http://git-wip-us.apache.org/repos/asf/drill/blob/ccd89314/team.md
----------------------------------------------------------------------
diff --git a/team.md b/team.md
index 94583d6..8a039d1 100755
--- a/team.md
+++ b/team.md
@@ -43,4 +43,5 @@ We welcome contributions to the project. If you're interested in contributing, t
 | Anil Kumar Batchu | akumarb2010 |  
 | Vitalii Diravka  | vitalii |  
 | Kamesh Bhallamudi | kameshb |  
+| Kunal Khatua | kunal |
 

