carbondata-commits mailing list archives

From ravipes...@apache.org
Subject [46/50] carbondata git commit: [CARBONDATA-2750] Added Documentation for Local Dictionary Support
Date Mon, 30 Jul 2018 18:43:12 GMT
[CARBONDATA-2750] Added Documentation for Local Dictionary Support

Added Documentation for Local Dictionary Support

This closes #2520


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/e21e494b
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/e21e494b
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/e21e494b

Branch: refs/heads/branch-1.4
Commit: e21e494b6fa14e40eb5fdd9291fb051603644211
Parents: d691d49
Author: praveenmeenakshi56 <praveenmeenakshi56@gmail.com>
Authored: Wed Jul 25 21:01:37 2018 +0530
Committer: ravipesala <ravi.pesala@gmail.com>
Committed: Tue Jul 31 00:11:26 2018 +0530

----------------------------------------------------------------------
 docs/data-management-on-carbondata.md | 66 ++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/carbondata/blob/e21e494b/docs/data-management-on-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/data-management-on-carbondata.md b/docs/data-management-on-carbondata.md
index da259a6..27cdab6 100644
--- a/docs/data-management-on-carbondata.md
+++ b/docs/data-management-on-carbondata.md
@@ -124,6 +124,52 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
      TBLPROPERTIES ('streaming'='true')
      ```
 
+   - **Local Dictionary Configuration**
+     
+     Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in:
+     1. Getting more compression on dimension columns with low cardinality.
+     2. Speeding up filter queries and full scan queries on no-dictionary columns with a local dictionary, because filtering is done on the encoded data.
+     3. Reducing the store size and memory footprint, as only the unique values are stored in the local dictionary and the corresponding data is stored in encoded form.
+   
+     By default, Local Dictionary is enabled and generated for all no-dictionary string/varchar datatype columns.
+          
+     Users can pass the following properties in the create table command: 
+          
+     | Properties | Default value | Description |
+     | ---------- | ------------- | ----------- |
+     | LOCAL_DICTIONARY_ENABLE | true | Enables or disables local dictionary generation for the table |
+     | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (range: 1000 to 100000) |
+     | LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns | Columns for which Local Dictionary is generated |
+     | LOCAL_DICTIONARY_EXCLUDE | none | Columns for which Local Dictionary is not generated |
+        
+      **NOTE:**  If the cardinality of a column exceeds the threshold, that column will not use local dictionary encoding. In this case, data loading performance decreases, because local dictionary encoding has to be rolled back for that column.
+      
+      **Calculating Memory Usage for Local Dictionary:**
+      
+      Both the encoded data and the actual data are stored when Local Dictionary is enabled.
+      Suppose 'x' columns are configured for Local Dictionary generation out of a total of 'y' string/varchar columns. 
+      
+      The total size will be:
+      
+      Memory size of the (y-x) columns + ((4 bytes * number of rows) * x) + (Local Dictionary size of x columns)
+      
+      Local Dictionary size = ((memory occupied by each unique value * cardinality of the column) * number of columns)
+      
+### Example:
+ 
+   ```
+   CREATE TABLE carbontable(
+             
+               column1 string,
+             
+               column2 string,
+             
+               column3 LONG )
+             
+     STORED BY 'carbondata'
+     TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true','LOCAL_DICTIONARY_THRESHOLD'='1000',
+     'LOCAL_DICTIONARY_INCLUDE'='column1','LOCAL_DICTIONARY_EXCLUDE'='column2')
+   ```
 ### Example:
 
    ```
@@ -390,6 +436,11 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
      ```
       NOTE: Add Complex datatype columns is not supported.
 
+Users can specify which columns to include in and exclude from local dictionary generation when adding new columns. These are appended to the existing local dictionary include and exclude columns of the main table, respectively.
+  ```
+     ALTER TABLE carbon ADD COLUMNS (a1 STRING, b1 STRING) TBLPROPERTIES('LOCAL_DICTIONARY_INCLUDE'='a1','LOCAL_DICTIONARY_EXCLUDE'='b1')
+  ```
+
    - **DROP COLUMNS**
    
      This command is used to delete the existing column(s) in a table.
@@ -442,6 +493,21 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
       ```
       **NOTE:**
       * Merge index is not supported on streaming table.
+      
+- **SET and UNSET for Local Dictionary Properties**
+
+   When the SET command is used, the newly set properties override the corresponding old properties, if they exist.
+  
+   Example to SET Local Dictionary Properties:
+    ```
+   ALTER TABLE tablename SET TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='false','LOCAL_DICTIONARY_THRESHOLD'='1000','LOCAL_DICTIONARY_INCLUDE'='column1','LOCAL_DICTIONARY_EXCLUDE'='column2')
+    ```
+   When Local Dictionary properties are unset, the corresponding default values are used for these properties.
+      
+   Example to UNSET Local Dictionary Properties:
+    ```
+   ALTER TABLE tablename UNSET TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE','LOCAL_DICTIONARY_THRESHOLD','LOCAL_DICTIONARY_INCLUDE','LOCAL_DICTIONARY_EXCLUDE')
+    ```
 
 ### DROP TABLE
   

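The NOTE on LOCAL_DICTIONARY_THRESHOLD in the documentation added by this commit can be illustrated with a minimal Python sketch of dictionary encoding with a cardinality threshold. This is a toy model, not CarbonData's implementation: the function name `encode_column`, its signature, and its return shape are hypothetical and chosen only for illustration.

```python
def encode_column(values, threshold=10000):
    """Toy local-dictionary encoder (hypothetical, not CarbonData code).

    Returns (dictionary, encoded_codes) while the number of distinct
    values stays within `threshold`. Returns (None, original_values)
    once the cardinality exceeds it, discarding the work done so far.
    """
    dictionary = {}  # unique value -> integer code
    encoded = []
    for value in values:
        code = dictionary.get(value)
        if code is None:
            if len(dictionary) >= threshold:
                # One more distinct value would exceed the threshold:
                # roll back and keep the column unencoded.
                return None, list(values)
            code = len(dictionary)
            dictionary[value] = code
        encoded.append(code)
    return dictionary, encoded


# Low cardinality: the column is encoded.
print(encode_column(["a", "b", "a"], threshold=2))
# Cardinality above the threshold: encoding is abandoned.
print(encode_column(["a", "b", "c"], threshold=2))
```

The second call throws away the partially built dictionary, which is the kind of wasted work the NOTE attributes the slower data loading to.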

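The memory usage formula in the documentation added by this commit can be worked through with concrete numbers. The sketch below is plain arithmetic in Python. The row count, value size, and cardinality are assumptions made up for illustration, and the helper names are hypothetical. It also assumes every value in a column has the same size, which real data will not.

```python
BYTES_PER_ENCODED_VALUE = 4  # from the formula: 4 bytes * number of rows


def local_dictionary_size(bytes_per_unique_value, cardinality, num_columns):
    """(memory per unique value * cardinality of the column) * number of columns."""
    return bytes_per_unique_value * cardinality * num_columns


def total_size(plain_columns_size, num_rows, x, dictionary_size):
    """Size of the (y - x) plain columns + encoded data of x columns + dictionaries."""
    return plain_columns_size + (BYTES_PER_ENCODED_VALUE * num_rows) * x + dictionary_size


# Hypothetical workload: every number below is an assumption.
num_rows = 1_000_000
y, x = 3, 2            # 3 string columns, 2 of them with local dictionary
bytes_per_value = 20   # assumed uniform value size
cardinality = 1_000    # assumed distinct values per dictionary column

plain = (y - x) * num_rows * bytes_per_value
dictionaries = local_dictionary_size(bytes_per_value, cardinality, x)
with_local_dictionary = total_size(plain, num_rows, x, dictionaries)
all_columns_plain = y * num_rows * bytes_per_value

print(with_local_dictionary)  # 28040000
print(all_columns_plain)      # 60000000
```

Under these assumptions the dictionaries themselves are small (40,000 bytes), and most of the size comes from the 4-byte codes per row and the one plain column.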