carbondata-commits mailing list archives

From jack...@apache.org
Subject carbondata git commit: [CARBONDATA-2793][32k][Doc] Add 32k support in document
Date Wed, 01 Aug 2018 14:07:49 GMT
Repository: carbondata
Updated Branches:
  refs/heads/master de9246066 -> f9b02a5c1


[CARBONDATA-2793][32k][Doc] Add 32k support in document

This closes #2572


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/f9b02a5c
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/f9b02a5c
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/f9b02a5c

Branch: refs/heads/master
Commit: f9b02a5c1204359b8c6ae20e6ee68174f1c6d4ef
Parents: de92460
Author: xuchuanyin <xuchuanyin@hust.edu.cn>
Authored: Fri Jul 27 16:10:44 2018 +0800
Committer: Jacky Li <jacky.likun@qq.com>
Committed: Wed Aug 1 22:07:28 2018 +0800

----------------------------------------------------------------------
 docs/data-management-on-carbondata.md      | 48 +++++++++++++++++++------
 docs/supported-data-types-in-carbondata.md |  3 ++
 2 files changed, 40 insertions(+), 11 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/carbondata/blob/f9b02a5c/docs/data-management-on-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/data-management-on-carbondata.md b/docs/data-management-on-carbondata.md
index 6aaaaa3..836fff9 100644
--- a/docs/data-management-on-carbondata.md
+++ b/docs/data-management-on-carbondata.md
@@ -137,7 +137,7 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
           
      | Properties | Default value | Description |
      | ---------- | ------------- | ----------- |
-     | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will not be enabled for the table | 
+     | LOCAL_DICTIONARY_ENABLE | false | By default, local dictionary will not be enabled for the table |
      | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (range- 1000 to 100000) |
      | LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns | Columns for which Local Dictionary is generated. |
      | LOCAL_DICTIONARY_EXCLUDE | none | Columns for which Local Dictionary is not generated |
@@ -240,11 +240,11 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
 	 ```
 	 
    - **Caching at Block or Blocklet Level**
-   
+
      This feature allows you to maintain the cache at Block level, resulting in optimized usage of the memory. The memory consumption is high if the Blocklet level caching is maintained as a Block can have multiple Blocklet.
 	 
 	 Following are the valid values for CACHE_LEVEL:
-	 
+
 	 *Configuration for caching in driver at Block level (default value).*
 	 
 	 ```
@@ -285,21 +285,47 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
 	 ```
 	 ALTER TABLE employee SET TBLPROPERTIES (‘CACHE_LEVEL’=’Blocklet’)
 	 ```
-	 
-	 - **Support Flat folder same as Hive/Parquet**
-	 
+
+    - **Support Flat folder same as Hive/Parquet**
+
 	  This feature allows all carbondata and index files to keep directy under tablepath. Currently all carbondata/carbonindex files written under tablepath/Fact/Part0/Segment_NUM folder and it is not same as hive/parquet folder structure. This feature makes all files written will be directly under tablepath, it does not maintain any segment folder structure.This is useful for interoperability between the execution engines and plugin with other execution engines like hive or presto becomes easier.
-	  
+
 	  Following table property enables this feature and default value is false.
 	  ```
 	   'flat_folder'='true'
-	  ``` 
+	  ```
 	  Example:
 	  ```
 	  CREATE TABLE employee (name String, city String, id int) STORED BY ‘carbondata’ TBLPROPERTIES ('flat_folder'='true')
 	  ```
-	  
-	 
+
+    - **String longer than 32000 characters**
+
+     In common scenarios, the length of string is less than 32000,
+     so carbondata stores the length of content using Short to reduce memory and space consumption.
+     To support string longer than 32000 characters, carbondata introduces a table property called `LONG_STRING_COLUMNS`.
+     For these columns, carbondata internally stores the length of content using Integer.
+
+     You can specify the columns as 'long string column' using below tblProperties:
+
+     ```
+     // specify col1, col2 as long string columns
+     TBLPROPERTIES ('LONG_STRING_COLUMNS'='col1,col2')
+     ```
+
+     Besides, you can also use this property through DataFrame by
+     ```
+     df.format("carbondata")
+       .option("tableName", "carbonTable")
+       .option("long_string_columns", "col1, col2")
+       .save()
+     ```
+
+     If you are using Carbon-SDK, you can specify the datatype of long string column as `varchar`.
+     You can refer to SDKwriterTestCase for example.
+
+     **NOTE:** The LONG_STRING_COLUMNS can only be string/char/varchar columns and cannot be dictionary_include/sort_columns/complex columns.
+
 ## CREATE TABLE AS SELECT
   This function allows user to create a Carbon table from any of the Parquet/Hive/Carbon table. This is beneficial when the user wants to create Carbon table from any other Parquet/Hive table and use the Carbon query engine to query and achieve better query results for cases where Carbon is faster than other file formats. Also this feature can be used for backing up the data.
 
@@ -745,7 +771,7 @@ Users can specify which columns to include and exclude for local dictionary gene
   * If the FORCE option is used, then it auto-converts the data by storing the bad records as NULL before Loading data.
   * If the IGNORE option is used, then bad records are neither loaded nor written to the separate CSV file.
   * In loaded data, if all records are bad records, the BAD_RECORDS_ACTION is invalid and the load operation fails.
-  * The maximum number of characters per column is 32000. If there are more than 32000 characters in a column, data loading will fail.
+  * The default maximum number of characters per column is 32000. If there are more than 32000 characters in a column, please refer to *String longer than 32000 characters* section.
 
   Example:
 

http://git-wip-us.apache.org/repos/asf/carbondata/blob/f9b02a5c/docs/supported-data-types-in-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/supported-data-types-in-carbondata.md b/docs/supported-data-types-in-carbondata.md
index 7260afe..eb74a2e 100644
--- a/docs/supported-data-types-in-carbondata.md
+++ b/docs/supported-data-types-in-carbondata.md
@@ -35,6 +35,9 @@
     * CHAR
     * VARCHAR
 
+    **NOTE**: For string longer than 32000 characters, use `LONG_STRING_COLUMNS` in table property.
+    Please refer to TBLProperties in [CreateTable](https://github.com/apache/carbondata/blob/master/docs/data-management-on-carbondata.md#create-table) for more information.
+
   * Complex Types
     * arrays: ARRAY``<data_type>``
     * structs: STRUCT``<col_name : data_type COMMENT col_comment, ...>``
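
The new *String longer than 32000 characters* section says that the length of a string is stored as a Short by default and as an Integer for `LONG_STRING_COLUMNS`. The sketch below illustrates that general idea with a length-prefixed byte layout. It is not CarbonData's actual encoding: the function names, the big-endian layout, and the use of the signed 16-bit maximum (32767 bytes) as the cut-off are assumptions made only for illustration.

```python
import struct

# Assumption: largest value a signed 16-bit length prefix can hold,
# analogous to a Java short. The documented limit is 32000 characters.
SHORT_MAX = 32767


def encode(text: str, long_string: bool = False) -> bytes:
    """Prefix the UTF-8 bytes of `text` with their length.

    long_string=False writes a 2-byte signed prefix (short-like);
    long_string=True writes a 4-byte signed prefix (int-like).
    """
    payload = text.encode("utf-8")
    if long_string:
        return struct.pack(">i", len(payload)) + payload
    if len(payload) > SHORT_MAX:
        raise ValueError("value too long for a 2-byte length prefix")
    return struct.pack(">h", len(payload)) + payload


def decode(buf: bytes, long_string: bool = False) -> str:
    """Read the length prefix, then return that many bytes as text."""
    fmt = ">i" if long_string else ">h"
    size = struct.calcsize(fmt)
    (length,) = struct.unpack(fmt, buf[:size])
    return buf[size:size + length].decode("utf-8")
```

In this sketch the short prefix costs two bytes per value and the integer prefix costs four, which is the kind of space trade-off the section describes as the reason the wider length is only used for columns declared as long strings.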

