carbondata-commits mailing list archives

From chenliang...@apache.org
Subject carbondata git commit: [CARBONDATA-1770] Updated documentaion for data-management-on-carbondata.md and useful-tips-on-carbondata.md
Date Fri, 24 Nov 2017 02:46:59 GMT
Repository: carbondata
Updated Branches:
  refs/heads/master ea30f650e -> b0b7fc1a5


[CARBONDATA-1770] Updated documentaion for data-management-on-carbondata.md and useful-tips-on-carbondata.md

This closes #1556


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/b0b7fc1a
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/b0b7fc1a
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/b0b7fc1a

Branch: refs/heads/master
Commit: b0b7fc1a5f5e19db7f17434d2f6fa90831227e2f
Parents: ea30f65
Author: vandana <vandana.yadav759@gmail.com>
Authored: Thu Nov 23 12:21:20 2017 +0530
Committer: chenliang613 <chenliang613@huawei.com>
Committed: Fri Nov 24 10:46:44 2017 +0800

----------------------------------------------------------------------
 docs/data-management-on-carbondata.md | 31 +++++++++++++++++++-----------
 docs/useful-tips-on-carbondata.md     |  6 +++---
 2 files changed, 23 insertions(+), 14 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/carbondata/blob/b0b7fc1a/docs/data-management-on-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/data-management-on-carbondata.md b/docs/data-management-on-carbondata.md
index 6880ba1..1a3d0a8 100644
--- a/docs/data-management-on-carbondata.md
+++ b/docs/data-management-on-carbondata.md
@@ -294,11 +294,11 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
     ```
     NOTE: ALL_DICTIONARY_PATH and COLUMNDICT can't be used together.
     
-  - **DATEFORMAT:** Date format for specified column.
+  - **DATEFORMAT/TIMESTAMPFORMAT:** Date and Timestamp format for specified column.
 
     ```
-    OPTIONS('DATEFORMAT'='column1:dateFormat1, column2:dateFormat2')
-    ```
+    OPTIONS('DATEFORMAT' = 'yyyy-MM-dd','TIMESTAMPFORMAT'='yyyy-MM-dd HH:mm:ss')
+    ```
    NOTE: Date formats are specified by date pattern strings. The date pattern letters in CarbonData are same as in JAVA. Refer to [SimpleDateFormat](http://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html).
 
  - **SINGLE_PASS:** Single Pass Loading enables single job to finish data loading with dictionary generation on the fly. It enhances performance in the scenarios where the subsequent data loading after initial load involves fewer incremental updates on the dictionary.
@@ -312,7 +312,8 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
    * If this option is set to TRUE then data loading will take less time.
   * If this option is set to some invalid value other than TRUE or FALSE then it uses the default value.
   * If this option is set to TRUE, then high.cardinality.identify.enable property will be disabled during data load.
-   
+   * For the first load, the SINGLE_PASS option is disabled.
+
    Example:
    ```
    LOAD DATA local inpath '/opt/rawdata/data.csv' INTO table carbontable
@@ -336,7 +337,11 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
     ```
 
   NOTE:
+  * The BAD_RECORDS_ACTION property supports four types of actions for bad records: FORCE, REDIRECT, IGNORE and FAIL.
  * If the REDIRECT option is used, CarbonData will add all bad records into a separate CSV file. However, this file must not be used for subsequent data loading because the content may not exactly match the source record. You are advised to cleanse the original source record for further data ingestion. This option is used to remind you which records are bad records.
+  * If the FORCE option is used, then it auto-corrects the data by storing the bad records as NULL before loading data.
+  * If the IGNORE option is used, then bad records are neither loaded nor written to the separate CSV file.
+  * If the FAIL option is used, then data loading fails if any bad records are found.
  * In loaded data, if all records are bad records, the BAD_RECORDS_ACTION is invalid and the load operation fails.
  * The maximum number of characters per column is 100000. If there are more than 100000 characters in a column, data loading will fail.
 
@@ -379,7 +384,7 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
 
   Examples
   ```
-  INSERT INTO table1 SELECT item1 ,sum(item2 + 1000) as result FROM table2 group by item1
+  INSERT INTO table1 SELECT item1, sum(item2 + 1000) as result FROM table2 group by item1
   ```
 
   ```
@@ -507,8 +512,9 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
   STORED BY 'carbondata'
   [TBLPROPERTIES ('PARTITION_TYPE'='HASH',
                   'NUM_PARTITIONS'='N' ...)]
-  //N is the number of hash partitions
   ```
+  NOTE: N is the number of hash partitions
+
 
   Example:
   ```
@@ -531,7 +537,7 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
   PARTITIONED BY (partition_col_name data_type)
   STORED BY 'carbondata'
   [TBLPROPERTIES ('PARTITION_TYPE'='RANGE',
-                  'RANGE_INFO'='2014-01-01, 2015-01-01, 2016-01-01' ...)]
+                  'RANGE_INFO'='2014-01-01, 2015-01-01, 2016-01-01, ...')]
   ```
 
   NOTE:
@@ -561,7 +567,7 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
   PARTITIONED BY (partition_col_name data_type)
   STORED BY 'carbondata'
   [TBLPROPERTIES ('PARTITION_TYPE'='LIST',
-                  'LIST_INFO'='A, B, C' ...)]
+                  'LIST_INFO'='A, B, C, ...')]
   ```
   NOTE : List partition supports list info in one level group.
 
@@ -602,11 +608,14 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
 
 ### Drop a partition
 
+  Only drop partition definition, but keep data
+
  ```
-  //Only drop partition definition, but keep data
-  ALTER TABLE [db_name].table_name DROP PARTITION(partition_id)
+  ALTER TABLE [db_name].table_name DROP PARTITION(partition_id)
+  ```
 
-  //Drop both partition definition and data
+  Drop both partition definition and data
+  ```
   ALTER TABLE [db_name].table_name DROP PARTITION(partition_id) WITH DATA
   ```
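Taken together, the load options documented in this patch (DATEFORMAT/TIMESTAMPFORMAT, BAD_RECORDS_ACTION, SINGLE_PASS) can be combined in a single statement. A hypothetical sketch, reusing the table name and CSV path from the patch's own SINGLE_PASS example; the formats and option values chosen here are illustrative only:

```
LOAD DATA LOCAL INPATH '/opt/rawdata/data.csv' INTO TABLE carbontable
OPTIONS('DATEFORMAT'='yyyy-MM-dd',
        'TIMESTAMPFORMAT'='yyyy-MM-dd HH:mm:ss',
        'BAD_RECORDS_ACTION'='REDIRECT',
        'SINGLE_PASS'='TRUE')
```

Note that, per the patch, SINGLE_PASS is disabled for the first load into a table, and REDIRECT writes bad records to a separate CSV file that should not itself be reloaded.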
 

http://git-wip-us.apache.org/repos/asf/carbondata/blob/b0b7fc1a/docs/useful-tips-on-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/useful-tips-on-carbondata.md b/docs/useful-tips-on-carbondata.md
index 30485da..0bf2940 100644
--- a/docs/useful-tips-on-carbondata.md
+++ b/docs/useful-tips-on-carbondata.md
@@ -25,7 +25,7 @@
 
 ## Suggestions to Create CarbonData Table
 
-  For example,the results of the analysis for table creation with dimensions ranging from 10 thousand to 10 billion rows and 100 to 300 columns have been summarized below.
+  For example, the results of the analysis for table creation with dimensions ranging from 10 thousand to 10 billion rows and 100 to 300 columns have been summarized below.
   The following table describes some of the columns from the table used.
 
   - **Table Column Description**
@@ -68,7 +68,7 @@
  the columns in the order of cardinality low to high. This ordering of frequently used columns improves the compression ratio and
   enhances the performance of queries with filter on these columns.
 
-  For example if MSISDN, HOST and Dime_1 are frequently-used columns, then the column order of table is suggested as
+  For example, if MSISDN, HOST and Dime_1 are frequently-used columns, then the column order of table is suggested as
   Dime_1>HOST>MSISDN, because Dime_1 has the lowest cardinality.
   The create table command can be modified as suggested below :
 
@@ -142,7 +142,7 @@
  |carbon.merge.sort.reader.thread|Default: 3 |Specifies the number of cores used for temp file merging during data loading in CarbonData.|
  |carbon.merge.sort.prefetch|Default: true | You may want to set this value to false if you do not have enough memory|
 
-  For example, if there are 10 million records ,and i have only 16 cores ,64GB memory, will be loaded to CarbonData table.
+  For example, suppose 10 million records are to be loaded into a CarbonData table on a machine with only 16 cores and 64 GB memory.
  Using the default configuration always fail in sort step. Modify carbon.properties as suggested below:
 
   ```
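The cardinality-ordering tip above (Dime_1 lowest, then HOST, then MSISDN) can be sketched as a create-table statement. The table name and column types below are assumed for illustration and are not part of the patch; only the column order reflects the tip:

```
CREATE TABLE IF NOT EXISTS tips_example_table(
  Dime_1 String,
  HOST String,
  MSISDN String)
STORED BY 'carbondata'
```

Placing Dime_1 first puts the lowest-cardinality frequently-filtered column earliest, which the tip suggests improves the compression ratio and filter query performance.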

