carbondata-commits mailing list archives

From ravipes...@apache.org
Subject carbondata git commit: [CARBONDATA-2098] Add datamap management description
Date Sat, 03 Mar 2018 10:24:38 GMT
Repository: carbondata
Updated Branches:
  refs/heads/master c125f0caa -> d0c2ab2dc


[CARBONDATA-2098] Add datamap management description

Enhance document for datamap

This closes #2026


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/d0c2ab2d
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/d0c2ab2d
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/d0c2ab2d

Branch: refs/heads/master
Commit: d0c2ab2dc5abf16084354848dbcf6f5c45b3cae5
Parents: c125f0c
Author: Jacky Li <jacky.likun@qq.com>
Authored: Sat Mar 3 13:40:59 2018 +0800
Committer: ravipesala <ravi.pesala@gmail.com>
Committed: Sat Mar 3 15:43:14 2018 +0530

----------------------------------------------------------------------
 docs/datamap/preaggregate-datamap-guide.md      | 51 +++++++++++++++++---
 docs/datamap/timeseries-datamap-guide.md        | 23 ++++++---
 .../examples/PreAggregateTableExample.scala     |  2 +
 3 files changed, 64 insertions(+), 12 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/carbondata/blob/d0c2ab2d/docs/datamap/preaggregate-datamap-guide.md
----------------------------------------------------------------------
diff --git a/docs/datamap/preaggregate-datamap-guide.md b/docs/datamap/preaggregate-datamap-guide.md
index fabfd7d..199f674 100644
--- a/docs/datamap/preaggregate-datamap-guide.md
+++ b/docs/datamap/preaggregate-datamap-guide.md
@@ -1,5 +1,13 @@
 # CarbonData Pre-aggregate DataMap
   
+* [Quick Example](#quick-example)
+* [DataMap Management](#datamap-management)
+* [Pre-aggregate Table](#preaggregate-datamap-introduction)
+* [Loading Data](#loading-data)
+* [Querying Data](#querying-data)
+* [Compaction](#compacting-pre-aggregate-tables)
+* [Data Management](#data-management-with-pre-aggregate-tables)
+
 ## Quick example
 Download and unzip spark-2.2.0-bin-hadoop2.7.tgz, and export $SPARK_HOME
 
@@ -85,7 +93,35 @@ Start spark-shell in new terminal, type :paste, then copy and run the following
   spark.stop
 ```
 
-##PRE-AGGREGATE DataMap  
+#### DataMap Management
+DataMap can be created using the following DDL:
+  ```
+  CREATE DATAMAP [IF NOT EXISTS] datamap_name
+  ON TABLE main_table
+  USING "datamap_provider"
+  DMPROPERTIES ('key'='value', ...)
+  AS
+    SELECT statement
+  ```
+The string following USING is called the DataMap Provider. In this version, CarbonData supports two
+kinds of DataMap:
+1. preaggregate, for pre-aggregate tables. No DMPROPERTIES entry is required for this DataMap.
+2. timeseries, for timeseries roll-up tables. Please refer to [Timeseries DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/timeseries-datamap-guide.md)
+
+DataMap can be dropped using the following DDL:
+  ```
+  DROP DATAMAP [IF EXISTS] datamap_name
+  ON TABLE main_table
+  ```
+To show all DataMaps created, use:
+  ```
+  SHOW DATAMAP 
+  ON TABLE main_table
+  ```
+It will show all DataMaps created on the main table.
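For illustration, here is a minimal spark-shell sketch of this create/show/drop lifecycle. It assumes a
CarbonSession bound to `spark` (as in the Quick Example above); the table name, schema, datamap name and
aggregation query are placeholders, not taken from the guide.

```
// Sketch: manage a preaggregate DataMap from spark-shell (placeholder names throughout)
spark.sql("CREATE TABLE IF NOT EXISTS sales_tbl(country STRING, price INT) STORED BY 'carbondata'")

// Create a preaggregate DataMap; this provider needs no DMPROPERTIES
spark.sql(
  s"""CREATE DATAMAP IF NOT EXISTS agg_price ON TABLE sales_tbl
     | USING 'preaggregate'
     | AS SELECT country, sum(price) FROM sales_tbl GROUP BY country""".stripMargin)

// List every DataMap on the table, then drop the one just created
spark.sql("SHOW DATAMAP ON TABLE sales_tbl").show(false)
spark.sql("DROP DATAMAP IF EXISTS agg_price ON TABLE sales_tbl")
```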
+
+
+## Preaggregate DataMap Introduction
   Pre-aggregate tables are created as DataMaps and managed as tables internally by CarbonData.

   User can create as many pre-aggregate datamaps required to improve query performance, 
   provided the storage requirements and loading speeds are acceptable.
@@ -163,7 +199,7 @@ SELECT country, max(price) from sales GROUP BY country
 will query against main table **sales** only, because it does not satisfy pre-aggregate table 
 selection logic. 
 
-#### Loading data to pre-aggregate tables
+## Loading data
 For existing table with loaded data, data load to pre-aggregate table will be triggered by the 
 CREATE DATAMAP statement when user creates the pre-aggregate table. For incremental loads after 
 aggregates tables are created, loading data to main table triggers the load to pre-aggregate tables 
@@ -174,7 +210,7 @@ meaning that data on main table and pre-aggregate tables are only visible to the
 tables are loaded successfully, if one of these loads fails, new data are not visible in all tables 
 as if the load operation is not happened.   
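As a rough sketch of that behaviour, reusing the placeholder `sales_tbl` table from the earlier sketch:
only the main table is loaded, and its pre-aggregate tables are refreshed by the same operation
(`INSERT INTO ... SELECT` is used here just to put some rows in; a `LOAD DATA` command behaves the same
way with respect to the pre-aggregate tables).

```
// Sketch: only the main table is loaded explicitly; any preaggregate DataMaps on it
// are loaded as part of the same operation, and the new data becomes visible in all
// of them together (or in none of them, if one of the loads fails).
spark.sql("INSERT INTO sales_tbl SELECT 'usa', 100")
spark.sql("INSERT INTO sales_tbl SELECT 'india', 60")
// No separate load statement is ever issued against the pre-aggregate table itself.
```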
 
-#### Querying data from pre-aggregate tables
+## Querying data
 As a technique for query acceleration, Pre-aggregate tables cannot be queries directly. 
 Queries are to be made on main table. While doing query planning, internally CarbonData will check 
 associated pre-aggregate tables with the main table, and do query plan transformation accordingly. 
@@ -183,7 +219,8 @@ User can verify whether a query can leverage pre-aggregate table or not by execu
 command, which will show the transformed logical plan, and thus user can check whether pre-aggregate
 table is selected.
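From spark-shell this check might look like the sketch below (placeholder names again); if the rewrite
happened, the chosen pre-aggregate table appears in the printed plan instead of the main table.

```
// Sketch: queries are written against the main table only; EXPLAIN exposes the
// transformed logical plan, so the selected pre-aggregate table (if any) shows up there.
spark.sql("EXPLAIN SELECT country, sum(price) FROM sales_tbl GROUP BY country").show(false)
```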
 
-#### Compacting pre-aggregate tables
+
+## Compacting pre-aggregate tables
 Running Compaction command (`ALTER TABLE COMPACT`) on main table will **not automatically** 
 compact the pre-aggregate tables created on the main table. User need to run Compaction command 
 separately on each pre-aggregate table to compact them.
@@ -193,8 +230,10 @@ main table but not performed on pre-aggregate table, all queries still can benef
 pre-aggregate tables. To further improve the query performance, compaction on pre-aggregate tables 
 can be triggered to merge the segments and files in the pre-aggregate tables. 
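A sketch of the two separate compaction calls, assuming the pre-aggregate table is exposed under a
`<main_table>_<datamap_name>` child-table name (an assumption here; check the names reported by
SHOW DATAMAP in your deployment). All names are placeholders.

```
// Sketch: compacting the main table does NOT compact its pre-aggregate tables
spark.sql("ALTER TABLE sales_tbl COMPACT 'minor'")
// Each pre-aggregate table is compacted separately under its own table name
// (assumed here to follow the <main_table>_<datamap_name> pattern)
spark.sql("ALTER TABLE sales_tbl_agg_price COMPACT 'minor'")
```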
 
-#### Data Management on pre-aggregate tables
-Once there is pre-aggregate table created on the main table, following command on the main table
+## Data Management with pre-aggregate tables
+In the current implementation, data consistency needs to be maintained between the main table and its
+pre-aggregate tables. Once there is a pre-aggregate table created on the main table, the following
+command on the main table
 is not supported:
 1. Data management command: `UPDATE/DELETE/DELETE SEGMENT`. 
 2. Schema management command: `ALTER TABLE DROP COLUMN`, `ALTER TABLE CHANGE DATATYPE`, 

http://git-wip-us.apache.org/repos/asf/carbondata/blob/d0c2ab2d/docs/datamap/timeseries-datamap-guide.md
----------------------------------------------------------------------
diff --git a/docs/datamap/timeseries-datamap-guide.md b/docs/datamap/timeseries-datamap-guide.md
index ecd7234..886c161 100644
--- a/docs/datamap/timeseries-datamap-guide.md
+++ b/docs/datamap/timeseries-datamap-guide.md
@@ -1,14 +1,25 @@
 # CarbonData Timeseries DataMap
 
-## Supporting timeseries data (Alpha feature in 1.3.0)
+* [Timeseries DataMap](#timeseries-datamap-introduction-(alpha-feature-in-1.3.0))
+* [Compaction](#compacting-timeseries-datamap)
+* [Data Management](#data-management-on-timeseries-datamap)
+
+## Timeseries DataMap Introduction (Alpha feature in 1.3.0)
 Timeseries DataMap a pre-aggregate table implementation based on 'preaggregate' DataMap.

 Difference is that Timerseries DataMap has built-in understanding of time hierarchy and 
 levels: year, month, day, hour, minute, so that it supports automatic roll-up in time dimension 
 for query.
+
+The data loading, querying and compaction commands and their behavior are the same as for preaggregate DataMap.
+Please refer to [Pre-aggregate DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/preaggregate-datamap-guide.md)
+for more information.
   
-For instance, user can create multiple timeseries datamap on the main table which has a *event_time*
-column, one datamap for one time granularity. Then Carbondata can do automatic roll-up for queries 
-on the main table.
+To use this datamap, user can create multiple timeseries datamaps on the main table, which has
+an *event_time* column, one datamap per time granularity. Then CarbonData can do automatic
+roll-up for queries on the main table.
+
+For example, the statement below effectively creates multiple pre-aggregate tables on the main table
+called **timeseries**
 
 ```
 CREATE DATAMAP agg_year
@@ -126,10 +137,10 @@ the future CarbonData release.
 * timeseries datamaps created for each level needs to be dropped separately 
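For instance, if one datamap was created per level with names such as agg_year, agg_month and so on
(placeholder names; SHOW DATAMAP ON TABLE reports the actual ones, and `main_table` stands in for the
real table), dropping them one by one might look like:

```
// Sketch: each granularity level is a separate datamap, so each is dropped on its own;
// there is no single command that removes every level at once.
Seq("agg_year", "agg_month", "agg_day", "agg_hour", "agg_minute").foreach { dm =>
  spark.sql(s"DROP DATAMAP IF EXISTS $dm ON TABLE main_table")
}
```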
       
 
-#### Compacting timeseries datamp
+## Compacting timeseries datamap
 Refer to Compaction section in [preaggregation datamap](https://github.com/apache/carbondata/blob/master/docs/datamap/preaggregate-datamap-guide.md).

 Same applies to timeseries datamap.
 
-#### Data Management on timeseries datamap
+## Data Management on timeseries datamap
 Refer to Data Management section in [preaggregation datamap](https://github.com/apache/carbondata/blob/master/docs/datamap/preaggregate-datamap-guide.md).
 Same applies to timeseries datamap.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/carbondata/blob/d0c2ab2d/examples/spark2/src/main/scala/org/apache/carbondata/examples/PreAggregateTableExample.scala
----------------------------------------------------------------------
diff --git a/examples/spark2/src/main/scala/org/apache/carbondata/examples/PreAggregateTableExample.scala b/examples/spark2/src/main/scala/org/apache/carbondata/examples/PreAggregateTableExample.scala
index ace3dcc..64ed525 100644
--- a/examples/spark2/src/main/scala/org/apache/carbondata/examples/PreAggregateTableExample.scala
+++ b/examples/spark2/src/main/scala/org/apache/carbondata/examples/PreAggregateTableExample.scala
@@ -99,6 +99,8 @@ object PreAggregateTableExample {
       s"""create datamap preagg_count on table maintable using 'preaggregate' as
          | select name, count(*) from maintable group by name""".stripMargin)
 
+    spark.sql("show datamap on table maintable").show
+
     spark.sql(
       s"""
          | SELECT id,max(age)

