carbondata-commits mailing list archives

From ravipes...@apache.org
Subject [1/9] incubator-carbondata git commit: fix docs issues
Date Fri, 17 Feb 2017 14:01:25 GMT
Repository: incubator-carbondata
Updated Branches:
  refs/heads/branch-1.0 8a5e44e98 -> 3236c764c


fix docs issues

fix docs issues

fix comments


Project: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/commit/657eccd9
Tree: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/tree/657eccd9
Diff: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/diff/657eccd9

Branch: refs/heads/branch-1.0
Commit: 657eccd9ea4c0562907d7db1c232930c0167c4e1
Parents: 8a5e44e
Author: chenliang613 <chenliang613@huawei.com>
Authored: Sun Jan 22 16:10:22 2017 +0800
Committer: ravipesala <ravi.pesala@gmail.com>
Committed: Fri Feb 17 19:23:45 2017 +0530

----------------------------------------------------------------------
 docs/configuration-parameters.md | 12 ++++++------
 docs/data-management.md          |  2 +-
 docs/quick-start-guide.md        | 20 +++++++++-----------
 3 files changed, 16 insertions(+), 18 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-carbondata/blob/657eccd9/docs/configuration-parameters.md
----------------------------------------------------------------------
diff --git a/docs/configuration-parameters.md b/docs/configuration-parameters.md
index bc6919a..774734a 100644
--- a/docs/configuration-parameters.md
+++ b/docs/configuration-parameters.md
@@ -34,10 +34,10 @@ This section provides the details of all the configurations required for the CarbonData
 | Property | Default Value | Description |
 |----------------------------|-------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | carbon.storelocation | /user/hive/warehouse/carbon.store | Location where CarbonData will create the store, and write the data in its own format. NOTE: Store location should be in HDFS. |
-| carbon.ddl.base.hdfs.url | hdfs://hacluster/opt/data | This property is used to configure the HDFS relative path from the HDFS base path, configured in fs.defaultFS. The path configured in carbon.ddl.base.hdfs.url will be appended to the HDFS path configured in fs.defaultFS. If this path is configured, then user need not pass the complete path while dataload. For example: If absolute path of the csv file is hdfs://10.18.101.155:54310/data/cnbc/2016/xyz.csv, the path "hdfs://10.18.101.155:54310" will come from property fs.defaultFS and user can configure the /data/cnbc/ as carbon.ddl.base.hdfs.url. Now while dataload user can specify the csv path as /2016/xyz.csv. |
+| carbon.ddl.base.hdfs.url | hdfs://hacluster/opt/data | This property is used to configure the HDFS relative path; the path configured in carbon.ddl.base.hdfs.url is appended to the HDFS path configured in fs.defaultFS. If this path is configured, the user need not pass the complete path while loading data. For example: if the absolute path of the CSV file is hdfs://10.18.101.155:54310/data/cnbc/2016/xyz.csv, the path "hdfs://10.18.101.155:54310" comes from the property fs.defaultFS and the user can configure /data/cnbc/ as carbon.ddl.base.hdfs.url. Then, while loading data, the user can specify the CSV path as /2016/xyz.csv. |
 | carbon.badRecords.location | /opt/Carbon/Spark/badrecords | Path where the bad records are stored. |
-| carbon.kettle.home | $SPARK_HOME/carbonlib/carbonplugins | Path used by CarbonData internally to create graph for loading the data. |
-| carbon.data.file.version | 2 | If this parameter value is set to 1, then CarbonData will support the data load which is in old format. If the value is set to 2, then CarbonData will support the data load of new format only. NOTE: The file format created before DataSight Spark V100R002C30 is considered as old format. |
+| carbon.kettle.home | $SPARK_HOME/carbonlib/carbonplugins | Configuration for loading the data with Kettle. |
+| carbon.data.file.version | 2 | If this parameter is set to 1, then CarbonData will support the data load of the old format (0.x versions). If the value is set to 2, then CarbonData will support the data load of the new format (1.x onwards) only. |
 
 ##  Performance Configuration
 This section provides the details of all the configurations required for CarbonData Performance Optimization.
@@ -132,7 +132,7 @@ This section provides the details of all the configurations required for CarbonData
 | Parameter | Default Value | Description |
 |---------------------------------------|---------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | high.cardinality.identify.enable | true | If the parameter is true, the high cardinality columns of the dictionary code are automatically recognized and these columns will not be used as global dictionary encoding. If the parameter is false, all dictionary encoding columns are used as dictionary encoding. The high cardinality column must meet the following requirements: value of cardinality > configured value of high.cardinality.threshold (equally, the value of cardinality is higher than the threshold); value of cardinality / row number x 100 > configured value of high.cardinality.row.count.percentage (equally, the ratio of the cardinality value to the data row number is higher than the configured percentage). |
-| high.cardinality.threshold | 1000000 | Threshold to identify whether high cardinality column.Configuration value formula: Value of cardinality > configured value of high.cardinality. The minimum value is 10000. |
+| high.cardinality.threshold | 1000000 | It is a threshold to identify high cardinality of the columns. If the value of a column's cardinality > the configured value, then the column is excluded from dictionary encoding. |
 | high.cardinality.row.count.percentage | 80 | Percentage to identify whether column cardinality is more than configured percent of total row count. Configuration value formula: value of cardinality / row number x 100 > configured value of high.cardinality.row.count.percentage. The value of the parameter must be larger than 0. |
 | carbon.cutOffTimestamp | 1970-01-01 05:30:00 | Sets the start date for calculating the timestamp. Java counts the number of milliseconds from the start of "1970-01-01 00:00:00". This property is used to customize the start position. For example "2000-01-01 00:00:00". The date must be in the form "carbon.timestamp.format". NOTE: CarbonData supports data store up to 68 years from the cut-off time defined. For example, if the cut-off time is 1970-01-01 05:30:00, then the data can be stored up to 2038-01-01 05:30:00. |
 | carbon.timegranularity | SECOND | The property used to set the data granularity level: DAY, HOUR, MINUTE, or SECOND. |
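
As a reading aid for the high.cardinality rows above, a minimal Scala sketch of the identification rule they describe (the function and its defaults are purely illustrative, not CarbonData API; the table's wording, "must meet the following requirements", suggests both conditions must hold):

```
// Illustrative only: mirrors the rule in the table above, not CarbonData code.
// threshold          -> high.cardinality.threshold (default 1000000)
// rowCountPercentage -> high.cardinality.row.count.percentage (default 80)
def isHighCardinality(cardinality: Long, rowCount: Long,
                      threshold: Long = 1000000L,
                      rowCountPercentage: Double = 80.0): Boolean =
  cardinality > threshold &&
    cardinality.toDouble / rowCount * 100 > rowCountPercentage
```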
@@ -142,8 +142,8 @@ This section provides the details of all the configurations required for CarbonData
  
 | Parameter | Default Value | Description |
 |----------------------------------------|--------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| spark.driver.memory | 1g | Amount of memory to use for the driver process, i.e. where SparkContext is initialized. NOTE: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-memory command line option or in your default properties file. |
-| spark.executor.memory | 1g | Amount of memory to use per executor process. |
+| spark.driver.memory | 1g | Amount of memory to be used by the driver process. |
+| spark.executor.memory | 1g | Amount of memory to be used per executor process. |
 | spark.sql.bigdata.register.analyseRule | org.apache.spark.sql.hive.acl.CarbonAccessControlRules | CarbonAccessControlRules need to be set for enabling Access Control. |
    
  
\ No newline at end of file
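
To make the carbon.ddl.base.hdfs.url rows above concrete, a short hedged sketch that reuses the example values from the table itself (the table name test_table is borrowed from the quick-start guide; the cluster address is the table's own example):

```
// Given (from the table's example):
//   fs.defaultFS             = hdfs://10.18.101.155:54310
//   carbon.ddl.base.hdfs.url = /data/cnbc/
// the load command needs only the remaining relative path:
scala>carbon.sql("LOAD DATA INPATH '/2016/xyz.csv' INTO TABLE test_table")
// CarbonData resolves this to hdfs://10.18.101.155:54310/data/cnbc/2016/xyz.csv
```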

http://git-wip-us.apache.org/repos/asf/incubator-carbondata/blob/657eccd9/docs/data-management.md
----------------------------------------------------------------------
diff --git a/docs/data-management.md b/docs/data-management.md
index 70f4d28..2663aff 100644
--- a/docs/data-management.md
+++ b/docs/data-management.md
@@ -73,7 +73,7 @@ This tutorial is going to introduce you to the conceptual details of data management
    
    * Delete by Segment ID
       
-      After you get the segment ID of the segment that you want to delete, execute the [DELETE](dml-operation-on-carbondata.md) command for the selected segment.
+      After you get the segment ID of the segment that you want to delete, execute the delete command for the selected segment.
       The status of deleted segment is updated to Marked for delete / Marked for Update.
       
 | SegmentSequenceId | Status            | Load Start Time      | Load End Time        |
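
For the delete-by-segment-ID flow in the hunk above, a hedged sketch of the commands as documented in the 1.0-era dml-operation-on-carbondata.md (the table name is borrowed from the quick-start guide; verify the exact syntax against that DML guide):

```
// List segments with their IDs (the SegmentSequenceId column shown above):
scala>carbon.sql("SHOW SEGMENTS FOR TABLE test_table").show()
// Delete a segment by its ID; its status becomes Marked for delete:
scala>carbon.sql("DELETE SEGMENT 2 FROM TABLE test_table")
```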

http://git-wip-us.apache.org/repos/asf/incubator-carbondata/blob/657eccd9/docs/quick-start-guide.md
----------------------------------------------------------------------
diff --git a/docs/quick-start-guide.md b/docs/quick-start-guide.md
index ceeaac0..5a2d6e2 100644
--- a/docs/quick-start-guide.md
+++ b/docs/quick-start-guide.md
@@ -70,24 +70,22 @@ val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession(
 ##### Creating a Table
 
 ```
-scala>carbon.sql("create table if not exists test_table
-                (id string, name string, city string, age Int)
-                STORED BY 'carbondata'")
+scala>carbon.sql("CREATE TABLE IF NOT EXISTS test_table(id string, name string, city string,
age Int) STORED BY 'carbondata'")
 ```
 
 ##### Loading Data to a Table
 
 ```
-scala>carbon.sql("load data inpath 'sample.csv file's path' into table test_table")
+scala>carbon.sql("LOAD DATA INPATH 'sample.csv file path' INTO TABLE test_table")
 ```
 NOTE: Please provide the actual file path of sample.csv in the above script.
 
 ##### Query Data from a Table
 
 ```
-scala>spark.sql("select * from test_table").show()
+scala>carbon.sql("SELECT * FROM test_table").show()
 
-scala>spark.sql("select city, avg(age), sum(age) from test_table group by city").show()
+scala>carbon.sql("SELECT city, avg(age), sum(age) FROM test_table GROUP BY city").show()
 ```
 
 ## Interactive Analysis with Spark Shell
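
The hunk above creates the session with a no-argument getOrCreateCarbonSession; as a hedged sketch based on the 1.0 quick-start, an explicit store location could also be passed (the HDFS path below is illustrative):

```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._

// sc is the spark-shell SparkContext; the store path is an illustrative example.
val carbon = SparkSession.builder().config(sc.getConf)
  .getOrCreateCarbonSession("hdfs://localhost:9000/carbon/store")
```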
@@ -122,24 +120,24 @@ NOTE: By default store location is pointed to "../carbon.store", user can provide
 ##### Creating a Table
 
 ```
-scala>cc.sql("create table if not exists test_table (id string, name string, city string,
age Int) STORED BY 'carbondata'")
+scala>cc.sql("CREATE TABLE IF NOT EXISTS test_table (id string, name string, city string,
age Int) STORED BY 'carbondata'")
 ```
 To see the created table:
 
 ```
-scala>cc.sql("show tables").show()
+scala>cc.sql("SHOW TABLES").show()
 ```
 
 ##### Loading Data to a Table
 
 ```
-scala>cc.sql("load data inpath 'sample.csv file's path' into table test_table")
+scala>cc.sql("LOAD DATA INPATH 'sample.csv file path' INTO TABLE test_table")
 ```
 NOTE: Please provide the actual file path of sample.csv in the above script.
 
 ##### Query Data from a Table
 
 ```
-scala>cc.sql("select * from test_table").show()
-scala>cc.sql("select city, avg(age), sum(age) from test_table group by city").show()
+scala>cc.sql("SELECT * FROM test_table").show()
+scala>cc.sql("SELECT city, avg(age), sum(age) FROM test_table GROUP BY city").show()
 ```
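
For completeness on the cc handle used in this second half: in the Spark 1.x shell of that era it was a CarbonContext, created roughly as below (a sketch based on the 1.0 quick-start; the store path is illustrative):

```
import org.apache.spark.sql.CarbonContext

// sc is the spark-shell SparkContext; the store path is an illustrative example.
val cc = new CarbonContext(sc, "hdfs://localhost:9000/carbon/store")
```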

