carbondata-commits mailing list archives

From jack...@apache.org
Subject [carbondata] branch master updated: [DOC] CarbonExtensions doc
Date Wed, 22 Jan 2020 06:36:57 GMT
This is an automated email from the ASF dual-hosted git repository.

jackylk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
     new c15d55c  [DOC] CarbonExtensions doc
c15d55c is described below

commit c15d55c0aa0630de1a9a4d399a21d61e1a05647f
Author: QiangCai <qiangcai@qq.com>
AuthorDate: Mon Jan 20 16:25:05 2020 +0800

    [DOC] CarbonExtensions doc
    
    Why is this PR needed?
    explain how to use CarbonExtensions in spark
    
    What changes were proposed in this PR?
    Document is updated to introduce CarbonExtensions
    
    Does this PR introduce any user interface change?
    No
    
    Is any new testcase added?
    No
    
    This closes #3585
---
 docs/quick-start-guide.md | 85 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/docs/quick-start-guide.md b/docs/quick-start-guide.md
index dedba36..f9f467c 100644
--- a/docs/quick-start-guide.md
+++ b/docs/quick-start-guide.md
@@ -39,6 +39,7 @@ This tutorial provides a quick introduction to using CarbonData. To follow along
 CarbonData can be integrated with Spark,Presto and Hive execution engines. The below documentation guides on Installing and Configuring with these execution engines.
 
 #### Spark
+[Installing and Configuring CarbonData to run locally with Spark SQL CLI (version: 2.3+)](#installing-and-configuring-carbondata-to-run-locally-with-spark-sql-cli-version-23)
 
 [Installing and Configuring CarbonData to run locally with Spark Shell](#installing-and-configuring-carbondata-to-run-locally-with-spark-shell)
 
@@ -65,12 +66,64 @@ CarbonData can be integrated with Spark,Presto and Hive execution engines. The b
 #### Alluxio
 [CarbonData supports read and write with Alluxio](./alluxio-guide.md)
 
+## Installing and Configuring CarbonData to run locally with Spark SQL CLI (version: 2.3+)
+
+In the Spark SQL CLI, CarbonExtensions customizes the SparkSession with CarbonData's parser, analyzer, optimizer and physical planning strategy rules in Spark.
+To enable CarbonExtensions, we need to add the following configuration.
+
+|Key|Value|
+|---|---|
+|spark.sql.extensions|org.apache.spark.sql.CarbonExtensions| 
+
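+The same property can also be made persistent in `conf/spark-defaults.conf`, so every Spark application picks up CarbonExtensions without passing a `--conf` flag each time (a sketch; the file location assumes a default Spark layout):
+
+```
+spark.sql.extensions    org.apache.spark.sql.CarbonExtensions
+```
+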
+Start Spark SQL CLI by running the following command in the Spark directory:
+
+```
+./bin/spark-sql --conf spark.sql.extensions=org.apache.spark.sql.CarbonExtensions --jars <carbondata assembly jar path>
+```
+###### Creating a Table
+
+```
+CREATE TABLE IF NOT EXISTS test_table (
+  id string,
+  name string,
+  city string,
+  age Int)
+STORED AS carbondata;
+```
+**NOTE**: CarbonExtensions only supports "STORED AS carbondata" and "USING carbondata".
+
+###### Loading Data to a Table
+
+```
+LOAD DATA INPATH '/path/to/sample.csv' INTO TABLE test_table;
+```
+
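+For reference, a `sample.csv` matching the table schema above could look like this (illustrative data only, not shipped with CarbonData):
+
+```
+id,name,city,age
+1,david,shenzhen,31
+2,eason,shenzhen,27
+```
+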
+```
+INSERT INTO TABLE test_table SELECT '1', 'name1', 'city1', 1;
+```
+
+**NOTE**: Please replace `/path/to/sample.csv` with the real file path of `sample.csv` for the above script.
+If you hit a "tablestatus.lock" issue, please refer to the [FAQ](faq.md).
+
+###### Query Data from a Table
+
+```
+SELECT * FROM test_table;
+```
+
+```
+SELECT city, avg(age), sum(age)
+FROM test_table
+GROUP BY city;
+```
+
 ## Installing and Configuring CarbonData to run locally with Spark Shell
 
 Apache Spark Shell provides a simple way to learn the API, as well as a powerful tool to analyze data interactively. Please visit [Apache Spark Documentation](http://spark.apache.org/docs/latest/) for more details on Spark shell.
 
 #### Basics
 
+###### Option 1: Using CarbonSession
 Start Spark shell by running the following command in the Spark directory:
 
 ```
@@ -99,6 +152,27 @@ val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession(
    `SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("<carbon_store_path>", "<local metastore path>")`.
  - Data storage location can be specified by `<carbon_store_path>`, like `/carbon/data/store`, `hdfs://localhost:9000/carbon/data/store` or `s3a://carbon/data/store`.
 
+###### Option 2: Using SparkSession with CarbonExtensions
+
+Start Spark shell by running the following command in the Spark directory:
+
+```
+./bin/spark-shell --conf spark.sql.extensions=org.apache.spark.sql.CarbonExtensions --jars <carbondata assembly jar path>
+```
+**NOTE** 
+ - In this flow, we can use the built-in SparkSession `spark` instead of `carbon`.
+   We can also create a new SparkSession instead of the built-in SparkSession `spark` if needed.
+   In that case, "org.apache.spark.sql.CarbonExtensions" needs to be added to the Spark configuration "spark.sql.extensions".

+   ```
+   val newSpark = SparkSession
+     .builder()
+     .config(sc.getConf)
+     .enableHiveSupport
+     .config("spark.sql.extensions","org.apache.spark.sql.CarbonExtensions")
+     .getOrCreate()
+   ```
+ - Data storage location can be specified by "spark.sql.warehouse.dir".
+
 #### Executing Queries
 
 ###### Creating a Table
@@ -114,6 +188,17 @@ carbon.sql(
               | STORED AS carbondata
            """.stripMargin)
 ```
+**NOTE**: 
+The following table lists all supported syntax:
+
+|create table |SparkSession with CarbonExtensions | CarbonSession|
+|---|---|---|
+| STORED AS carbondata|yes|yes|
+| USING carbondata|yes|yes|
+| STORED BY 'carbondata'|no|yes|
+| STORED BY 'org.apache.carbondata.format'|no|yes|
+
+We suggest using CarbonExtensions instead of CarbonSession.
 
 ###### Loading Data to a Table
 

