carbondata-commits mailing list archives

From hexiaoq...@apache.org
Subject [1/2] incubator-carbondata git commit: corrected the steps in quick start and installation guides. Synchronized the variable names to make them consistent
Date Thu, 09 Mar 2017 13:08:23 GMT
Repository: incubator-carbondata
Updated Branches:
  refs/heads/master fca86fe93 -> 8fef247a9


corrected the steps in quick start and installation guides. Synchronized the variable names
to make them consistent

modified as per review comments


Project: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/commit/b288dd52
Tree: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/tree/b288dd52
Diff: http://git-wip-us.apache.org/repos/asf/incubator-carbondata/diff/b288dd52

Branch: refs/heads/master
Commit: b288dd525414ff65018ccda86ac8c1b67163f293
Parents: fca86fe
Author: sraghunandan <carbondatacontributions@gmail.com>
Authored: Mon Feb 27 08:01:10 2017 +0530
Committer: hexiaoqiao <hexiaoqiao@meituan.com>
Committed: Thu Mar 9 21:03:10 2017 +0800

----------------------------------------------------------------------
 docs/installation-guide.md | 138 ++++++++++++++++++++++------------------
 docs/quick-start-guide.md  |  33 +++++-----
 2 files changed, 92 insertions(+), 79 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-carbondata/blob/b288dd52/docs/installation-guide.md
----------------------------------------------------------------------
diff --git a/docs/installation-guide.md b/docs/installation-guide.md
index d8f1b5e..c5bf6df 100644
--- a/docs/installation-guide.md
+++ b/docs/installation-guide.md
@@ -40,42 +40,46 @@ followed by :
 
 ### Procedure
 
-* [Build the CarbonData](https://cwiki.apache.org/confluence/display/CARBONDATA/Building+CarbonData+And+IDE+Configuration)
project and get the assembly jar from "./assembly/target/scala-2.10/carbondata_xxx.jar" and
put in the ``"<SPARK_HOME>/carbonlib"`` folder.
+1. [Build the CarbonData](https://github.com/apache/incubator-carbondata/blob/master/build/README.md)
project and get the assembly jar from `./assembly/target/scala-2.1x/carbondata_xxx.jar`. 
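+
+    A minimal sketch of the build step, assuming a standard Maven invocation; see `build/README.md`
+    in the repository for the profiles matching your Spark and Hadoop versions:
+
+    ```
+    cd incubator-carbondata
+    mvn clean package -DskipTests
+    ```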
 
-     NOTE: Create the carbonlib folder if it does not exists inside ``"<SPARK_HOME>"``
path.
+2. Copy `./assembly/target/scala-2.1x/carbondata_xxx.jar` to `$SPARK_HOME/carbonlib` folder.
 
-* Add the carbonlib folder path in the Spark classpath. (Edit ``"<SPARK_HOME>/conf/spark-env.sh"``
file and modify the value of SPARK_CLASSPATH by appending ``"<SPARK_HOME>/carbonlib/*"``
to the existing value)
+     **NOTE**: Create the carbonlib folder if it does not exist inside `$SPARK_HOME` path.
 
-* Copy the carbon.properties.template to ``"<SPARK_HOME>/conf/carbon.properties"``
folder from "./conf/" of CarbonData repository.
+3. Add the carbonlib folder path in the Spark classpath. (Edit `$SPARK_HOME/conf/spark-env.sh`
file and modify the value of `SPARK_CLASSPATH` by appending `$SPARK_HOME/carbonlib/*` to the
existing value)
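+
+    For example, a minimal sketch of the `spark-env.sh` change, assuming `SPARK_CLASSPATH` may
+    already hold a value:
+
+    ```
+    # $SPARK_HOME/conf/spark-env.sh
+    SPARK_CLASSPATH=$SPARK_CLASSPATH:$SPARK_HOME/carbonlib/*
+    ```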
 
-* Copy the "carbonplugins" folder  to ``"<SPARK_HOME>/carbonlib"`` folder from "./processing/"
folder of CarbonData repository.
+4. Copy the `./conf/carbon.properties.template` file from CarbonData repository to `$SPARK_HOME/conf/`
folder and rename the file to `carbon.properties`.
 
-    NOTE: carbonplugins will contain .kettle folder.
+5. Copy the `./processing/carbonplugins` folder from CarbonData repository to `$SPARK_HOME/carbonlib/`
folder.
+
+    **NOTE**: carbonplugins will contain .kettle folder.
+
+6. Repeat Step 2 to Step 5 on all the nodes of the cluster.
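+
+    For example, a hypothetical way to replicate the files to another node (the host name and
+    paths are placeholders):
+
+    ```
+    scp -r $SPARK_HOME/carbonlib user@worker-node:$SPARK_HOME/
+    scp $SPARK_HOME/conf/carbon.properties user@worker-node:$SPARK_HOME/conf/
+    ```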
     
-* In Spark node, configure the properties mentioned in the following table in ``"<SPARK_HOME>/conf/spark-defaults.conf"``
file.
+7. In the Spark master node, configure the properties mentioned in the following table in `$SPARK_HOME/conf/spark-defaults.conf`
file.
 
-| Property | Value | Description |
-|---------------------------------|-----------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------|
-| carbon.kettle.home | $SPARK_HOME /carbonlib/carbonplugins | Path that will be used by CarbonData
internally to create graph for loading the data |
-| spark.driver.extraJavaOptions | -Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties
| A string of extra JVM options to pass to the driver. For instance, GC settings or other
logging. |
-| spark.executor.extraJavaOptions | -Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties
| A string of extra JVM options to pass to executors. For instance, GC settings or other logging.
NOTE: You can enter multiple values separated by space. |
+   | Property | Value | Description |
+   |---------------------------------|-----------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------|
+   | carbon.kettle.home | `$SPARK_HOME/carbonlib/carbonplugins` | Path that will be used
by CarbonData internally to create graph for loading the data |
+   | spark.driver.extraJavaOptions | `-Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties`
| A string of extra JVM options to pass to the driver. For instance, GC settings or other
logging. |
+   | spark.executor.extraJavaOptions | `-Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties`
| A string of extra JVM options to pass to executors. For instance, GC settings or other logging.
**NOTE**: You can enter multiple values separated by space. |
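+
+   For example, the corresponding `spark-defaults.conf` entries might look like the following
+   sketch (`/usr/local/spark` is an illustrative stand-in for the actual Spark home path):
+
+   ```
+   carbon.kettle.home                /usr/local/spark/carbonlib/carbonplugins
+   spark.driver.extraJavaOptions     -Dcarbon.properties.filepath=/usr/local/spark/conf/carbon.properties
+   spark.executor.extraJavaOptions   -Dcarbon.properties.filepath=/usr/local/spark/conf/carbon.properties
+   ```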
 
-* Add the following properties in ``"<SPARK_HOME>/conf/" carbon.properties``:
+8. Add the following properties in `$SPARK_HOME/conf/carbon.properties` file:
 
-| Property             | Required | Description                                         
                                  | Example                             | Remark  |
-|----------------------|----------|----------------------------------------------------------------------------------------|-------------------------------------|---------|
-| carbon.storelocation | NO       | Location where data CarbonData will create the store
and write the data in its own format. | hdfs://HOSTNAME:PORT/Opt/CarbonStore      | Propose
to set HDFS directory |
-| carbon.kettle.home   | YES      | Path that will be used by CarbonData internally to create
graph for loading the data.         | $SPARK_HOME/carbonlib/carbonplugins |         |
+   | Property             | Required | Description                                      
                                     | Example                             | Remark  |
+   |----------------------|----------|----------------------------------------------------------------------------------------|-------------------------------------|---------|
+   | carbon.storelocation | NO       | Location where CarbonData will create the store
and write the data in its own format. | hdfs://HOSTNAME:PORT/Opt/CarbonStore      | An HDFS
directory is recommended |
+   | carbon.kettle.home   | YES      | Path that will be used by CarbonData internally to
create graph for loading the data.         | `$SPARK_HOME/carbonlib/carbonplugins` |     
   |
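+
+   For example, a minimal `carbon.properties` sketch using the placeholder values from the table
+   above (an illustrative absolute path stands in for `$SPARK_HOME`):
+
+   ```
+   carbon.storelocation=hdfs://HOSTNAME:PORT/Opt/CarbonStore
+   carbon.kettle.home=/usr/local/spark/carbonlib/carbonplugins
+   ```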
 
 
-* Verify the installation. For example:
+9. Verify the installation. For example:
 
-```
+   ```
    ./spark-shell --master spark://HOSTNAME:PORT --total-executor-cores 2
    --executor-memory 2G
-```
+   ```
 
-NOTE: Make sure you have permissions for CarbonData JARs and files through which driver and
executor will start.
+**NOTE**: Make sure you have permissions for the CarbonData JARs and files through which the
driver and executor will start.
 
 To get started with CarbonData : [Quick Start](quick-start-guide.md), [DDL Operations on
CarbonData](ddl-operation-on-carbondata.md)
 
@@ -92,77 +96,87 @@ To get started with CarbonData : [Quick Start](quick-start-guide.md),
[DDL Opera
 
   The following steps are only for Driver Nodes. (Driver nodes are the ones which start
the spark context.)
 
-* [Build the CarbonData](https://cwiki.apache.org/confluence/display/CARBONDATA/Building+CarbonData+And+IDE+Configuration)
project and get the assembly jar from "./assembly/target/scala-2.10/carbondata_xxx.jar" and
put in the ``"<SPARK_HOME>/carbonlib"`` folder.
+1. [Build the CarbonData](https://github.com/apache/incubator-carbondata/blob/master/build/README.md)
project and get the assembly jar from `./assembly/target/scala-2.1x/carbondata_xxx.jar` and
copy to `$SPARK_HOME/carbonlib` folder.
 
-      NOTE: Create the carbonlib folder if it does not exists inside ``"<SPARK_HOME>"``
path.
+    **NOTE**: Create the carbonlib folder if it does not exist inside `$SPARK_HOME` path.
 
-* Copy "carbonplugins" folder to ``"<SPARK_HOME>/carbonlib"`` folder from "./processing/"
folder of CarbonData repository.
-      carbonplugins will contain .kettle folder.
+2. Copy the `./processing/carbonplugins` folder from CarbonData repository to `$SPARK_HOME/carbonlib/`
folder.
 
-* Copy the "carbon.properties.template" to ``"<SPARK_HOME>/conf/carbon.properties"``
folder from conf folder of CarbonData repository.
-* Modify the parameters in "spark-default.conf" located in the ``"<SPARK_HOME>/conf``"
+    **NOTE**: carbonplugins will contain .kettle folder.
 
-| Property | Description | Value |
-|---------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------|
-| spark.master | Set this value to run the Spark in yarn cluster mode. | Set "yarn-client"
to run the Spark in yarn cluster mode. |
-| spark.yarn.dist.files | Comma-separated list of files to be placed in the working directory
of each executor. |``"<YOUR_SPARK_HOME_PATH>"/conf/carbon.properties`` |
-| spark.yarn.dist.archives | Comma-separated list of archives to be extracted into the working
directory of each executor. |``"<YOUR_SPARK_HOME_PATH>"/carbonlib/carbondata_xxx.jar``
|
-| spark.executor.extraJavaOptions | A string of extra JVM options to pass to executors. For
instance  NOTE: You can enter multiple values separated by space. |``-Dcarbon.properties.filepath="<YOUR_SPARK_HOME_PATH>"/conf/carbon.properties``
|
-| spark.executor.extraClassPath | Extra classpath entries to prepend to the classpath of
executors. NOTE: If SPARK_CLASSPATH is defined in spark-env.sh, then comment it and append
the values in below parameter spark.driver.extraClassPath |``"<YOUR_SPARK_HOME_PATH>"/carbonlib/carbonlib/carbondata_xxx.jar``
|
-| spark.driver.extraClassPath | Extra classpath entries to prepend to the classpath of the
driver. NOTE: If SPARK_CLASSPATH is defined in spark-env.sh, then comment it and append the
value in below parameter spark.driver.extraClassPath. |``"<YOUR_SPARK_HOME_PATH>"/carbonlib/carbonlib/carbondata_xxx.jar``
|
-| spark.driver.extraJavaOptions | A string of extra JVM options to pass to the driver. For
instance, GC settings or other logging. |``-Dcarbon.properties.filepath="<YOUR_SPARK_HOME_PATH>"/conf/carbon.properties``
|
-| carbon.kettle.home | Path that will be used by CarbonData internally to create graph for
loading the data. |``"<YOUR_SPARK_HOME_PATH>"/carbonlib/carbonplugins`` |
+3. Copy the `./conf/carbon.properties.template` file from CarbonData repository to `$SPARK_HOME/conf/`
folder and rename the file to `carbon.properties`.
 
-* Add the following properties in ``<SPARK_HOME>/conf/ carbon.properties``:
+4. Create a `tar.gz` file of the carbonlib folder and move it inside the carbonlib folder.
 
-| Property | Required | Description | Example | Default Value |
-|----------------------|----------|----------------------------------------------------------------------------------------|-------------------------------------|---------------|
-| carbon.storelocation | NO | Location where CarbonData will create the store and write the
data in its own format. | hdfs://HOSTNAME:PORT/Opt/CarbonStore | Propose to set HDFS directory|
-| carbon.kettle.home | YES | Path that will be used by CarbonData internally to create graph
for loading the data. | $SPARK_HOME/carbonlib/carbonplugins |  |
+    ```
+    cd $SPARK_HOME
+    tar -zcvf carbondata.tar.gz carbonlib/
+    mv carbondata.tar.gz carbonlib/
+    ```
 
+5. Configure the properties mentioned in the following table in `$SPARK_HOME/conf/spark-defaults.conf`
file.
 
-* Verify the installation.
+   | Property | Description | Value |
+   |---------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------|
+   | spark.master | Set this value to run Spark on YARN. | Set `yarn-client` to run Spark
in yarn client mode. |
+   | spark.yarn.dist.files | Comma-separated list of files to be placed in the working directory
of each executor. |`$SPARK_HOME/conf/carbon.properties` |
+   | spark.yarn.dist.archives | Comma-separated list of archives to be extracted into the
working directory of each executor. |`$SPARK_HOME/carbonlib/carbondata.tar.gz` |
+   | spark.executor.extraJavaOptions | A string of extra JVM options to pass to executors.
For instance  **NOTE**: You can enter multiple values separated by space. |`-Dcarbon.properties.filepath=carbon.properties`
|
+   | spark.executor.extraClassPath | Extra classpath entries to prepend to the classpath
of executors. **NOTE**: If SPARK_CLASSPATH is defined in spark-env.sh, then comment it out and
append the value to this parameter. |`carbondata.tar.gz/carbonlib/*`
|
+   | spark.driver.extraClassPath | Extra classpath entries to prepend to the classpath of
the driver. **NOTE**: If SPARK_CLASSPATH is defined in spark-env.sh, then comment it out and append
the value to this parameter. |`$SPARK_HOME/carbonlib/*`
|
+   | spark.driver.extraJavaOptions | A string of extra JVM options to pass to the driver.
For instance, GC settings or other logging. |`-Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties`
|
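+
+   For example, these properties might appear in `spark-defaults.conf` as in the following
+   sketch (`/usr/local/spark` is an illustrative stand-in for the actual Spark home path):
+
+   ```
+   spark.master                      yarn-client
+   spark.yarn.dist.files             /usr/local/spark/conf/carbon.properties
+   spark.yarn.dist.archives          /usr/local/spark/carbonlib/carbondata.tar.gz
+   spark.executor.extraJavaOptions   -Dcarbon.properties.filepath=carbon.properties
+   spark.executor.extraClassPath     carbondata.tar.gz/carbonlib/*
+   spark.driver.extraClassPath       /usr/local/spark/carbonlib/*
+   spark.driver.extraJavaOptions    -Dcarbon.properties.filepath=/usr/local/spark/conf/carbon.properties
+   ```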
 
-```
+
+6. Add the following properties in `$SPARK_HOME/conf/carbon.properties`:
+
+   | Property | Required | Description | Example | Default Value |
+   |----------------------|----------|----------------------------------------------------------------------------------------|-------------------------------------|---------------|
+   | carbon.storelocation | NO | Location where CarbonData will create the store and write
the data in its own format. | hdfs://HOSTNAME:PORT/Opt/CarbonStore | An HDFS directory is recommended|
+   | carbon.kettle.home | YES | Path that will be used by CarbonData internally to create
graph for loading the data. | carbondata.tar.gz/carbonlib/carbonplugins |  |
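+
+   For example, a minimal `carbon.properties` sketch with the placeholder values from the table
+   above:
+
+   ```
+   carbon.storelocation=hdfs://HOSTNAME:PORT/Opt/CarbonStore
+   carbon.kettle.home=carbondata.tar.gz/carbonlib/carbonplugins
+   ```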
+
+
+7. Verify the installation.
+
+   ```
      ./bin/spark-shell --master yarn-client --driver-memory 1g
      --executor-cores 2 --executor-memory 2G
-```
-  NOTE: Make sure you have permissions for CarbonData JARs and files through which driver
and executor will start.
+   ```
+  **NOTE**: Make sure you have permissions for the CarbonData JARs and files through which
the driver and executor will start.
 
   Getting started with CarbonData : [Quick Start](quick-start-guide.md), [DDL Operations
on CarbonData](ddl-operation-on-carbondata.md)
 
 ## Query Execution Using CarbonData Thrift Server
 
-### Starting CarbonData Thrift Server
+### Starting CarbonData Thrift Server
 
-   a. cd ``<SPARK_HOME>``
+   a. cd `$SPARK_HOME`
 
    b. Run the following command to start the CarbonData thrift server.
 
-```
-./bin/spark-submit --conf spark.sql.hive.thriftServer.singleSession=true
---class org.apache.carbondata.spark.thriftserver.CarbonThriftServer
-$SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR <carbon_store_path>
-```
+   ```
+   ./bin/spark-submit --conf spark.sql.hive.thriftServer.singleSession=true
+   --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer
+   $SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR <carbon_store_path>
+   ```
 
 | Parameter | Description | Example |
 |---------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|
-| CARBON_ASSEMBLY_JAR | CarbonData assembly jar name present in the ``"<SPARK_HOME>"/carbonlib/``
folder. | carbondata_2.10-0.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar |
-| carbon_store_path | This is a parameter to the CarbonThriftServer class. This a HDFS path
where CarbonData files will be kept. Strongly Recommended to put same as carbon.storelocation
parameter of carbon.properties. | ``hdfs//<host_name>:54310/user/hive/warehouse/carbon.store``
|
+| CARBON_ASSEMBLY_JAR | CarbonData assembly jar name present in the `$SPARK_HOME/carbonlib/`
folder. | carbondata_2.10-0.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar |
+| carbon_store_path | This is a parameter to the CarbonThriftServer class. This is an HDFS
path where CarbonData files will be kept. It is strongly recommended to set this to the same
value as the carbon.storelocation parameter of carbon.properties. | `hdfs://<host_name>:port/user/hive/warehouse/carbon.store`
|
 
-### Examples
+**Examples**
    
-   * Start with default memory and executors
+   * Start with default memory and executors.
 
 ```
 ./bin/spark-submit --conf spark.sql.hive.thriftServer.singleSession=true 
 --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer 
 $SPARK_HOME/carbonlib
 /carbondata_2.10-0.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar 
-hdfs://hacluster/user/hive/warehouse/carbon.store
+hdfs://<host_name>:port/user/hive/warehouse/carbon.store
 ```
    
-   * Start with Fixed executors and resources
+   * Start with fixed executors and resources.
 
 ```
 ./bin/spark-submit --conf spark.sql.hive.thriftServer.singleSession=true 
@@ -171,13 +185,13 @@ hdfs://hacluster/user/hive/warehouse/carbon.store
 --executor-cores 32 
 /srv/OSCON/BigData/HACluster/install/spark/sparkJdbc/lib
 /carbondata_2.10-0.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar 
-hdfs://hacluster/user/hive/warehouse/carbon.store
+hdfs://<host_name>:port/user/hive/warehouse/carbon.store
 ```
   
-### Connecting to CarbonData Thrift Server Using Beeline
+### Connecting to CarbonData Thrift Server Using Beeline
 
 ```
-     cd <SPARK_HOME>
+     cd $SPARK_HOME
      ./bin/beeline jdbc:hive2://<thrftserver_host>:port
 
      Example

http://git-wip-us.apache.org/repos/asf/incubator-carbondata/blob/b288dd52/docs/quick-start-guide.md
----------------------------------------------------------------------
diff --git a/docs/quick-start-guide.md b/docs/quick-start-guide.md
index e6ef742..c29a8d3 100644
--- a/docs/quick-start-guide.md
+++ b/docs/quick-start-guide.md
@@ -24,15 +24,15 @@ This tutorial provides a quick introduction to using CarbonData.
 * [Installation and building CarbonData](https://github.com/apache/incubator-carbondata/blob/master/build).
 * Create a sample.csv file using the following commands. The CSV file is required for loading
data into CarbonData.
 
-```
-cd carbondata
-cat > sample.csv << EOF
-id,name,city,age
-1,david,shenzhen,31
-2,eason,shenzhen,27
-3,jarry,wuhan,35
-EOF
-```
+  ```
+  cd carbondata
+  cat > sample.csv << EOF
+  id,name,city,age
+  1,david,shenzhen,31
+  2,eason,shenzhen,27
+  3,jarry,wuhan,35
+  EOF
+  ```
 
 ## Interactive Analysis with Spark Shell
 
@@ -48,7 +48,7 @@ Start Spark shell by running the following command in the Spark directory:
 ./bin/spark-shell --jars <carbondata assembly jar path>
 ```
 
-In this shell, SparkSession is readily available as 'spark' and Spark context is readily
available as 'sc'.
+In this shell, SparkSession is readily available as `spark` and Spark context is readily
available as `sc`.
 
 In order to create a CarbonSession we will have to configure it explicitly in the following
manner :
 
@@ -64,7 +64,7 @@ import org.apache.spark.sql.CarbonSession._
 ```
 val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("<hdfs
store path>")
 ```
-NOTE: By default metastore location is pointed to "../carbon.metastore", user can provide
own metastore location to CarbonSession like `SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("<hdfs
store path>", "<local metastore path>")`
+**NOTE**: By default the metastore location points to `../carbon.metastore`. The user can
provide their own metastore location to CarbonSession like `SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("<hdfs
store path>", "<local metastore path>")`
 
 #### Executing Queries
 
@@ -79,7 +79,7 @@ scala>carbon.sql("CREATE TABLE IF NOT EXISTS test_table(id string, name
string,
 ```
 scala>carbon.sql("LOAD DATA INPATH 'sample.csv file path' INTO TABLE test_table")
 ```
-NOTE:Please provide the real file path of sample.csv for the above script.
+**NOTE**: Please provide the real file path of `sample.csv` for the above script.
 
 ###### Query Data from a Table
 
@@ -100,7 +100,7 @@ Start Spark shell by running the following command in the Spark directory:
 ./bin/spark-shell --jars <carbondata assembly jar path>
 ```
 
-NOTE: In this shell, SparkContext is readily available as sc.
+**NOTE**: In this shell, SparkContext is readily available as `sc`.
 
 * In order to execute the Queries we need to import CarbonContext:
 
@@ -111,10 +111,9 @@ import org.apache.spark.sql.CarbonContext
 * Create an instance of CarbonContext in the following manner :
 
 ```
-val cc = new CarbonContext(sc)
+val cc = new CarbonContext(sc, "<hdfs store path>")
 ```
-
-NOTE: By default store location is pointed to "../carbon.store", user can provide own store
location to CarbonContext like new CarbonContext(sc, storeLocation).
+**NOTE**: If running on a local machine without HDFS, configure the local machine's store
path instead of the HDFS store path.
 
 #### Executing Queries
 
@@ -134,7 +133,7 @@ scala>cc.sql("SHOW TABLES").show()
 ```
 scala>cc.sql("LOAD DATA INPATH 'sample.csv file path' INTO TABLE test_table")
 ```
-NOTE:Please provide the real file path of sample.csv for the above script.
+**NOTE**: Please provide the real file path of `sample.csv` for the above script.
 
 ##### Query Data from a Table
 

