carbondata-commits mailing list archives

From jbono...@apache.org
Subject [17/47] incubator-carbondata-site git commit: Add content for publication
Date Tue, 13 Dec 2016 15:41:42 GMT
http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/a1ad369d/content/documents/installation.html
----------------------------------------------------------------------
diff --git a/content/documents/installation.html b/content/documents/installation.html
new file mode 100644
index 0000000..e308dfe
--- /dev/null
+++ b/content/documents/installation.html
@@ -0,0 +1,282 @@
+
+<!DOCTYPE html><html><head><meta charset="utf-8"><title>Untitled Document.md</title><style></style></head><body id="preview">
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<h1><a id="Installation_Guide_19"></a>Installation Guide</h1>
+<p>This tutorial guides you through installing and configuring CarbonData in the following two modes:</p>
+<ul>
+<li><a href="https://cwiki.apache.org/confluence/display/CARBONDATA/Cluster+deployment+guide">On Standalone Spark cluster</a></li>
+<li><a href="https://cwiki.apache.org/confluence/display/CARBONDATA/Cluster+deployment+guide">On Spark on Yarn cluster</a></li>
+</ul>
+<p>followed by:</p>
+<ul>
+<li><a href="https://cwiki.apache.org/confluence/display/CARBONDATA/Cluster+deployment+guide">Query Execution using Carbon Thrift Server</a></li>
+</ul>
+<h2><a id="Installing_and_Configuring_CarbonData_on_Standalone_Spark_Cluster_28"></a>Installing and Configuring CarbonData on “Standalone Spark” Cluster</h2>
+<h3><a id="Prerequisite_30"></a>Prerequisite</h3>
+<ul>
+<li>Hadoop HDFS and YARN should be installed and running.</li>
+<li>Spark should be installed and running on all the clients.</li>
+<li>The CarbonData user should have permission to access HDFS.</li>
+</ul>
+<h3><a id="Procedure_35"></a>Procedure</h3>
+<p>The following steps apply only to driver nodes. (A driver node is a node that starts the Spark context.)</p>
+<ol>
+<li>
+<p><a href="https://cwiki.apache.org/confluence/display/CARBONDATA/Building+CarbonData+And+IDE+Configuration">Build the CarbonData</a> project, take the assembly jar from “./assembly/target/scala-2.10/carbondata_xxx.jar” and put it in the “&lt;SPARK_HOME&gt;/carbonlib” folder.</p>
+<p>(Note: Create the carbonlib folder if it does not exist inside the “&lt;SPARK_HOME&gt;” path.)</p>
+</li>
+<li>
+<p>Add the carbonlib folder path to the Spark classpath. (Edit the “&lt;SPARK_HOME&gt;/conf/spark-env.sh” file and append “&lt;SPARK_HOME&gt;/carbonlib/*” to the existing value of SPARK_CLASSPATH.)</p>
+</li>
+<li>
+<p>Copy carbon.properties.template from the “./conf/” folder of the CarbonData repository to the “&lt;SPARK_HOME&gt;/conf/” folder and rename it to carbon.properties.</p>
+</li>
+<li>
+<p>Copy the “carbonplugins” folder from the “./processing/” folder of the CarbonData repository to the “&lt;SPARK_HOME&gt;/carbonlib” folder.</p>
+<p>(Note: carbonplugins contains the .kettle folder.)</p>
+</li>
+<li>
+<p>On the Spark node, configure the properties listed in the table below in the “&lt;SPARK_HOME&gt;/conf/spark-defaults.conf” file.</p>
+</li>
+</ol>
+<table class="table table-striped table-bordered">
+<thead>
+<tr>
+<th>Property</th>
+<th>Description</th>
+<th>Value</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>carbon.kettle.home</td>
+<td>Path that will be used by CarbonData internally to create graph for loading the data</td>
+<td>$SPARK_HOME/carbonlib/carbonplugins</td>
+</tr>
+<tr>
+<td>spark.driver.extraJavaOptions</td>
+<td>A string of extra JVM options to pass to the driver. For instance, GC settings or other logging.</td>
+<td>-Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties</td>
+</tr>
+<tr>
+<td>spark.executor.extraJavaOptions</td>
+<td>A string of extra JVM options to pass to executors. For instance, GC settings or other logging. NOTE: You can enter multiple values separated by space.</td>
+<td>-Dcarbon.properties.filepath=$SPARK_HOME/conf/carbon.properties</td>
+</tr>
+</tbody>
+</table>
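+<p>As a rough sketch, the table above might translate into the following entries in “&lt;SPARK_HOME&gt;/conf/spark-defaults.conf”. The install path /opt/spark is an assumption standing in for your actual &lt;SPARK_HOME&gt;; substitute your own path.</p>
+<pre><code># Sketch only: /opt/spark is an assumed install location, not a required one.
+carbon.kettle.home               /opt/spark/carbonlib/carbonplugins
+spark.driver.extraJavaOptions    -Dcarbon.properties.filepath=/opt/spark/conf/carbon.properties
+spark.executor.extraJavaOptions  -Dcarbon.properties.filepath=/opt/spark/conf/carbon.properties
+</code></pre>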
+<ol start="6">
+<li>Add the following properties to “&lt;SPARK_HOME&gt;/conf/carbon.properties”:</li>
+</ol>
+<table class="table table-striped table-bordered">
+<thead>
+<tr>
+<th>Property</th>
+<th>Required</th>
+<th>Description</th>
+<th>Example</th>
+<th>Remark</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>carbon.storelocation</td>
+<td>NO</td>
+<td>Location where Carbon will create the store and write the data in its own format.</td>
+<td>hdfs://IP:PORT/Opt/CarbonStore</td>
+<td>Propose</td>
+</tr>
+<tr>
+<td>carbon.kettle.home</td>
+<td>YES</td>
+<td>Path that will be used by Carbon internally to create the graph for loading the data.</td>
+<td>$SPARK_HOME/carbonlib/carbonplugins</td>
+<td></td>
+</tr>
+</tbody>
+</table>
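+<p>For illustration, a minimal carbon.properties carrying the two properties above could look like this. The NameNode address namenode.example.com:8020 and the install path /opt/spark are placeholders, not values prescribed by CarbonData.</p>
+<pre><code># Sketch only: host, port and paths below are placeholders.
+carbon.storelocation=hdfs://namenode.example.com:8020/Opt/CarbonStore
+carbon.kettle.home=/opt/spark/carbonlib/carbonplugins
+</code></pre>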
+<ol start="7">
+<li>Verify the installation, for example:</li>
+</ol>
+<pre><code>./spark-shell --master spark://IP:PORT --total-executor-cores 2 --executor-memory 2G
+</code></pre>
+<p>Note: Make sure the user has permission to access the CarbonData jars and the files with which the driver and executors are started.</p>
+<p>To get started with CarbonData: <a href="https://cwiki.apache.org/confluence/display/CARBONDATA/Quick+Start">Quick Start</a>, <a href="https://cwiki.apache.org/confluence/display/CARBONDATA/DDL+operations+on+CarbonData">DDL Operations</a></p>
+<h2><a id="Installing_and_Configuring_Carbon_on_Spark_on_YARN_Cluster_72"></a>Installing and Configuring Carbon on “Spark on YARN” Cluster</h2>
+<p>This section provides the procedure to install Carbon on “Spark on YARN” cluster.</p>
+<h3><a id="Prerequisite_75"></a>Prerequisite</h3>
+<ul>
+<li>Hadoop HDFS and YARN should be installed and running.</li>
+<li>Spark should be installed and running on all the clients.</li>
+<li>The CarbonData user should have permission to access HDFS.</li>
+</ul>
+<h3><a id="Procedure_80"></a>Procedure</h3>
+<ol>
+<li>
+<p><a href="https://cwiki.apache.org/confluence/display/CARBONDATA/Building+CarbonData+And+IDE+Configuration">Build the CarbonData</a> project, take the assembly jar from “./assembly/target/scala-2.10/carbondata_xxx.jar” and put it in the “&lt;SPARK_HOME&gt;/carbonlib” folder.</p>
+<p>(Note: Create the carbonlib folder if it does not exist inside the “&lt;SPARK_HOME&gt;” path.)</p>
+</li>
+<li>
+<p>Copy the “carbonplugins” folder from the “./processing/” folder of the CarbonData repository to the “&lt;SPARK_HOME&gt;/carbonlib” folder.<br>
+(Note: carbonplugins contains the .kettle folder.)</p>
+</li>
+<li>
+<p>Copy “carbon.properties.template” from the conf folder of the CarbonData repository to the “&lt;SPARK_HOME&gt;/conf/” folder and rename it to carbon.properties.</p>
+</li>
+<li>
+<p>Modify the parameters in “spark-defaults.conf”, located in the “&lt;SPARK_HOME&gt;/conf” folder.</p>
+</li>
+</ol>
+<table class="table table-striped table-bordered">
+<thead>
+<tr>
+<th>Property</th>
+<th>Description</th>
+<th>Value</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>spark.master</td>
+<td>Set this value to run Spark on YARN.</td>
+<td>Set “yarn-client” to run Spark on YARN in client mode.</td>
+</tr>
+<tr>
+<td>spark.yarn.dist.files</td>
+<td>Comma-separated list of files to be placed in the working directory of each executor.</td>
+<td>“&lt;YOUR_SPARK_HOME_PATH&gt;”/conf/carbon.properties</td>
+</tr>
+<tr>
+<td>spark.yarn.dist.archives</td>
+<td>Comma-separated list of archives to be extracted into the working directory of each executor.</td>
+<td>“&lt;YOUR_SPARK_HOME_PATH&gt;”/carbonlib/carbondata_xxx.jar</td>
+</tr>
+<tr>
+<td>spark.executor.extraJavaOptions</td>
+<td>A string of extra JVM options to pass to executors. For instance, GC settings or other logging. NOTE: You can enter multiple values separated by space.</td>
+<td>-Dcarbon.properties.filepath=carbon.properties</td>
+</tr>
+<tr>
+<td>spark.executor.extraClassPath</td>
+<td>Extra classpath entries to prepend to the classpath of executors. NOTE: If SPARK_CLASSPATH is defined in spark-env.sh, comment it out and append its values to the spark.driver.extraClassPath parameter below.</td>
+<td>“&lt;YOUR_SPARK_HOME_PATH&gt;”/carbonlib/carbonlib/carbondata_xxx.jar</td>
+</tr>
+<tr>
+<td>spark.driver.extraClassPath</td>
+<td>Extra classpath entries to prepend to the classpath of the driver. NOTE: If SPARK_CLASSPATH is defined in spark-env.sh, comment it out and append its values to this parameter (spark.driver.extraClassPath).</td>
+<td>“&lt;YOUR_SPARK_HOME_PATH&gt;”/carbonlib/carbonlib/carbondata_xxx.jar</td>
+</tr>
+<tr>
+<td>spark.driver.extraJavaOptions</td>
+<td>A string of extra JVM options to pass to the driver. For instance, GC settings or other logging.</td>
+<td>-Dcarbon.properties.filepath=&quot;&lt;YOUR_SPARK_HOME_PATH&gt;&quot;/conf/carbon.properties</td>
+</tr>
+<tr>
+<td>carbon.kettle.home</td>
+<td>Path that will be used by Carbon internally to create the graph for loading the data.</td>
+<td>“&lt;YOUR_SPARK_HOME_PATH&gt;”/carbonlib/carbonplugins</td>
+</tr>
+</tbody>
+</table>
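+<p>As a hedged sketch, the path-independent rows of the table above might look as follows in spark-defaults.conf. The install path /opt/spark is an assumption standing in for &lt;YOUR_SPARK_HOME_PATH&gt;. The jar-related entries (spark.yarn.dist.archives and the two extraClassPath parameters) are left out because the jar file name depends on your build.</p>
+<pre><code># Sketch only: /opt/spark is an assumed install location.
+spark.master                     yarn-client
+spark.yarn.dist.files            /opt/spark/conf/carbon.properties
+spark.executor.extraJavaOptions  -Dcarbon.properties.filepath=carbon.properties
+spark.driver.extraJavaOptions    -Dcarbon.properties.filepath=/opt/spark/conf/carbon.properties
+carbon.kettle.home               /opt/spark/carbonlib/carbonplugins
+</code></pre>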
+<ol start="5">
+<li>Add the following properties to “&lt;SPARK_HOME&gt;/conf/carbon.properties”:</li>
+</ol>
+<table class="table table-striped table-bordered">
+<thead>
+<tr>
+<th>Property</th>
+<th>Required</th>
+<th>Description</th>
+<th>Example</th>
+<th>Default Value</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>carbon.storelocation</td>
+<td>NO</td>
+<td>Location where Carbon will create the store and write the data in its own format.</td>
+<td>hdfs://IP:PORT/Opt/CarbonStore</td>
+<td>Propose</td>
+</tr>
+<tr>
+<td>carbon.kettle.home</td>
+<td>YES</td>
+<td>Path that will be used by Carbon internally to create the graph for loading the data.</td>
+<td>$SPARK_HOME/carbonlib/carbonplugins</td>
+<td></td>
+</tr>
+</tbody>
+</table>
+<ol start="6">
+<li>Installation verification</li>
+</ol>
+<pre><code>./bin/spark-shell --master yarn-client --driver-memory 1g --executor-cores 2 --executor-memory 2G 
+</code></pre>
+<p>Note: Make sure the user has permission to access the CarbonData jars and the files with which the driver and executors are started.</p>
+<p>To get started with CarbonData: <a href="https://cwiki.apache.org/confluence/display/CARBONDATA/Quick+Start">Quick Start</a>, <a href="https://cwiki.apache.org/confluence/display/CARBONDATA/DDL+operations+on+CarbonData">DDL Operations</a></p>
+<h2><a id="Query_execution_using_Carbon_thrift_server_118"></a>Query execution using Carbon thrift server</h2>
+<h3><a id="Start_Thrift_server_120"></a>Start Thrift server</h3>
+<p>a. cd &lt;SPARK_HOME&gt;</p>
+<p>b. Run the command below to start the Carbon Thrift server:</p>
+<pre><code>./bin/spark-submit --conf spark.sql.hive.thriftServer.singleSession=true --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
+$SPARK_HOME/carbonlib/$CARBON_ASSEMBLY_JAR &lt;carbon_store_path&gt;
+</code></pre>
+<table class="table table-striped table-bordered">
+<thead>
+<tr>
+<th>Parameter</th>
+<th>Description</th>
+<th>Example</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>CARBON_ASSEMBLY_JAR</td>
+<td>Name of the Carbon assembly jar present in the “&lt;SPARK_HOME&gt;/carbonlib/” folder.</td>
+<td>carbondata_2.10-0.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar</td>
+</tr>
+<tr>
+<td>carbon_store_path</td>
+<td>This is a parameter to the CarbonThriftServer class. It is the HDFS path where the carbon files will be kept. It is strongly recommended to set it to the same value as the carbon.storelocation parameter in carbon.properties.</td>
+<td>hdfs://hacluster/user/hive/warehouse/carbon.store<br>hdfs://10.10.10.10:54310/user/hive/warehouse/carbon.store</td>
+</tr>
+</tbody>
+</table>
+<h3><a id="Examples_133"></a>Examples</h3>
+<ol>
+<li>Start with default memory and executors</li>
+</ol>
+<pre><code>./bin/spark-submit --conf spark.sql.hive.thriftServer.singleSession=true --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer $SPARK_HOME/carbonlib/carbondata_2.10-0.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar hdfs://hacluster/user/hive/warehouse/carbon.store
+</code></pre>
+<ol start="2">
+<li>Start with fixed executors and resources</li>
+</ol>
+<pre><code>./bin/spark-submit --conf spark.sql.hive.thriftServer.singleSession=true --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer --num-executors 3 --driver-memory 20g --executor-memory 250g --executor-cores 32 /srv/OSCON/BigData/HACluster/install/spark/sparkJdbc/lib/carbondata_2.10-0.1.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar hdfs://hacluster/user/hive/warehouse/carbon.store
+</code></pre>
+<h3><a id="Connecting_to_Carbon_Thrift_Server_Using_Beeline_142"></a>Connecting to Carbon Thrift Server Using Beeline</h3>
+<pre><code>cd &lt;SPARK_HOME&gt;
+./bin/beeline -u jdbc:hive2://&lt;thriftserver_host&gt;:&lt;port&gt;
+
+# Example
+./bin/beeline -u jdbc:hive2://10.10.10.10:10000
+</code></pre>
+
+</body></html>

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/a1ad369d/content/documents/overview.html
----------------------------------------------------------------------
diff --git a/content/documents/overview.html b/content/documents/overview.html
new file mode 100644
index 0000000..c23b883
--- /dev/null
+++ b/content/documents/overview.html
@@ -0,0 +1,184 @@
+
+<!DOCTYPE html><html><head><meta charset="utf-8"><title>Untitled Document.md</title><style>
+p img{ max-width: 100% }
+</style></head><body id="preview">
+<!--<p>&lt;!–-<br>
+Licensed to the Apache Software Foundation (ASF) under one<br>
+or more contributor license agreements.  See the NOTICE file<br>
+distributed with this work for additional information<br>
+regarding copyright ownership.  The ASF licenses this file<br>
+to you under the Apache License, Version 2.0 (the<br>
+“License”); you may not use this file except in compliance<br>
+with the License.  You may obtain a copy of the License at</p>
+<pre><code>  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+&quot;AS IS&quot; BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+</code></pre>
+<p>-–&gt;</p>-->
+<p><img src="../docs/images/format/CarbonData_logo.png?raw=true" alt="CarbonData_Logo"></p>
+<p>This tutorial provides a detailed overview of:</p>
+<ul>
+<li>CarbonData</li>
+<li>Working and File Format</li>
+<li>Features</li>
+<li>Supported Data Types</li>
+<li>Compatibility</li>
+<li>Packaging and Interfaces</li>
+</ul>
+<h2><a id="Introduction_30"></a>Introduction</h2>
+<p>CarbonData is a fully indexed, columnar, Hadoop-native data store for processing heavy analytical workloads and detailed queries on big data. CarbonData enables faster interactive queries by using advanced columnar storage, indexing, compression and encoding techniques to improve computing efficiency, which in turn helps speed up queries by an order of magnitude over petabytes of data.</p>
+<p>In customer benchmarks, CarbonData has proven able to manage petabytes of data running on extraordinarily low-cost hardware and to answer queries around 10 times faster than the current open source solutions (column-oriented SQL on Hadoop data stores).</p>
+<p>Some of the salient features of CarbonData are:</p>
+<ul>
+<li>Low latency for various types of data access patterns such as sequential, random and OLAP.</li>
+<li>Allows fast query on fast data.</li>
+<li>Ensures space efficiency.</li>
+<li>General format available in the Hadoop ecosystem.</li>
+</ul>
+<h2><a id="CarbonData_File_Structure_42"></a>CarbonData File Structure</h2>
+<p>A CarbonData file contains groups of data called blocklets, along with all required information such as schema, offsets and indices, in a file footer, co-located in HDFS.</p>
+<p>The file footer can be read once to build the indices in memory, which can be utilized for optimizing the scans and processing for all subsequent queries.</p>
+<p>Each blocklet in the file is further divided into chunks of data called Data Chunks. Each data chunk is organized either in columnar format or row format, and stores the data of either a single column or a set of columns. All blocklets in one file contain the same number and type of Data Chunks.</p>
+<p><img src="../docs/images/format/carbon_data_file_structure_new.png?raw=true" alt="Carbon File Structure"></p>
+<p>Each Data Chunk contains multiple groups of data called Pages. There are three types of pages:</p>
+<ul>
+<li>Data Page: Contains the encoded data of a column/group of columns.</li>
+<li>Row ID Page (optional): Contains the row id mappings used when the Data Page is stored as an inverted index.</li>
+<li>RLE Page (optional): Contains additional metadata used when the Data Page is RLE coded.</li>
+</ul>
+<p><img src="../docs/images/format/carbon_data_format_new.png?raw=true" alt="Carbon File Format"></p>
+<h2><a id="Features_59"></a>Features</h2>
+<p>The CarbonData file format is a columnar store in HDFS. It has many features of a modern columnar format, such as being splittable, compression schemes and complex data types. In addition, CarbonData has the following unique features:</p>
+<ul>
+<li>Stores data along with index: This can significantly accelerate query performance and reduce I/O scans and CPU usage when the query contains filters. The CarbonData index consists of multiple levels of indices. A processing framework can leverage this index to reduce the number of tasks it needs to schedule and process, and it can also skip-scan at a finer-grained unit (called a blocklet) during task-side scanning instead of scanning the whole file.</li>
+<li>Operable encoded data: By supporting efficient compression and global encoding schemes, CarbonData can query compressed/encoded data directly. The data is converted only just before the results are returned to the user, which is known as late materialization.</li>
+<li>Column group: Allows multiple columns to form a column group that is stored in row format. This reduces the row reconstruction cost at query time.</li>
+<li>Supports various use cases with a single data format: interactive OLAP-style queries, sequential access (big scan) and random access (narrow scan).</li>
+</ul>
+<h2><a id="Data_Types_68"></a>Data Types</h2>
+<p>The following types are supported:</p>
+<ul>
+<li>
+<p>Numeric Types</p>
+<ul>
+<li>SMALLINT</li>
+<li>INT/INTEGER</li>
+<li>BIGINT</li>
+<li>DOUBLE</li>
+<li>DECIMAL</li>
+</ul>
+</li>
+<li>
+<p>Date/Time Types</p>
+<ul>
+<li>TIMESTAMP</li>
+</ul>
+</li>
+<li>
+<p>String Types</p>
+<ul>
+<li>STRING</li>
+</ul>
+</li>
+<li>
+<p>Complex Types</p>
+<ul>
+<li>arrays: ARRAY&lt;data_type&gt;</li>
+<li>structs: STRUCT&lt;col_name : data_type [COMMENT col_comment], …&gt;</li>
+</ul>
+</li>
+</ul>
+<h2><a id="Compatibility_89"></a>Compatibility</h2>
+<h2><a id="Packaging_and_Interfaces_92"></a>Packaging and Interfaces</h2>
+<h3><a id="Packaging_94"></a>Packaging</h3>
+<p>Carbon provides the following JAR packages:</p>
+<p><img src="https://cloud.githubusercontent.com/assets/6500698/14255195/831c6e90-fac5-11e5-87ab-3b16d84918fb.png" alt="carbon modules2"></p>
+<ul>
+<li>
+<p><strong>carbon-store.jar or carbondata-assembly.jar:</strong> This is the main jar of the carbon project. It targets both users and developers.<br>
+- For MapReduce application users, this jar provides an API to read and write carbon files through CarbonInput/OutputFormat in the carbon-hadoop module.<br>
+- For developers, this jar can be used to integrate carbon with processing engines such as Spark and Hive, by leveraging the API in the carbon-processing module.</p>
+</li>
+<li>
+<p><strong>carbon-spark.jar (currently part of the assembly jar):</strong> Provides support for Spark users, who can manipulate carbon data files through the native Spark DataFrame/SQL interface. In addition, to leverage carbon’s built-in lifecycle management functions, higher-level concepts such as Managed Carbon Table, Database and the corresponding DDL are introduced.</p>
+</li>
+<li>
+<p><strong>carbon-hive.jar (not yet provided):</strong> Similar to carbon-spark, it provides integration between carbon and Hive.</p>
+</li>
+</ul>
+<h3><a id="Interfaces_107"></a>Interfaces</h3>
+<h4><a id="API_109"></a>API</h4>
+<p>Carbon can be used in the following scenarios:</p>
+<ol>
+<li>
+<p>For MapReduce application users<br>
+This user API is provided by carbon-hadoop. In this scenario, users can process carbon files in their MapReduce application by choosing CarbonInput/OutputFormat, and are responsible for using it correctly. Currently only CarbonInputFormat is provided; OutputFormat will be provided soon.</p>
+</li>
+<li>
+<p>For Spark users<br>
+This user API is provided by Spark itself. There are two levels of APIs:</p>
+<ul>
+<li>
+<p><strong>Carbon File</strong></p>
+<p>Similar to Parquet, JSON or other data sources in Spark, carbon can be used with the data source API. For example (please refer to DataFrameAPIExample for more detail):</p>
+<pre><code>import org.apache.spark.sql.SaveMode
+
+// User can create a DataFrame from any data source or transformation.
+val df = ...
+
+// Write data
+// User can write a DataFrame to a carbon file
+df.write
+  .format(&quot;carbondata&quot;)
+  .option(&quot;tableName&quot;, &quot;carbontable&quot;)
+  .mode(SaveMode.Overwrite)
+  .save()
+
+// Read carbon data by data source API.
+// A new val is used here because df above is a val and cannot be reassigned.
+val carbonDf = carbonContext.read
+  .format(&quot;carbondata&quot;)
+  .option(&quot;tableName&quot;, &quot;carbontable&quot;)
+  .load(&quot;/path&quot;)
+
+// User can then use DataFrame for analysis
+carbonDf.count
+SVMWithSGD.train(carbonDf, numIterations)
+
+// User can also register the DataFrame with a table name, and use SQL for analysis
+carbonDf.registerTempTable(&quot;t1&quot;)  // register temporary table in SparkSQL catalog
+carbonDf.registerHiveTable(&quot;t2&quot;)  // Or, use an implicit function to register to Hive metastore
+sqlContext.sql(&quot;select count(*) from t1&quot;).show
+</code></pre>
+</li>
+<li>
+<p><strong>Managed Carbon Table</strong></p>
+<p>Carbon has built-in support for high-level concepts such as Table and Database, and supports full data lifecycle management. Instead of dealing with just files, users can use carbon-specific DDL to manipulate data at the Table and Database level. Please refer to <a href="https://github.com/HuaweiBigData/carbondata/wiki/Language-Manual:-DDL">DDL</a> and <a href="https://github.com/HuaweiBigData/carbondata/wiki/Language-Manual:-DML">DML</a>.</p>
+<pre><code>// Use SQL to manage table and query data
+create database db1;
+use database db1;
+show databases;
+create table tbl1 using org.apache.carbondata.spark;
+load data into table tbl1 path 'some_files';
+select count(*) from tbl1;
+</code></pre>
+</li>
+</ul>
+</li>
+<li>
+<p>For developers who want to integrate carbon into processing engines such as Spark, Hive or Flink, use the APIs provided by carbon-hadoop and carbon-processing:</p>
+<ul>
+<li>
+<p><strong>Query</strong>: Integrate carbon-hadoop with an engine-specific API, such as the Spark data source API.</p>
+</li>
+<li>
+<p><strong>Data life cycle management</strong>: Carbon provides utility functions in carbon-processing to manage the data life cycle, such as data loading, compaction, retention and schema evolution. Developers can implement DDLs of their choice and leverage these utility functions for data life cycle management.</p>
+</li>
+</ul>
+</li>
+</ol>
+
+</body></html>

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/a1ad369d/content/documents/overviewdashboardpages.html
----------------------------------------------------------------------
diff --git a/content/documents/overviewdashboardpages.html b/content/documents/overviewdashboardpages.html
new file mode 100644
index 0000000..c38640b
--- /dev/null
+++ b/content/documents/overviewdashboardpages.html
@@ -0,0 +1,282 @@
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
+    <title>CarbonData</title>
+
+    <!-- Bootstrap -->
+	
+    <link rel="stylesheet" href="css/bootstrap.min.css">
+    <link href="css/style.css" rel="stylesheet">    	
+    <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
+    <!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
+    <!--[if lt IE 9]>
+      <script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>
+      <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
+    <![endif]-->
+  </head>
+  <body>
+    <header>
+     <nav class="navbar navbar-default navbar-custom cd-navbar-wrapper" >
+      <div class="container">
+        <div class="navbar-header">
+          <button aria-controls="navbar" aria-expanded="false" data-target="#navbar" data-toggle="collapse" class="navbar-toggle collapsed" type="button">
+            <span class="sr-only">Toggle navigation</span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+          </button>
+          <a href="index.html" class="logo">
+             <img src="images/CarbonDataLogo.png" alt="CarbonData logo" title="CarbonData logo"  />
+          </a>
+        </div>
+        <div class="navbar-collapse collapse cd_navcontnt" id="navbar">         
+          <ul class="nav navbar-nav navbar-right navlist-custom">
+              <li><a href="index.html"><i class="fa fa-home" aria-hidden="true"></i> </a></li>
+              <li><a href="#">Download  </a></li>
+              <li><a href="#">Overview </a></li>
+              <li><a href="dashboard.html" class="active" target="_blank">Documents </a></li>
+              <li><a href="#">Community </a></li>
+              <li><a href="#" class="apache_link">apache</a></li>
+           </ul>
+        </div><!--/.nav-collapse -->
+      </div>
+    </nav>
+     </header> <!-- end Header part -->
+   
+   <div class="fixed-padding"></div> <!--  top padding with fixed header  -->
+ 
+   <section><!-- Dashboard nav -->
+    <div class="container-fluid">
+      <div class="row">
+        <div class="col-sm-3 col-md-2 sidebar">
+          <ul class="nav nav-sidebar">
+            <li class="active"><a href="#">Overview <span class="sr-only">(current)</span></a></li>
+            <li><a href="#">Contributing to CarbonData</a></li>
+            <li><a href="#">Quick start</a></li>
+            <li><a href="#">User Guide</a></li>
+            <li><a href="#">Using CarbonData</a></li>
+            <li><a href="#">FAQ</a></li>
+          </ul>        
+        </div>
+        <div class="col-sm-9 col-sm-offset-3 col-md-10 col-md-offset-2 maindashboard">
+           <div class="row placeholders">            
+            <section>
+              <div style="padding:40px;">
+
+                <p>This tutorial provides a detailed overview of:</p>
+<ul>
+<li>CarbonData</li>
+<li>Working and File Format</li>
+<li>Features</li>
+<li>Supported Data Types</li>
+<li>Compatibility</li>
+<li>Packaging and Interfaces</li>
+</ul>
+<h2><a id="Introduction_30"></a>Introduction</h2>
+<p>CarbonData is a fully indexed, columnar, Hadoop-native data store for processing heavy analytical workloads and detailed queries on big data. CarbonData enables faster interactive queries by using advanced columnar storage, indexing, compression and encoding techniques to improve computing efficiency, which in turn helps speed up queries by an order of magnitude over petabytes of data.</p>
+<p>In customer benchmarks, CarbonData has proven able to manage petabytes of data running on extraordinarily low-cost hardware and to answer queries around 10 times faster than the current open source solutions (column-oriented SQL on Hadoop data stores).</p>
+<p>Some of the salient features of CarbonData are:</p>
+<ul>
+<li>Low latency for various types of data access patterns such as sequential, random and OLAP.</li>
+<li>Allows fast query on fast data.</li>
+<li>Ensures space efficiency.</li>
+<li>General format available in the Hadoop ecosystem.</li>
+</ul>
+<h2><a id="CarbonData_File_Structure_42"></a>CarbonData File Structure</h2>
+<p>A CarbonData file contains groups of data called blocklets, along with all required information such as schema, offsets and indices, in a file footer, co-located in HDFS.</p>
+<p>The file footer can be read once to build the indices in memory, which can be utilized for optimizing the scans and processing for all subsequent queries.</p>
+<p>Each blocklet in the file is further divided into chunks of data called Data Chunks. Each data chunk is organized either in columnar format or row format, and stores the data of either a single column or a set of columns. All blocklets in one file contain the same number and type of Data Chunks.</p>
+<p><img src="../docs/images/format/carbon_data_file_structure_new.png?raw=true" alt="Carbon File Structure"></p>
+<p>Each Data Chunk contains multiple groups of data called Pages. There are three types of pages:</p>
+<ul>
+<li>Data Page: Contains the encoded data of a column/group of columns.</li>
+<li>Row ID Page (optional): Contains the row id mappings used when the Data Page is stored as an inverted index.</li>
+<li>RLE Page (optional): Contains additional metadata used when the Data Page is RLE coded.</li>
+</ul>
+<p><img src="../docs/images/format/carbon_data_format_new.png?raw=true" alt="Carbon File Format"></p>
+<h2><a id="Features_59"></a>Features</h2>
+<p>The CarbonData file format is a columnar store in HDFS. It has many features of a modern columnar format, such as being splittable, compression schemes and complex data types. In addition, CarbonData has the following unique features:</p>
+<ul>
+<li>Stores data along with index: This can significantly accelerate query performance and reduce I/O scans and CPU usage when the query contains filters. The CarbonData index consists of multiple levels of indices. A processing framework can leverage this index to reduce the number of tasks it needs to schedule and process, and it can also skip-scan at a finer-grained unit (called a blocklet) during task-side scanning instead of scanning the whole file.</li>
+<li>Operable encoded data: By supporting efficient compression and global encoding schemes, CarbonData can query compressed/encoded data directly. The data is converted only just before the results are returned to the user, which is known as late materialization.</li>
+<li>Column group: Allows multiple columns to form a column group that is stored in row format. This reduces the row reconstruction cost at query time.</li>
+<li>Supports various use cases with a single data format: interactive OLAP-style queries, sequential access (big scan) and random access (narrow scan).</li>
+</ul>
+<h2><a id="Data_Types_68"></a>Data Types</h2>
+<p>The following types are supported:</p>
+<ul>
+<li>
+<p>Numeric Types</p>
+<ul>
+<li>SMALLINT</li>
+<li>INT/INTEGER</li>
+<li>BIGINT</li>
+<li>DOUBLE</li>
+<li>DECIMAL</li>
+</ul>
+</li>
+<li>
+<p>Date/Time Types</p>
+<ul>
+<li>TIMESTAMP</li>
+</ul>
+</li>
+<li>
+<p>String Types</p>
+<ul>
+<li>STRING</li>
+</ul>
+</li>
+<li>
+<p>Complex Types</p>
+<ul>
+<li>arrays: ARRAY&lt;data_type&gt;</li>
+<li>structs: STRUCT&lt;col_name : data_type [COMMENT col_comment], …&gt;</li>
+</ul>
+</li>
+</ul>
+<h2><a id="Compatibility_89"></a>Compatibility</h2>
+<h2><a id="Packaging_and_Interfaces_92"></a>Packaging and Interfaces</h2>
+<h3><a id="Packaging_94"></a>Packaging</h3>
+<p>Carbon provides the following JAR packages:</p>
+<p><img src="https://cloud.githubusercontent.com/assets/6500698/14255195/831c6e90-fac5-11e5-87ab-3b16d84918fb.png" alt="carbon modules2"></p>
+<ul>
+<li>
+<p><strong>carbon-store.jar or carbondata-assembly.jar:</strong> This is the main JAR of the carbon project. It targets both users and developers.<br>
+- For MapReduce application users, this JAR provides an API to read and write carbon files through CarbonInput/OutputFormat in the carbon-hadoop module.<br>
+- For developers, this JAR can be used to integrate carbon with processing engines like Spark and Hive, by leveraging the API in the carbon-processing module.</p>
+</li>
+<li>
+<p><strong>carbon-spark.jar (currently part of the assembly JAR):</strong> Provides support for Spark users, who can manipulate carbon data files by using the native Spark DataFrame/SQL interface. In addition, to leverage carbon’s built-in lifecycle management functions, higher-level concepts such as Managed Carbon Table, Database and the corresponding DDL are introduced.</p>
+</li>
+<li>
+<p><strong>carbon-hive.jar (not yet provided):</strong> Similar to carbon-spark, it provides integration between carbon and Hive.</p>
+</li>
+</ul>
+<h3><a id="Interfaces_107"></a>Interfaces</h3>
+<h4><a id="API_109"></a>API</h4>
+<p>Carbon can be used in the following scenarios:</p>
+<ol>
+<li>
+<p>For MapReduce application user<br>
+This user API is provided by carbon-hadoop. In this scenario, users can process carbon files in their MapReduce application by choosing CarbonInput/OutputFormat, and are responsible for using it correctly. Currently only CarbonInputFormat is provided; OutputFormat will be provided soon.</p>
+</li>
+<li>
+<p>For Spark user<br>
+This user API is provided by Spark itself. There are two levels of API:</p>
+<ul>
+<li>
+<p><strong>Carbon File</strong></p>
+<p>Similar to parquet, json, or other data sources in Spark, carbon can be used with the data source API. For example (please refer to DataFrameAPIExample for more detail):</p>
+<pre><code>import org.apache.spark.sql.SaveMode
+
+// User can create a DataFrame from any data source or transformation.
+val df = ...
+
+// Write data
+// User can write a DataFrame to a carbon file
+df.write
+  .format(&quot;carbondata&quot;)
+  .option(&quot;tableName&quot;, &quot;carbontable&quot;)
+  .mode(SaveMode.Overwrite)
+  .save()
+
+
+// Read carbon data by data source API
+// (a new val is needed because df above cannot be reassigned)
+val carbonDf = carbonContext.read
+  .format(&quot;carbondata&quot;)
+  .option(&quot;tableName&quot;, &quot;carbontable&quot;)
+  .load(&quot;/path&quot;)
+
+// User can then use the DataFrame for analysis
+carbonDf.count
+// MLlib trainers such as SVMWithSGD.train take an RDD[LabeledPoint],
+// so the DataFrame has to be converted before it is passed to them.
+
+// User can also register the DataFrame with a table name, and use SQL for analysis
+carbonDf.registerTempTable(&quot;t1&quot;)  // register temporary table in SparkSQL catalog
+carbonDf.registerHiveTable(&quot;t2&quot;)  // Or, use an implicit function to register to Hive metastore
+sqlContext.sql(&quot;select count(*) from t1&quot;).show
+</code></pre>
+</li>
+<li>
+<p><strong>Managed Carbon Table</strong></p>
+<p>Carbon has built-in support for high-level concepts like Table and Database, and supports full data lifecycle management. Instead of dealing with just files, users can use carbon-specific DDL to manipulate data at the Table and Database level. Please refer to <a href="https://github.com/HuaweiBigData/carbondata/wiki/Language-Manual:-DDL">DDL</a> and <a href="https://github.com/HuaweiBigData/carbondata/wiki/Language-Manual:-DML">DML</a>.</p>
+<pre><code>-- Use SQL to manage table and query data
+create database db1;
+use db1;
+show databases;
+create table tbl1 using org.apache.carbondata.spark;
+load data inpath 'some_files' into table tbl1;
+select count(*) from tbl1;
+</code></pre>
+</li>
+</ul>
+</li>
+<li>
+<p>For developers who want to integrate carbon into a processing engine like Spark, Hive or Flink, use the API provided by carbon-hadoop and carbon-processing:</p>
+<ul>
+<li>
+<p><strong>Query</strong>: integrate carbon-hadoop with the engine-specific API, like the Spark data source API.</p>
+</li>
+<li>
+<p><strong>Data life cycle management</strong>: carbon provides utility functions in carbon-processing to manage the data life cycle, such as data loading, compaction, retention and schema evolution. Developers can implement DDLs of their choice and leverage these utility functions to do data life cycle management.</p>
+</li>
+</ul>
+</li>
+</ol>
+
+              </div>
+            </section>
+            <footer>
+    <div class="topcontant">
+      <div class="container-fluid">
+          <div class="col-md-4 col-sm-4">
+            <p class="footext">
+              Apache CarbonData, CarbonData, Apache, the Apache feather logo, and the Apache CarbonData project logo are trademarks of The Apache Software Foundation
+            </p>
+ 
+          </div>
+          <div class="col-md-8 col-sm-8">
+             <ul class="footer-nav">
+              <li><a href="">Site Map</a></li>
+              <li><a href="">Service</a></li>
+              <li><a href="">Contact us</a></li>
+             </ul>
+          </div>
+       </div>
+    </div>
+    <div class="bottomcontant">
+       <div class="container-fluid">
+          <div class="col-md-8 col-sm-8">
+            <p class="copyright-txt">Copyright © 2016. All rights reserved  &nbsp;&nbsp;|&nbsp;&nbsp;
+              <a href="#" class="term-links">Apache Software Foundation  </a>&nbsp;&nbsp;| &nbsp;&nbsp; <a href="#" class="term-links"> Privacy Policy </a>
+            </p>
+
+          </div>
+          <div class="col-md-4 col-sm-4">
+                 <div class="social-icon">
+                  <a href="#" class="icons"><i class="fa fa-facebook" aria-hidden="true"></i></a>
+                  <a href="#" class="icons"><i class="fa fa-twitter" aria-hidden="true"></i></a>
+                  <a href="#" class="icons"><i class="fa fa-linkedin" aria-hidden="true"></i></a>
+                 </div>
+          </div>
+    </div>
+     </div>
+
+  </footer>
+          </div>           
+
+        </div>
+      </div>
+    </div>
+   </section><!-- End systemblock part -->
+
+  <!-- jQuery (necessary for Bootstrap's JavaScript plugins) -->
+
+    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js"></script>
+    <!-- Include all compiled plugins (below), or include individual files as needed -->
+    <script src="js/bootstrap.min.js"></script>
+  </body>
+  </html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/a1ad369d/content/documents/quickstart.html
----------------------------------------------------------------------
diff --git a/content/documents/quickstart.html b/content/documents/quickstart.html
new file mode 100644
index 0000000..2d2ad93
--- /dev/null
+++ b/content/documents/quickstart.html
@@ -0,0 +1,120 @@
+
+<!DOCTYPE html><html><head><meta charset="utf-8"><title>Untitled Document.md</title><style>
+p img{ max-width:100%}
+pre {
+    width: 80% !important;
+    white-space: normal;
+}
+</style>
+</head><body id="preview">
+<!--<p>&lt;!–<br>
+Licensed to the Apache Software Foundation (ASF) under one<br>
+or more contributor license agreements.  See the NOTICE file<br>
+distributed with this work for additional information<br>
+regarding copyright ownership.  The ASF licenses this file<br>
+to you under the Apache License, Version 2.0 (the<br>
+“License”); you may not use this file except in compliance<br>
+with the License.  You may obtain a copy of the License at</p>
+<pre><code>  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+&quot;AS IS&quot; BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+</code></pre>
+<p>–&gt;</p>-->
+<p><img src="../docs/images/format/CarbonData_logo.png?raw=true" alt="CarbonData_Logo"></p>
+<h1><a id="Quick_Start_20"></a>Quick Start</h1>
+<p>This tutorial provides a quick introduction to using CarbonData.</p>
+<h2><a id="Getting_started_with_Apache_CarbonData_23"></a>Getting started with Apache CarbonData</h2>
+<ul>
+<li><a href="#installation">Installation</a></li>
+<li><a href="#InteractiveAnalysis-with-Carbon-Spark-Shell">Interactive Analysis with Carbon-Spark Shell</a>
+<ul>
+<li><a href="#basics">Basics</a></li>
+<li><a href="#executing-queries">Executing Queries</a>
+<ul>
+<li><a href="#prerequisites">Prerequisites</a></li>
+<li><a href="#create-table">Create Table</a></li>
+<li><a href="#load-data-to-table">Load data to Table</a></li>
+<li><a href="#query-data-from-table">Query data from table</a></li>
+</ul>
+</li>
+</ul>
+</li>
+<li><a href="#carbon-sql-cli">Carbon SQL CLI</a>
+<ul>
+<li><a href="#basics">Basics</a></li>
+<li><a href="#execute-queries-in-cli">Execute Queries in CLI</a></li>
+</ul>
+</li>
+<li><a href="">Building CarbonData</a></li>
+</ul>
+<h2><a id="Installation_39"></a>Installation</h2>
+<ul>
+<li>Download released package of <a href="http://spark.apache.org/downloads.html">Spark 1.5.0 to 1.6.2</a></li>
+<li>Download and install <a href="http://thrift-tutorial.readthedocs.io/en/latest/installation.html">Apache Thrift 0.9.3</a>, and make sure thrift is added to the system path.</li>
+<li>Download <a href="https://github.com/apache/incubator-carbondata">Apache CarbonData code</a> and build it. Please visit <a href="Installing-CarbonData-And-IDE-Configuartion.md">Building CarbonData And IDE Configuration</a> for more information.</li>
+</ul>
+<h2><a id="Interactive_Analysis_with_CarbonSpark_Shell_44"></a>Interactive Analysis with Carbon-Spark Shell</h2>
+<p>Carbon Spark shell is a wrapper around Apache Spark Shell. It provides a simple way to learn the API, as well as a powerful tool to analyze data interactively. Please visit <a href="http://spark.apache.org/docs/latest/">Apache Spark Documentation</a> for more details on Spark shell.</p>
+<h4><a id="Basics_47"></a>Basics</h4>
+<p>Start Spark shell by running the following in the Carbon directory:</p>
+<pre><code>./bin/carbon-spark-shell
+</code></pre>
+<p><em>Note</em>: In this shell SparkContext is readily available as sc and CarbonContext is available as cc.</p>
+<p>CarbonData stores and writes the data in its specified format at the default location on HDFS.<br>
+By default carbon.storelocation is set as:</p>
+<pre><code>hdfs://IP:PORT/Opt/CarbonStore
+</code></pre>
+<p>You can provide your own store location with the --conf option, for example:</p>
+<pre><code>./bin/carbon-spark-shell --conf spark.carbon.storepath=&lt;storelocation&gt;
+</code></pre>
+<h4><a id="Executing_Queries_64"></a>Executing Queries</h4>
+<p><strong>Prerequisites</strong></p>
+<p>Create a sample.csv file in the CarbonData directory. The CSV file is required for loading data into Carbon.</p>
+<pre><code>$ cd carbondata
+$ cat &gt; sample.csv &lt;&lt; EOF
+id,name,city,age
+1,david,shenzhen,31
+2,eason,shenzhen,27
+3,jarry,wuhan,35
+EOF
+</code></pre>
+<p><strong>Create table</strong></p>
+<pre><code>scala&gt;cc.sql(&quot;create table if not exists test_table (id string, name string, city string, age Int) STORED BY 'carbondata'&quot;)
+</code></pre>
+<p><strong>Load data to table</strong></p>
+<pre><code>scala&gt;import java.io.File
+scala&gt;val dataFilePath = new File(&quot;../carbondata/sample.csv&quot;).getCanonicalPath
+scala&gt;cc.sql(s&quot;load data inpath '$dataFilePath' into table test_table&quot;)
+</code></pre>
+<p><strong>Query data from table</strong></p>
+<pre><code>scala&gt;cc.sql(&quot;select * from test_table&quot;).show
+scala&gt;cc.sql(&quot;select city, avg(age), sum(age) from test_table group by city&quot;).show
+</code></pre>
+<h2><a id="Carbon_SQL_CLI_98"></a>Carbon SQL CLI</h2>
+<p>The Carbon Spark SQL CLI is a wrapper around Apache Spark SQL CLI. It is a convenient tool to execute queries input from the command line. Please visit <a href="http://spark.apache.org/docs/latest/">Apache Spark Documentation</a> for more information on the Spark SQL CLI.</p>
+<h4><a id="Basics_101"></a>Basics</h4>
+<p>To start the Carbon Spark SQL CLI, run the following in the Carbon directory:</p>
+<pre><code>./bin/carbon-spark-sql
+</code></pre>
+<p>CarbonData stores and writes the data in its specified format at the default location on HDFS.<br>
+By default carbon.storelocation is set as:</p>
+<pre><code>hdfs://IP:PORT/Opt/CarbonStore
+</code></pre>
+<p>You can provide your own store location with the --conf option, for example:</p>
+<pre><code>./bin/carbon-spark-sql --conf spark.carbon.storepath=/home/root/carbonstore
+</code></pre>
+<h4><a id="Execute_Queries_in_CLI_118"></a>Execute Queries in CLI</h4>
+<pre><code>spark-sql&gt; create table if not exists test_table (id string, name string, city string, age Int) STORED BY 'carbondata';
+spark-sql&gt; load data inpath '../sample.csv' into table test_table;
+spark-sql&gt; select city, avg(age), sum(age) from test_table group by city;
+</code></pre>
+<h2><a id="Building_CarbonData_124"></a>Building CarbonData</h2>
+<p>To get started, get CarbonData from the <a href="">downloads</a> on <a href="http://carbondata.incubator.apache.org">http://carbondata.incubator.apache.org</a>.<br>
+CarbonData uses Hadoop’s client libraries for HDFS and YARN and Spark’s libraries. Downloads are pre-packaged for a handful of popular Spark versions.</p>
+<p>If you’d like to build CarbonData from source, please visit <a href="">Building CarbonData And IDE Configuration</a>.</p>
+
+</body></html>

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/a1ad369d/content/documents/troubleshooting.html
----------------------------------------------------------------------
diff --git a/content/documents/troubleshooting.html b/content/documents/troubleshooting.html
new file mode 100644
index 0000000..8d71dfa
--- /dev/null
+++ b/content/documents/troubleshooting.html
@@ -0,0 +1,42 @@
+
+<!DOCTYPE html><html><head><meta charset="utf-8"><title>Untitled Document.md</title><style></style></head><body id="preview">
+<p>&lt;!–<br>
+Licensed to the Apache Software Foundation (ASF) under one<br>
+or more contributor license agreements.  See the NOTICE file<br>
+distributed with this work for additional information<br>
+regarding copyright ownership.  The ASF licenses this file<br>
+to you under the Apache License, Version 2.0 (the<br>
+“License”); you may not use this file except in compliance<br>
+with the License.  You may obtain a copy of the License at</p>
+<pre><code>  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+&quot;AS IS&quot; BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+</code></pre>
+<p>–&gt;</p>
+<h1><a id="Troubleshooting_19"></a>Troubleshooting</h1>
+<p>This tutorial is designed to provide troubleshooting for end users and developers<br>
+who are building, deploying, and using CarbonData.</p>
+<ul>
+<li>
+<h2><a id="Prerequisites_for_Developers_23"></a>Prerequisites for Developers</h2>
+</li>
+<li>
+<h2><a id="Prerequisites_for_End_Users_25"></a>Prerequisites for End Users</h2>
+</li>
+<li>
+<h2><a id="General_Prevention_and_Best_Practices_27"></a>General Prevention and Best Practices</h2>
+</li>
+<li>
+<h2><a id="Procedures_29"></a>Procedure(s)</h2>
+</li>
+<li>
+<h2><a id="References_31"></a>References</h2>
+</li>
+</ul>
+
+</body></html>

http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/a1ad369d/content/documents/usecases.html
----------------------------------------------------------------------
diff --git a/content/documents/usecases.html b/content/documents/usecases.html
new file mode 100644
index 0000000..d4bac49
--- /dev/null
+++ b/content/documents/usecases.html
@@ -0,0 +1,93 @@
+
+<!DOCTYPE html><html><head><meta charset="utf-8"><title>Untitled Document.md</title><style></style></head><body id="preview">
+<p>&lt;!–<br>
+Licensed to the Apache Software Foundation (ASF) under one<br>
+or more contributor license agreements.  See the NOTICE file<br>
+distributed with this work for additional information<br>
+regarding copyright ownership.  The ASF licenses this file<br>
+to you under the Apache License, Version 2.0 (the<br>
+“License”); you may not use this file except in compliance<br>
+with the License.  You may obtain a copy of the License at</p>
+<pre><code>  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+&quot;AS IS&quot; BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+</code></pre>
+<p>–&gt;</p>
+<h1><a id="CarbonData_Use_Cases_19"></a>CarbonData Use Cases</h1>
+<p>This tutorial discusses the problems that CarbonData addresses. It takes you through the identified top use cases of Carbon.</p>
+<h2><a id="Introduction_22"></a>Introduction</h2>
+<p>For big data interactive analysis scenarios, many customers expect sub-second response to query TB-PB level data on general hardware clusters with just a few nodes.</p>
+<p>In the current big data ecosystem, there are a few columnar storage formats, such as ORC and Parquet, that are designed for SQL on Big Data. Apache Hive’s ORC format is<br>
+a columnar storage format with basic indexing capability. However, ORC cannot meet the sub-second query response expectation on TB level data, because the ORC format<br>
+performs only stride level dictionary encoding, and all analytical operations such as filtering and aggregation are done on the actual data. Apache Parquet is a columnar<br>
+storage format that can improve performance in comparison to ORC, because of its more efficient storage organization. Though Parquet can provide query response on TB level data in a<br>
+few seconds, it is still far from the sub-second expectation of interactive analysis users. Cloudera Kudu can effectively solve some query performance issues, but Kudu<br>
+is not Hadoop native and cannot seamlessly integrate historic HDFS data into a new Kudu system.</p>
+<p>CarbonData, in contrast, uses specially engineered optimizations targeted at improving the performance of analytical queries, which can include filters, aggregation and distinct counts.<br>
+Because the required data is stored in an indexed, well organized, read-optimized format, CarbonData’s query performance can achieve sub-second response.</p>
+<h2><a id="Motivation_Single_Format_to_provide_low_latency_response_for_all_use_cases_35"></a>Motivation: Single Format to provide low latency response for all use cases</h2>
+<p>The main motivation behind CarbonData is to provide a single storage format that covers all the use cases of querying big data on Hadoop.</p>
+<p><img src="../docs/images/format/carbon_data_motivation.png?raw=true" alt="Motivation"></p>
+<h2><a id="Use_Cases_41"></a>Use Cases</h2>
+<ul>
+<li>
+<h3><a id="Sequential_Access_42"></a>Sequential Access</h3>
+<ul>
+<li>Supports queries that select only a few columns with a group by clause but do not contain any filters.<br>
+This results in full scan over the complete store for the selected columns.</li>
+</ul>
+<p><img src="../docs/images/format/carbon_data_full_scan.png?raw=true" alt="Sequential_Scan"></p>
+<p><strong>Scenario</strong></p>
+<ul>
+<li>ETL jobs</li>
+<li>Log Analysis</li>
+</ul>
+</li>
+<li>
+<h3><a id="Random_Access_53"></a>Random Access</h3>
+<ul>
+<li>Supports Point Query. These are queries used from operational applications and usually select all or most of the columns but do involve a large number of<br>
+filters which reduce the result to a small size. Such queries generally do not involve any aggregation or group by clause.
+<ul>
+<li>Row-key query (like HBase)</li>
+<li>Narrow Scan</li>
+<li>Requires second/sub-second level low latency</li>
+</ul>
+</li>
+</ul>
+<p><img src="../docs/images/format/carbon_data_random_scan.png?raw=true" alt="random_access"></p>
+<p><strong>Scenario</strong></p>
+<ul>
+<li>Operational Query</li>
+<li>User Profiling</li>
+</ul>
+</li>
+<li>
+<h3><a id="Olap_Style_Query_67"></a>OLAP Style Query</h3>
+<ul>
+<li>Supports Interactive data analysis for any dimensions. These are queries which are typically fired from Interactive Analysis tools.<br>
+Such queries often select a few columns but involve filters and group by on a column or a grouping expression.<br>
+It also supports:
+<ul>
+<li>Queries that involve aggregation/join</li>
+<li>Roll-up, drill-down, slicing and dicing</li>
+<li>Low-latency ad-hoc queries</li>
+</ul>
+</li>
+</ul>
+<p><img src="../docs/images/format/carbon_data_olap_scan.png?raw=true" alt="Olap_style_query"></p>
+<p><strong>Scenario</strong></p>
+<ul>
+<li>Dashboard reporting</li>
+<li>Fraud &amp; Ad-hoc Analysis</li>
+</ul>
+</li>
+</ul>
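+<p>The three query shapes above can be contrasted with a small, self-contained Scala sketch over an in-memory collection. It does not use CarbonData: the object QueryShapesSketch, the Person case class and the three sample rows are hypothetical, and serve only to show which operations (scan, filter, group by) each use case combines.</p>
+<pre><code>// Illustrative sketch only; plain Scala collections stand in for a stored table.
+object QueryShapesSketch {
+  case class Person(id: String, name: String, city: String, age: Int)
+
+  // Hypothetical sample rows.
+  val rows: Vector[Person] = Vector(
+    Person(&quot;1&quot;, &quot;david&quot;, &quot;shenzhen&quot;, 31),
+    Person(&quot;2&quot;, &quot;eason&quot;, &quot;shenzhen&quot;, 27),
+    Person(&quot;3&quot;, &quot;jarry&quot;, &quot;wuhan&quot;, 35)
+  )
+
+  // Sequential access: few columns, group by, no filter, so every row is scanned.
+  def sumAgeByCity: Map[String, Int] =
+    rows.groupBy(p =&gt; p.city).map { case (city, ps) =&gt; city -&gt; ps.map(p =&gt; p.age).sum }
+
+  // Random access: a selective filter on a key, all columns returned, no aggregation.
+  def lookupById(id: String): Option[Person] =
+    rows.find(p =&gt; p.id == id)
+
+  // OLAP style: a filter combined with a group by on a dimension.
+  def countByCityOlderThan(minAge: Int): Map[String, Int] =
+    rows.filter(p =&gt; p.age &gt; minAge).groupBy(p =&gt; p.city).map { case (city, ps) =&gt; city -&gt; ps.size }
+
+  def main(args: Array[String]): Unit = {
+    println(sumAgeByCity)
+    println(lookupById(&quot;2&quot;))
+    println(countByCityOlderThan(30))
+  }
+}
+</code></pre>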
+
+</body></html>

