http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/65061b9a/src/main/webapp/docs/latest/overview.html
----------------------------------------------------------------------
diff --git a/src/main/webapp/docs/latest/overview.html b/src/main/webapp/docs/latest/overview.html
index bf19949..2828265 100644
--- a/src/main/webapp/docs/latest/overview.html
+++ b/src/main/webapp/docs/latest/overview.html
@@ -1,103 +1,115 @@
-Untitled Document.md
-
-
-
-

Overview

+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+

Overview

This tutorial provides a detailed overview of:

Introduction

-

CarbonData is a fully indexed columnar and Hadoop native data-store for processing heavy analytical workloads and detailed queries on big data. CarbonData allows faster interactive query using advanced columnar storage, index, compression and encoding techniques to improve computing efficiency, in turn it will help speedup queries an order of magnitude faster over PetaBytes of data.

+

CarbonData is a fully indexed, columnar, Hadoop-native data store for processing heavy analytical workloads and detailed queries on big data. CarbonData enables faster interactive queries by using advanced columnar storage, indexing, compression and encoding techniques to improve computing efficiency, which helps speed up queries by an order of magnitude over petabytes of data.

In customer benchmarks, CarbonData has proven able to manage petabytes of data on extraordinarily low-cost hardware and to answer queries around 10 times faster than current open source solutions (column-oriented SQL on Hadoop data stores).

-

Some of the Salient features of CarbonData are :

+

Some of the salient features of CarbonData are :

CarbonData File Structure

-

CarbonData file contains groups of data called blocklet, along with all required information like schema, offsets and indices, etc, in a file footer, co-located in HDFS.

-

The file footer can be read once to build the indices in memory, which can be utilized for optimizing the scans and processing for all subsequent queries.

-

Each blocklet in the file is further divided into chunks of data called Data Chunks. Each data chunk is organized either in columnar format or row format, and stores the data of either a single column or a set of columns. All blocklets in one file contain the same number and type of Data Chunks.

+

CarbonData files contain groups of data called blocklets, along with all required information, such as schema, offsets and indices, in a file footer, co-located in HDFS. The file footer can be read once to build the indices in memory, which can then be used to optimize scans and processing for all subsequent queries.

+

Each blocklet in the file is further divided into chunks of data called data chunks. Each data chunk is organized either in columnar format or row format, and stores the data of either a single column or a set of columns. All blocklets in a file contain the same number and type of data chunks.

- Carbon File Structure
+ CarbonData File Structure
-

Each Data Chunk contains multiple groups of data called as Pages. There are three types of pages.

+

Each data chunk contains multiple groups of data called pages. There are three types of pages.
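As a rough illustration of the "read the footer once, then reuse the in-memory index" idea described above, here is a minimal Python sketch. It is not CarbonData's actual format or API: the `Footer` class, the `build_index` and `candidate_blocklets` helpers, and the choice of a per-blocklet min/max index are all assumptions made for this example, since the text only says the footer holds "schema, offsets and indices".

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical per-blocklet index entry: column name -> (min, max) value.
MinMax = Dict[str, Tuple[int, int]]


@dataclass
class Footer:
    """Illustrative stand-in for a file footer (not the real layout)."""
    schema: List[str]               # column names
    blocklet_offsets: List[int]     # byte offset of each blocklet
    blocklet_min_max: List[MinMax]  # assumed min/max index per blocklet


def build_index(footer: Footer) -> List[Tuple[int, MinMax]]:
    """Read the footer once and keep (offset, min/max) pairs in memory."""
    return list(zip(footer.blocklet_offsets, footer.blocklet_min_max))


def candidate_blocklets(index: List[Tuple[int, MinMax]],
                        column: str, value: int) -> List[int]:
    """Return positions of blocklets whose range could contain `value`."""
    hits = []
    for position, (_offset, min_max) in enumerate(index):
        low, high = min_max[column]
        if low <= value <= high:
            hits.append(position)
    return hits


# Two blocklets with non-overlapping ranges for an "id" column.
footer = Footer(
    schema=["id"],
    blocklet_offsets=[0, 100],
    blocklet_min_max=[{"id": (1, 10)}, {"id": (11, 20)}],
)
index = build_index(footer)                      # built once
selected = candidate_blocklets(index, "id", 15)  # reused per query
```

Under these assumptions, a query for `id = 15` only needs to scan the second blocklet, and later queries reuse `index` without reading the footer again.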

-Carbon File Format
+CarbonData File Format

Features

-

CarbonData file format is a columnar store in HDFS, it has many features that a modern columnar format has, such as splittable, compression schema ,complex data type etc, and CarbonData has following unique features:

+

The CarbonData file format is a columnar store in HDFS. It has many of the features of a modern columnar format, such as being splittable and supporting compression schema and complex data types, and it also has the following unique features:

Data Types

-

The following types are supported :

+

CarbonData supports the following data types:

-
-

Compatibility

-

Packaging and Interfaces

-

Packaging

-

Carbon provides following JAR packages:

-
- carbon modules2 -
+ + +
+

Interfaces

-
-

Interfaces

API

-

Carbon can be used in following scenarios:

+

CarbonData can be used in the following scenarios:

  1. For MapReduce application user
    -This User API is provided by carbon-hadoop. In this scenario, user can process carb