http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/0d4cdb1c/content/docs/latest/use_cases.html ---------------------------------------------------------------------- diff --git a/content/docs/latest/use_cases.html b/content/docs/latest/use_cases.html deleted file mode 100644 index b6f95cf..0000000 --- a/content/docs/latest/use_cases.html +++ /dev/null @@ -1,157 +0,0 @@ -Untitled Document.md - -

Version: 0.2.0 | Last Published: 21-11-2016


CarbonData Use Cases

-

This tutorial discusses the problems that CarbonData addresses and walks through its top identified use cases.

-

Introduction

-

In interactive big data analysis scenarios, many customers expect sub-second responses when querying terabyte- to petabyte-scale data on general-purpose hardware clusters with just a few nodes.

-

In the current big data ecosystem, there are a few columnar storage formats, such as ORC and Parquet, that are designed for SQL on big data. Apache Hive’s ORC format is a columnar storage format with basic indexing capability. However, ORC cannot meet the sub-second query response expectation on TB-level data, because the ORC format performs only stride-level dictionary encoding, and all analytical operations such as filtering and aggregation are done on the actual data. Apache Parquet is a columnar storage format that can improve performance in comparison to ORC because of its more efficient storage organization. Though Parquet can provide query responses on TB-level data in a few seconds, it is still far from the sub-second expectation of interactive analysis users. Cloudera Kudu can effectively solve some query performance issues, but Kudu is not Hadoop-native and cannot seamlessly integrate historical HDFS data into a new Kudu system.

-

CarbonData, by contrast, uses specially engineered optimizations targeted at improving the performance of analytical queries, which can include filters, aggregations, and distinct counts. Because the required data is stored in an indexed, well-organized, read-optimized format, CarbonData’s query performance can reach sub-second response times.

-

Motivation: Single Format to provide Low Latency Response for all Use Cases

-

The main motivation behind CarbonData is to provide a single storage format for all use cases of querying big data on Hadoop. CarbonData is thus able to consolidate all of these use cases into a single storage format.

-

Motivation

- -
- -