http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/0bb92d4d/src/main/webapp/docs/latest/use_cases.html ---------------------------------------------------------------------- diff --git a/src/main/webapp/docs/latest/use_cases.html b/src/main/webapp/docs/latest/use_cases.html new file mode 100644 index 0000000..03889e1 --- /dev/null +++ b/src/main/webapp/docs/latest/use_cases.html @@ -0,0 +1,121 @@ +Untitled Document.md + +

CarbonData Use Cases

+

This tutorial discusses the problems that CarbonData addresses. It takes you through the top use cases identified for CarbonData.

+

Introduction

+

For interactive analysis of big data, many customers expect sub-second responses when querying TB- to PB-level data on general hardware clusters with just a few nodes.

+

In the current big data ecosystem, there are a few columnar storage formats, such as ORC and Parquet, that are designed for SQL on big data. Apache Hive’s ORC format is
+a columnar storage format with basic indexing capability. However, ORC cannot meet the sub-second query response expectation on TB-level data, because the ORC format
+performs only stride-level dictionary encoding, and all analytical operations such as filtering and aggregation are done on the actual data. Apache Parquet is a columnar
+storage format that can improve performance in comparison to ORC because of its more efficient storage organization. Though Parquet can provide query responses on TB-level data in a
+few seconds, it is still far from the sub-second expectation of interactive analysis users. Cloudera Kudu can effectively solve some query performance issues, but Kudu
+is not Hadoop-native and cannot seamlessly integrate historic HDFS data into a new Kudu system.

+

CarbonData, by contrast, uses specially engineered optimizations targeted at improving the performance of analytical queries, which can include filters, aggregations and distinct counts.
+Because the required data is stored in an indexed, well-organized, read-optimized format, CarbonData’s query performance can achieve sub-second response.

+

Motivation: Single Format to provide low latency response for all use cases

+

The main motivation behind CarbonData is to provide a single storage format for all the use cases of querying big data on Hadoop, so that one format can serve
+every such use case with low-latency responses.

+

Motivation

+

Use Cases
