carbondata-commits mailing list archives

Subject [40/50] [abbrv] incubator-carbondata git commit: Update
Date Thu, 30 Jun 2016 17:42:27 GMT


Branch: refs/heads/master
Commit: 6e3e4c35ab8e2c338c6f3de4c1338083cd9aaa99
Parents: 167d527
Author: Liang Chen <>
Authored: Thu Jun 30 06:52:25 2016 +0530
Committer: GitHub <>
Committed: Thu Jun 30 06:52:25 2016 +0530

----------------------------------------------------------------------
 | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/ b/
index a2cd048..0f7e1d7 100644
--- a/
+++ b/
@@ -1,17 +1,10 @@
+# This github has migrated to apache:, please fork new github.
 # CarbonData
 CarbonData is a new Apache Hadoop native file format for faster 
 interactive query using advanced columnar storage, index, compression 
 and encoding techniques to improve computing efficiency, in turn it will 
 help speedup queries an order of magnitude faster over PetaBytes of data. 
-### Why CarbonData
-Based on the below requirements, we investigated existing file formats in the Hadoop eco-system, but we could not find a suitable solution that can satisfy all the requirements at the same time,so we start designing CarbonData. 
-* Requirement1:Support big scan & only fetch a few columns 
-* Requirement2:Support primary key lookup response in sub-second. 
-* Requirement3:Support interactive OLAP-style query over big data which involve many filters in a query, this type of workload should response in seconds. 
-* Requirement4:Support fast individual record extraction which fetch all columns of the record. 
-* Requirement5:Support HDFS so that customer can leverage existing Hadoop cluster. 
 ### Features
 CarbonData file format is a columnar store in HDFS, it has many features that a modern columnar format has, such as splittable, compression schema ,complex data type etc. And CarbonData has following unique features:
 * Stores data along with index: it can significantly accelerate query performance and reduces the I/O scans and CPU resources, where there are filters in the query.  CarbonData index consists of multiple level of indices, a processing framework can leverage this index to reduce the task it needs to schedule and process, and it can also do skip scan in more finer grain unit (called blocklet) in task side scanning instead of scanning the whole file. 
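The "Stores data along with index" feature in the README text above can be illustrated with a minimal two-level min/max pruning sketch. It assumes that each blocklet keeps per-column min/max statistics and that each block keeps a coarser entry merged from its blocklets; this commit does not describe CarbonData's actual index layout, and `Block`, `Blocklet`, `build_blocks` and `query_equal` are hypothetical names. Block-level pruning reduces how many tasks are scheduled, and blocklet-level pruning lets each task skip data instead of scanning the whole file.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Row = Dict[str, int]
Stats = Dict[str, Tuple[int, int]]  # column name -> (min, max)


def row_stats(rows: List[Row]) -> Stats:
    """Per-column (min, max) for a non-empty group of rows."""
    return {c: (min(r[c] for r in rows), max(r[c] for r in rows)) for c in rows[0]}


def merge_stats(parts: List[Stats]) -> Stats:
    """Combine child statistics into one coarser (min, max) entry per column."""
    return {c: (min(p[c][0] for p in parts), max(p[c][1] for p in parts)) for c in parts[0]}


@dataclass
class Blocklet:
    rows: List[Row]  # hypothetical: the rows stored in this blocklet
    stats: Stats     # fine-grained index entry


@dataclass
class Block:
    blocklets: List[Blocklet]
    stats: Stats     # coarse index entry covering all blocklets in the block


def build_blocks(rows: List[Row], blocklet_size: int, blocklets_per_block: int) -> List[Block]:
    """Split rows into blocklets, group blocklets into blocks, and index both levels."""
    blocklets = []
    for i in range(0, len(rows), blocklet_size):
        chunk = rows[i:i + blocklet_size]
        blocklets.append(Blocklet(chunk, row_stats(chunk)))
    blocks = []
    for i in range(0, len(blocklets), blocklets_per_block):
        group = blocklets[i:i + blocklets_per_block]
        blocks.append(Block(group, merge_stats([b.stats for b in group])))
    return blocks


def may_contain(stats: Stats, column: str, value: int) -> bool:
    """True unless the (min, max) range proves no row can equal value."""
    lo, hi = stats[column]
    return lo <= value <= hi


def query_equal(blocks: List[Block], column: str, value: int):
    """Return (matching rows, tasks scheduled, blocklets scanned) for column == value."""
    # Level 1: prune whole blocks before scheduling, so fewer tasks are created.
    tasks = [b for b in blocks if may_contain(b.stats, column, value)]
    matches: List[Row] = []
    scanned = 0
    for block in tasks:
        # Level 2: inside a task, skip blocklets whose range excludes the value.
        for blocklet in block.blocklets:
            if not may_contain(blocklet.stats, column, value):
                continue
            scanned += 1
            matches.extend(r for r in blocklet.rows if r[column] == value)
    return matches, len(tasks), scanned


if __name__ == "__main__":
    rows = [{"id": i, "amount": (i * 7) % 50} for i in range(100)]
    blocks = build_blocks(rows, blocklet_size=10, blocklets_per_block=2)
    print(query_equal(blocks, "id", 42))  # → ([{'id': 42, 'amount': 44}], 1, 1)
```

In this toy data, pruning on `id` schedules one task out of five and scans one blocklet out of ten, because `id` values are clustered within each blocklet. For a column like `amount`, whose values are spread across every blocklet, the min/max ranges overlap and almost nothing is skipped, which is why min/max indexes generally work best on sorted or clustered columns.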
