carbondata-commits mailing list archives

Subject [19/35] incubator-carbondata-site git commit: Updated website for CarbonData release 1.0.0
Date Sat, 04 Feb 2017 02:38:18 GMT
diff --git a/content/docs/latest/mainpage.html b/content/docs/latest/mainpage.html
new file mode 100644
index 0000000..b85e1b2
--- /dev/null
+++ b/content/docs/latest/mainpage.html
@@ -0,0 +1,144 @@
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="utf-8">
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <link href='../../images/favicon.ico' rel='shortcut icon' type='image/x-icon'>
+    <!-- The above 3 meta tags *must* come first in the head; any other head content must
come *after* these tags -->
+    <title>CarbonData</title>
+    <!-- Bootstrap -->
+    <link rel="stylesheet" href="../../css/bootstrap.min.css">
+    <link href="../../css/style.css" rel="stylesheet">
+    <link href="../../css/print.css" rel="stylesheet" >
+    <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
+    <!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
+    <!--[if lt IE 9]>
+      <script src=""></script>
+      <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
+    <![endif]-->
+    <script src="../../js/jquery.min.js"></script>
+    <script src="../../js/bootstrap.min.js"></script>
+  </head>
+  <body>
+    <header>
+     <nav class="navbar navbar-default navbar-custom cd-navbar-wrapper" >
+      <div class="container">
+        <div class="navbar-header">
+          <button aria-controls="navbar" aria-expanded="false" data-target="#navbar" data-toggle="collapse"
class="navbar-toggle collapsed" type="button">
+            <span class="sr-only">Toggle navigation</span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+            <span class="icon-bar"></span>
+          </button>
+          <a href="../../index.html" class="logo">
+             <img src="../../images/CarbonDataLogo.png" alt="CarbonData logo" title="CarbonData logo"  />
+          </a>
+        </div>
+        <div class="navbar-collapse collapse cd_navcontnt" id="navbar">
+         <ul class="nav navbar-nav navbar-right navlist-custom">
+              <li><a href="../../index.html" class="hidden-xs"><i class="fa
fa-home" aria-hidden="true"></i> </a></li>
+              <li><a href="../../index.html" class="hidden-lg hidden-md hidden-sm">Home</a></li>
+              <li class="dropdown">
+                  <a href="#" class="dropdown-toggle " data-toggle="dropdown" role="button"
aria-haspopup="true" aria-expanded="false"> Download <span class="caret"></span></a>
+                  <ul class="dropdown-menu">
+                      <li>
+                          <a href=""
+                             target="_blank">Apache CarbonData 1.0.0</a></li>
+                      <li>
+                          <a href=""
+                             target="_blank">Apache CarbonData 0.2.0</a></li>
+                      <li>
+                          <a href=""
+                             target="_blank">Apache CarbonData 0.1.1</a></li>
+                      <li>
+                          <a href=""
+                             target="_blank">Apache CarbonData 0.1.0</a></li>
+                      <li>
+                          <a href=""
+                             target="_blank">Release Archive</a></li>
+                  </ul>
+                </li>
+              <li><a href="mainpage.html?page=userguide" class="">Documentation</a></li>
+              <li class="dropdown">
+                  <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button"
aria-haspopup="true" aria-expanded="false">Community <span class="caret"></span></a>
+                  <ul class="dropdown-menu">
+                      <li><a href=""
target="_blank">Contributing to CarbonData</a></li>
+                      <li><a href=""
target="_blank">Project Committers</a></li>
+                    <li><a href="../../meetup.html">CarbonData Meetups </a></li>
+                  </ul>
+                </li>
+                <li class="dropdown">
+                  <a href="" class="apache_link hidden-xs dropdown-toggle"
data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Apache</a>
+                   <ul class="dropdown-menu">
+                      <li><a href=""  target="_blank">Apache
+                      <li><a href=""  target="_blank">License</a></li>
+                      <li><a href=""
+                      <li><a href=""
+                    </ul>
+                </li>
+                <li class="dropdown">
+                  <a href="" class="hidden-lg hidden-md hidden-sm
dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Apache</a>
+                   <ul class="dropdown-menu">
+                      <li><a href=""  target="_blank">Apache
+                      <li><a href=""  target="_blank">License</a></li>
+                      <li><a href=""
+                      <li><a href=""
+                    </ul>
+                </li>
+           </ul>
+        </div><!--/.nav-collapse -->
+      </div>
+    </nav>
+     </header> <!-- end Header part -->
+   <div class="fixed-padding"></div> <!-- top padding with fixed header -->
+   <section><!-- Dashboard nav -->
+    <div class="container-fluid q">
+        <div class="col-sm-12  col-md-12 maindashboard">
+              <div class="row">
+                <section>
+                  <div style="padding:10px 15px;">
+                    <div class="doc-header">
+                        <div class="doc-toc">
+                            <a href="mainpage.html?page=userguide" class="icon toc-icon"></a>
+                        </div>
+                       <img src="../../images/format/CarbonData_icon.png" alt="" class="logo-print" />
+                       <span>Version: 1.0.0 | Published: 30-01-2017</span>
+                       <i class="fa fa-print print-icon" aria-hidden="true" onclick="divPrint();"></i>
+                    </div>
+                    <div id="viewpage" name="viewpage">   </div>
+                    <div class="doc-footer">
+                         <a href="#top" class="scroll-top">Top</a>
+                    </div>
+                  </div>
+                </section>
+              </div>
+        </div>
+      </div>
+    </section><!-- End systemblock part -->
+  <!-- jQuery (necessary for Bootstrap's JavaScript plugins) -->
+    <script src="../../js/custom.js"></script>
+    <script src="../../js/mdNavigation.js" type="text/javascript"></script>
+    <script type="text/javascript">
+     <!-- $("#leftmenu").load("table-of-content.html");-->
+    </script>
+  </body>
+  </html>
\ No newline at end of file
diff --git a/content/docs/latest/overview-of-carbondata.html b/content/docs/latest/overview-of-carbondata.html
new file mode 100644
index 0000000..5f4aff3
--- /dev/null
+++ b/content/docs/latest/overview-of-carbondata.html
@@ -0,0 +1,51 @@
+    Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+    Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+<h1>Overview</h1><p>This tutorial provides a detailed overview of:</p>
+  <li><a href="#introduction">Introduction</a></li>
+  <li><a href="#features">Features</a></li>
+<div id="introduction"></div>
+<h2>Introduction</h2><p>CarbonData is a fully indexed, columnar, Hadoop-native
data store for processing heavy analytical workloads and detailed queries on big data.
CarbonData enables faster interactive queries by using advanced columnar storage, indexing, compression
and encoding techniques to improve computing efficiency, which can speed up queries
by an order of magnitude over petabytes of data.</p><p>In customer benchmarks,
CarbonData has proven able to manage petabytes of data on low-cost hardware
and to answer queries around 10 times faster than current open source solutions (column-oriented
SQL-on-Hadoop data stores).</p><p>Some of the salient features of CarbonData are:</p>
+  <li>Low latency for various data access patterns such as sequential, random
and OLAP.</li>
+  <li>Fast query on fast data.</li>
+  <li>Space efficiency.</li>
+  <li>General format available across the Hadoop ecosystem.</li>
+<div id="features"></div>
+<h2>Features</h2><p>The CarbonData file format is a columnar store in HDFS.
It has many features of a modern columnar format, such as being splittable and supporting compression schemes and
complex data types. In addition, CarbonData has the following unique features:</p>
+  <li><p>Unique Data Organization: Although CarbonData stores data in columnar
format, it differs from traditional columnar formats in that the columns in each row group (Data
Block) are sorted independently of one another. This arrangement requires CarbonData
to store a row-number mapping against each column value, but it makes it possible to use binary
search for faster filtering. Because the values are sorted, identical or similar values are stored together,
which yields better compression and offsets the storage overhead of the row-number
mapping.</p></li>
+  <li><p>Advanced Push Down Optimizations: CarbonData pushes as much query
processing as possible close to the data to minimize the amount of data being read, processed,
converted and transmitted/shuffled. Using projections and filters, it reads only the required
columns from the store and only the rows that match the filter conditions provided
in the query.</p></li>
+  <li><p>Multi Level Indexing: CarbonData uses multiple indices at various levels
to enable faster search and speed up query processing.</p></li>
+  <li><p>Dictionary Encoding: Most databases and big data SQL data stores employ
dictionary encoding to achieve data compression by storing small integer numbers (surrogate
values) instead of full string values. However, almost all existing databases and data stores
divide the data into row groups containing anywhere from a few thousand to a million rows and
employ dictionary encoding only within each row group. Hence, the same column value can have
different surrogate values in different row groups, so the conversion
from surrogate value to actual value must be done immediately after the data is read from
disk. CarbonData instead employs a global surrogate key, which means that a common dictionary
is maintained for the full store on one machine/node. CarbonData can therefore perform all query
processing work, such as grouping/aggregation and sorting, on lightweight surrogate values.
This improves performance in two ways. First, conversion from surrogate values
to actual values is done only for the final result rows, which are far fewer than the
rows read from the store. Second, all query processing and computation
is done on lightweight surrogate values, which require less memory and
CPU time than actual values.</p></li>
+  <li><p>Deep Spark Integration: It has built-in Spark integration for Spark
1.6.2 and 2.1, with interfaces for Spark SQL, the DataFrame API and query optimization. It supports
bulk data ingestion and allows saving Spark DataFrames as CarbonData files.</p></li>
+  <li><p>Update and Delete Support: It supports batch updates, such as daily update scenarios
for OLAP, using a Base+Delta file based design.</p></li>
+  <li><p>Bucketing: A technique for distributing data uniformly
across files in CarbonData, which improves the performance of join queries. While loading
the data, records are placed into buckets based on a hashing algorithm. During the execution
of join queries, the records can be fetched from the buckets without the need for shuffling. This feature
distributes/organizes the table/partition data into multiple files, placing similar
records in the same file.</p></li>
+  <li><p>Global Multi Dimensional Keys (MDK) based B+Tree Index for all non-measure
columns: Aids in quickly locating the row groups (Data Blocks) that contain the data matching
search/filter criteria.</p></li>
+  <li><p>Min-Max Index for all columns: Aids in quickly locating the row groups (Data
Blocks) that contain the data matching search/filter criteria.</p></li>
+  <li><p>Data Block level Inverted Index for all columns: Aids in quickly locating
the rows that contain the data matching search/filter criteria within a row group (Data Block).</p></li>
+  <li><p>Store data along with index: Significantly accelerates query performance
and reduces I/O scans and CPU usage when there are filters in the query. The CarbonData
index consists of multiple levels of indices. A processing framework can leverage this index
to reduce the number of tasks it needs to schedule and process. It can also skip-scan in finer-grained
units (called blocklets) during task-side scanning instead of scanning the whole file.</p></li>
+  <li><p>Operable encoded data: It supports efficient compression and global
encoding schemes and can query compressed/encoded data directly. The data is converted just
before the results are returned to the user, which is known as late materialization.</p></li>
+  <li><p>Column group: Allows multiple columns to form a column group that is
stored in row format. This reduces the row reconstruction cost at query time.</p></li>
+  <li><p>Support for various use cases with a single data format: Examples
are interactive OLAP-style queries, sequential access (big scan) and random access (narrow scan).</p></li>
