http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/01665608/src/main/webapp/docs/latest/configuring.html
----------------------------------------------------------------------
diff --git a/src/main/webapp/docs/latest/configuring.html b/src/main/webapp/docs/latest/configuring.html
index 51c0e6c..338d016 100644
--- a/src/main/webapp/docs/latest/configuring.html
+++ b/src/main/webapp/docs/latest/configuring.html
@@ -1,5 +1,30 @@
-Untitled Document.md

Configuring CarbonData

-This tutorial guides you through the advanced configurations of CarbonData:
+This tutorial will guide you through the advanced configurations of CarbonData:

System Configuration

-This section provides the details of all the configurations required for the CarbonData System.
+This section provides the details of all the configurations required for the Carbon System.

System Configuration in carbon.properties

@@ -46,12 +71,12 @@ under the License.
@@ -61,23 +86,24 @@ under the License.
-Property | Default Value | Description
+Parameter | Default Value | Description
carbon.storelocation | /user/hive/warehouse/carbon.store
-Location where CarbonData will create the store and write the data in its own format. NOTE: The store location should be in HDFS.
+Location where Carbon will create the store and write the data in its own format. NOTE: The store location should be in HDFS.
carbon.ddl.base.hdfs.url | hdfs://hacluster/opt/data
-This property is used to configure the HDFS relative path from the HDFS base path configured in fs.defaultFS. The path configured in carbon.ddl.base.hdfs.url will be appended to the HDFS path configured in fs.defaultFS. If this path is configured, the user need not pass the complete path during data load. For example: if the absolute path of the CSV file is hdfs://10.18.101.155:54310/data/cnbc/2016/xyz.csv, the path "hdfs://10.18.101.155:54310" comes from the property fs.defaultFS and the user can configure /data/cnbc/ as carbon.ddl.base.hdfs.url. During data load, the user can then specify the CSV path as /2016/xyz.csv.
+This property is used to configure the HDFS relative path from the HDFS base path configured in fs.defaultFS. The path configured in carbon.ddl.base.hdfs.url will be appended to the HDFS path configured in fs.defaultFS. If this path is configured, the user need not pass the complete path during data load. For example: if the absolute path of the CSV file is hdfs://10.18.101.155:54310/data/cnbc/2016/xyz.csv, the path "hdfs://10.18.101.155:54310" comes from the property fs.defaultFS and the user can configure /data/cnbc/ as carbon.ddl.base.hdfs.url. During data load, the user can then specify the CSV path as /2016/xyz.csv.
carbon.badRecords.location
carbon.kettle.home | $SPARK_HOME/carbonlib/carbonplugins
-Path used by CarbonData internally to create the graph for loading the data.
+Path used by Carbon internally to create the graph for loading the data.
carbon.data.file.version | 2
-If this parameter value is set to 1, then CarbonData will support data load in the old format. If the value is set to 2, then CarbonData will support data load in the new format only. NOTE: The file format created before DataSight Spark V100R002C30 is considered the old format.
+If this parameter value is set to 1, then Carbon supports data load in the old format. If the value is set to 2, then Carbon supports data load in the new format only. NOTE: The file format created before DataSight Spark V100R002C30 is considered the old format.
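The carbon.ddl.base.hdfs.url description combines three values into one absolute path: fs.defaultFS, the configured base path, and the path given at load time. The Python sketch below reproduces the worked example above, assuming the three parts are simply joined with single slashes. The function name resolve_load_path is hypothetical and is not part of CarbonData.

```python
def resolve_load_path(default_fs: str, base_url: str, user_path: str) -> str:
    """Model of how a short load path maps to an absolute HDFS path.

    default_fs -- value of fs.defaultFS, e.g. "hdfs://10.18.101.155:54310"
    base_url   -- value of carbon.ddl.base.hdfs.url, e.g. "/data/cnbc/"
    user_path  -- path passed during data load, e.g. "/2016/xyz.csv"
    """
    # Strip surrounding slashes and drop empty fragments so that an unset
    # base path does not produce a double slash in the result.
    parts = [p for p in (base_url.strip("/"), user_path.strip("/")) if p]
    return default_fs.rstrip("/") + "/" + "/".join(parts)


print(resolve_load_path("hdfs://10.18.101.155:54310", "/data/cnbc/", "/2016/xyz.csv"))
```

Run against the values from the example, this prints hdfs://10.18.101.155:54310/data/cnbc/2016/xyz.csv, the absolute path quoted in the description.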

Performance Configuration

-This section provides the details of all the configurations required for CarbonData Performance Optimization.
+This section provides the details of all the configurations required for Carbon Performance Optimization.

Performance Configuration in carbon.properties

Data Loading Configuration
@@ -91,8 +117,8 @@ under the License.
@@ -103,13 +129,13 @@ under the License.
@@ -121,20 +147,20 @@ under the License.
@@ -150,10 +176,9 @@ under the License.
carbon.sort.file.buffer.size | 20
-File read buffer size used during sorting. This value is expressed in MB. Min=1 and Max=100.
+File read buffer size used during sorting. The value is in MB. Min=1 and Max=100.
carbon.graph.rowset.size
carbon.number.of.cores.while.loading | 6
-Number of cores to be used while loading data.
+Number of cores to be used during data loading.
carbon.sort.size | 500000
-Record count to sort and write intermediate files to temp.
+Record count to sort and write to temp intermediate files.
carbon.number.of.cores.block.sort | 7
-Number of cores to use for block sort while loading data.
+Number of cores to be used for block sort during data loading.
carbon.max.driver.lru.cache.size | -1
-Max LRU cache size up to which data will be loaded at the driver side. This value is expressed in MB. The default value of -1 means there is no memory limit for caching. Apart from the default -1, only integer values greater than 0 are accepted.
+Max LRU cache size up to which data will be loaded at the driver side. The value is in MB. The default value, -1, means there is no memory limit for caching. Apart from the default -1, only integer values greater than 0 are accepted.
carbon.max.executor.lru.cache.size | -1
-Max LRU cache size up to which data will be loaded at the executor side. This value is expressed in MB. The default value of -1 means there is no memory limit for caching. Apart from the default -1, only integer values greater than 0 are accepted. If this parameter is not configured, then the carbon.max.driver.lru.cache.size value will be used.
+Max LRU cache size up to which data will be loaded at the executor side. The value is in MB. The default value, -1, means there is no memory limit for caching. Apart from the default -1, only integer values greater than 0 are accepted. If this parameter is not configured, then the carbon.max.driver.lru.cache.size value will be used.
carbon.merge.sort.prefetch
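The two LRU cache properties interact: -1 means no limit, and the executor falls back to the driver value when its own property is not configured. The sketch below models that rule over a plain dictionary of property strings. The helper names are hypothetical, and rejecting anything other than -1 or a positive integer with ValueError is an assumption of this sketch; the table does not say how CarbonData itself reacts to invalid values.

```python
from typing import Mapping, Optional, Tuple

NO_LIMIT = -1  # documented default: no memory limit for caching


def parse_cache_limit(raw: str) -> Optional[int]:
    """Convert a cache size in MB to an int, or None for 'no limit'."""
    value = int(raw.strip())  # int() raises ValueError for non-integer text
    if value == NO_LIMIT:
        return None
    if value <= 0:
        # Assumption: values other than -1 or a positive integer are rejected.
        raise ValueError(f"cache size must be -1 or a positive integer, got {raw!r}")
    return value


def effective_cache_limits(props: Mapping[str, str]) -> Tuple[Optional[int], Optional[int]]:
    """Return (driver_limit_mb, executor_limit_mb); None means unlimited."""
    driver_raw = props.get("carbon.max.driver.lru.cache.size", str(NO_LIMIT))
    # Documented fallback: without an executor setting, the driver value applies.
    executor_raw = props.get("carbon.max.executor.lru.cache.size", driver_raw)
    return parse_cache_limit(driver_raw), parse_cache_limit(executor_raw)
```

For example, a carbon.properties that sets only carbon.max.driver.lru.cache.size=1024 yields a 1024 MB limit on both the driver and the executors under this model.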
Compaction Configuration
@@ -167,43 +192,44 @@ under the License.
carbon.number.of.cores.while.compacting | 2
-Number of cores which are used to write data during compaction.
+Number of cores which are used to write data during compaction.
carbon.compaction.level.threshold
-4, 3 | This property is for minor compaction and decides how many segments are to be merged. Example: If it is set as 2, 3, then minor compaction will be triggered for every 2 segments, and 3 is the number of level 1 compacted segments which are further compacted into a new segment.
+4,3 | This property is for minor compaction and decides how many segments are to be merged. Example: If it is set as 2,3, then minor compaction will be triggered for every 2 segments, and 3 is the number of level 1 compacted segments which are further compacted into a new segment. Valid values are from 0 to 100.
carbon.major.compaction.size | 1024
-Major compaction size can be configured using this parameter. Segments whose combined size is below this threshold will be merged. This value is expressed in MB.
+Major compaction size can be configured using this parameter. Segments whose combined size is below this threshold will be merged. The value is in MB.
carbon.horizontal.compaction.enable | true
-This property is used to turn ON/OFF horizontal compaction. After every DELETE and UPDATE statement, horizontal compaction may occur if the number of delta (DELETE/UPDATE) files exceeds the specified threshold.
+This property is used to turn ON/OFF horizontal compaction. After every DELETE and UPDATE statement, horizontal compaction may occur if the number of delta (DELETE/UPDATE) files exceeds the specified threshold. By default, horizontal compaction is turned ON; it can be turned OFF by setting the value to false.
carbon.horizontal.UPDATE.compaction.threshold | 1 | This property specifies the threshold limit on the number of UPDATE delta files within a segment. If the number of delta files goes beyond the threshold, the UPDATE delta files within the segment become eligible for horizontal compaction and are compacted into a single UPDATE delta file.
-Values between 1 and 10000.
+By default the value is set to 1 and can be altered to values between 1 and 10000.
carbon.horizontal.DELETE.compaction.threshold | 1 | This property specifies the threshold limit on the number of DELETE delta files within a block of a segment. If the number of delta files goes beyond the threshold, the DELETE delta files for the particular block of the segment become eligible for horizontal compaction and are compacted into a single DELETE delta file.
-Values between 1 and 10000.
+By default the value is set to 1 and can be altered to values between 1 and 10000.
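The carbon.compaction.level.threshold example ("2,3") describes a two-level merge: every 2 new segments become one level 1 segment, and every 3 level 1 segments become one level 2 segment. The Python sketch below simulates that counting rule after each load. It is a simplified model that assumes positive thresholds and a compaction check after every load; the function names are hypothetical, and the sketch does not claim to mirror CarbonData's internal trigger logic.

```python
from typing import List


def parse_level_threshold(raw: str) -> List[int]:
    """Parse a value such as "4,3" or "4, 3" into [4, 3]."""
    values = [int(part) for part in raw.split(",")]  # int() tolerates spaces
    if any(v <= 0 for v in values):
        raise ValueError("this sketch only models positive thresholds")
    return values


def simulate_minor_compaction(num_loads: int, raw_threshold: str) -> List[int]:
    """Count segments per level after num_loads loads.

    levels[0] holds uncompacted loads; levels[i] holds segments produced by
    the i-th level of minor compaction.
    """
    thresholds = parse_level_threshold(raw_threshold)
    levels = [0] * (len(thresholds) + 1)
    for _ in range(num_loads):
        levels[0] += 1  # each load adds one new segment
        for level, limit in enumerate(thresholds):
            if levels[level] >= limit:
                levels[level] -= limit   # merge `limit` segments of this level
                levels[level + 1] += 1   # into one segment at the next level
    return levels
```

With "2,3", six loads end as a single level 2 segment ([0, 0, 1]), while three loads leave one new segment next to one level 1 segment ([1, 1, 0]).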
Query Configuration
@@ -240,8 +266,6 @@ under the License.

Miscellaneous Configuration

@@ -249,8 +273,8 @@ under the License.

Time format for CarbonData
@@ -267,9 +291,9 @@ under the License.

Dataload Configuration
@@ -287,17 +311,17 @@ under the License.
@@ -336,8 +360,9 @@ under the License.
carbon.lock.type | LOCALLOCK
-This configuration specifies the type of lock to be acquired during concurrent operations on a table. The following lock implementations are available. LOCALLOCK: The lock is created as a file on the local file system. This lock is useful when only one Spark driver (Thrift server) runs on a machine and no other CarbonData Spark application is launched concurrently. HDFSLOCK: The lock is created as a file on HDFS. This lock is useful when multiple CarbonData Spark applications are launched, no ZooKeeper is running on the cluster, and HDFS supports file-based locking.
+This configuration specifies the type of lock to be acquired during concurrent operations on a table. The following lock implementations are available. LOCALLOCK: The lock is created as a file on the local file system. This lock is useful when only one Spark driver (Thrift server) runs on a machine and no other Carbon Spark application is launched concurrently. HDFSLOCK: The lock is created as a file on HDFS. This lock is useful when multiple Carbon Spark applications are launched, no ZooKeeper is running on the cluster, and HDFS supports file-based locking.
carbon.sort.intermediate.files.limit | 20
-Minimum number of intermediate files after which the merge sort can be started.
+Minimum number of intermediate files after which the sort merge is to be started.
carbon.block.meta.size.reserved.percentage | 10
-Space reserved in percentage for writing block metadata in the CarbonData file.
+Space reserved in percentage for writing block metadata in the Carbon data file.
carbon.csv.read.buffersize.byte
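The carbon.lock.type description ties each lock implementation to a deployment condition. The sketch below encodes only those two documented conditions as a small decision helper; the function and parameter names are hypothetical. It returns None for setups that neither description covers (for example, a cluster with ZooKeeper running), because the lock type for that case is not named in this section.

```python
from typing import Optional


def suggest_lock_type(only_one_driver: bool, zookeeper_running: bool,
                      hdfs_supports_file_locking: bool) -> Optional[str]:
    """Map the documented deployment conditions to a carbon.lock.type value."""
    if only_one_driver:
        # One Spark driver (Thrift server) per machine, no other concurrent app.
        return "LOCALLOCK"
    if not zookeeper_running and hdfs_supports_file_locking:
        # Several Spark applications, no ZooKeeper, HDFS file locks available.
        return "HDFSLOCK"
    return None  # not covered by the two descriptions in this table
```

The result would then be written to carbon.properties, for example as carbon.lock.type=HDFSLOCK.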
Compaction Configuration
@@ -350,12 +375,12 @@ under the License.
@@ -364,9 +389,9 @@ under the License.
carbon.numberof.preserve.segments | 0
-If the user wants to exclude some number of segments from compaction, they can set this property. Example: if carbon.numberof.preserve.segments=2, then the 2 latest segments will always be excluded from compaction. No segments are preserved by default.
+If the user wants to exclude some number of segments from compaction, they can set this property. Example: if carbon.numberof.preserve.segments=2, then the 2 latest segments will always be excluded from compaction. No segments are preserved by default.
carbon.allowed.compaction.days | 0
-Compaction will merge the segments which are loaded within the configured number of days. Example: if the configuration is 2, then only the segments which are loaded within a time frame of 2 days will get merged. Segments which are loaded 2 days apart will not be merged. This is disabled by default.
+Compaction will merge the segments which are loaded within the configured number of days. Example: if the configuration is 2, then only the segments which are loaded within a time frame of 2 days will get merged. Segments which are loaded 2 days apart will not be merged. This is disabled by default.
carbon.enable.auto.load.merge
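carbon.numberof.preserve.segments and carbon.allowed.compaction.days both narrow the set of segments that compaction may merge. The Python sketch below applies the two filters to a list of (segment ID, load time) pairs. Excluding the latest N segments follows the documented example directly; measuring the day window back from the newest remaining segment is an assumption of this sketch, since the description does not say what the window is anchored to. The function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def compaction_candidates(segments: List[Tuple[str, datetime]],
                          preserve: int = 0,
                          allowed_days: int = 0) -> List[str]:
    """Return IDs of segments that may take part in compaction.

    segments     -- (segment_id, load_time) pairs ordered oldest to newest
    preserve     -- carbon.numberof.preserve.segments (0 = nothing preserved)
    allowed_days -- carbon.allowed.compaction.days (0 = check disabled)
    """
    # The latest `preserve` segments are always excluded.
    eligible = segments[:max(len(segments) - preserve, 0)]
    if allowed_days > 0 and eligible:
        # Assumption: the window is measured back from the newest eligible load,
        # and segments loaded `allowed_days` or more apart are not merged.
        newest = max(load_time for _, load_time in eligible)
        window = timedelta(days=allowed_days)
        eligible = [(sid, t) for sid, t in eligible if newest - t < window]
    return [sid for sid, _ in eligible]


loads = [(str(i), datetime(2017, 1, i + 1)) for i in range(4)]
print(compaction_candidates(loads, preserve=2))
```

For four daily loads, preserve=2 leaves segments "0" and "1" as candidates, while allowed_days=2 alone keeps only the two most recent loads, "2" and "3".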
Query Configuration
@@ -388,8 +413,9 @@ under the License.

Global Dictionary Configurations
@@ -402,22 +428,22 @@ under the License.
    high.cardinality.identify.ena