http://git-wip-us.apache.org/repos/asf/incubator-carbondata-site/blob/b28a17be/src/main/webapp/docs/latest/configuring.html
----------------------------------------------------------------------
diff --git a/src/main/webapp/docs/latest/configuring.html b/src/main/webapp/docs/latest/configuring.html
index 338d016..f7c3abb 100644
--- a/src/main/webapp/docs/latest/configuring.html
+++ b/src/main/webapp/docs/latest/configuring.html
@@ -1,30 +1,61 @@
 Untitled Document.md
+

Version: 0.2.0 | Last Published: 21-11-2016

+ + Top +

Configuring CarbonData

-

This tutorial will guide you through the advance configurations of CarbonData :

+

This tutorial guides you through the advanced configurations of CarbonData :

System Configuration

-

This section provides the details of all the configurations required for Carbon System.
+

This section provides the details of all the configurations required for the CarbonData System.

System Configuration in carbon.properties

- + @@ -71,12 +107,12 @@ under the License. - + - + @@ -86,24 +122,23 @@ under the License. - + - +
-Parameter
+Property
Default Value
Description
carbon.storelocation /user/hive/warehouse/carbon.store
-Location where Carbon will create the store, and write the data in its own format.NOTE: Store location should be in HDFS.
+Location where CarbonData will create the store, and write the data in its own format. NOTE: Store location should be in HDFS.
carbon.ddl.base.hdfs.url hdfs://hacluster/opt/data
-This property is used to configure the HDFS relative path from the HDFS base path, configured in fs.defaultFS. The path configured in carbon.ddl.base.hdfs.url will be appended to the HDFS path configured in fs.defaultFS. If this path is configured, then user need not pass the complete path while dataload.For example: If absolute path of the csv file is hdfs://10.18.101.155:54310/data/cnbc/2016/xyz.csv,the path “hdfs://10.18.101.155:54310” will come from property fs.defaultFS and user can configure the /data/cnbc/ as carbon.ddl.base.hdfs.url.Now while dataload user can specify the csv path as/2016/xyz.csv.
+This property is used to configure the HDFS relative path from the HDFS base path, configured in fs.defaultFS. The path configured in carbon.ddl.base.hdfs.url will be appended to the HDFS path configured in fs.defaultFS. If this path is configured, then user need not pass the complete path while dataload. For example: If absolute path of the csv file is hdfs://10.18.101.155:54310/data/cnbc/2016/xyz.csv,the path “hdfs://10.18.101.155:54310” will come from property fs.defaultFS and user can configure the /data/cnbc/ as carbon.ddl.base.hdfs.url. Now while dataload user can specify the csv path as/2016/xyz.csv.
carbon.badRecords.location
carbon.kettle.home $SPARK_HOME/carbonlib/carbonplugins
-Path used by Carbon internally to create graph for loading the data.
+Path used by CarbonData internally to create graph for loading the data.
carbon.data.file.version 2
-If this parameter value is set to1, then the Carbon supports the data load which is in old format. If the value is set to 2, then the Carbon supports the data load of new format only.NOTE: The file format created before DataSight Spark V100R002C30 is considered as old format.
+If this parameter value is set to 1, then CarbonData will support the data load which is in old format. If the value is set to 2, then CarbonData will support the data load of new format only. NOTE: The file format created before DataSight Spark V100R002C30 is considered as old format.
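
The path resolution described for carbon.ddl.base.hdfs.url can be sketched as follows. This is a minimal illustration, assuming the three parts are joined by plain string concatenation; the helper name `resolve_load_path` is hypothetical and is not part of CarbonData, whose internal logic may differ.

```python
def resolve_load_path(default_fs: str, base_url: str, load_path: str) -> str:
    """Join fs.defaultFS, carbon.ddl.base.hdfs.url and the load-time path.

    Hypothetical helper for illustration only; it is not a CarbonData API.
    """
    # Start from the file system root, without a trailing slash.
    parts = [default_fs.rstrip("/")]
    # Append the configured base path and the path given at data load,
    # dropping surrounding slashes so the pieces join with exactly one "/".
    for piece in (base_url, load_path):
        piece = piece.strip("/")
        if piece:
            parts.append(piece)
    return "/".join(parts)


# Values taken from the example in the property description.
full_path = resolve_load_path(
    "hdfs://10.18.101.155:54310",  # fs.defaultFS
    "/data/cnbc/",                 # carbon.ddl.base.hdfs.url
    "/2016/xyz.csv",               # path passed during data load
)
print(full_path)
```

With the values from the description, the three pieces combine into the absolute path of the csv file given there.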

Performance Configuration

-

This section provides the details of all the configurations required for Carbon Performance Optimization.
+

This section provides the details of all the configurations required for CarbonData Performance Optimization.

Performance Configuration in carbon.properties

    -
  1. Data Loading Configuration
  2. -
+
  • Data Loading Configuration @@ -117,8 +152,8 @@ under the License. - - + + @@ -129,13 +164,13 @@ under the License. - + - + @@ -147,20 +182,20 @@ under the License. - + - - + + - - + + @@ -176,9 +211,10 @@ under the License.
    carbon.sort.file.buffer.size 20
    -File read buffer size used during sorting.The value is in MB.Min=1 and Max=100
    +File read buffer size used during sorting. This value is expressed in MB.Min=1 and Max=100
    carbon.graph.rowset.size
    carbon.number.of.cores.while.loading 6
    -Number of cores to be used while data loading.
    +Number of cores to be used while loading data.
    carbon.sort.size 500000
    -Record count to sort and write to temp intermediate files.
    +Record count to sort and write intermediate files to temp.
    carbon.number.of.cores.block.sort 7
    -Number of cores to be used for block sort while dataloading.
    +Number of cores to use for block sort while loading data.
    carbon.max.driver.lru.cache.size -1
    -Max LRU cache size upto which data will be loaded at the driver side.The value is in MB. The default value is -1, means there is no memory limit for caching. Only integer values greater than 0 are accepted.
    +Max LRU cache size upto which data will be loaded at the driver side. This value is expressed in MB. The default value is -1, means there is no memory limit for caching. Only integer values greater than 0 are accepted.
    carbon.max.executor.lru.cache.size -1
    -Max LRU cache size upto which data will be loaded at the executor side.The value is in MB. The default value is -1, means there is no memory limit for caching. Only integer values greater than 0 are accepted. If this parameter is not configured, then thecarbon.max.driver.lru.cache.size value will be considered.
    +Max LRU cache size upto which data will be loaded at the executor side. This value is expressed in MB. The default value is -1, means there is no memory limit for caching. Only integer values greater than 0 are accepted. If this parameter is not configured, then the carbon.max.driver.lru.cache.size value will be considered.
    carbon.merge.sort.prefetch
    -
      -
    1. Compaction Configuration
    2. -
    +
  • + +
  • Compaction Configuration + @@ -192,44 +228,43 @@ under the License. - + - - + + - + - + - + - +
    carbon.number.of.cores.while.compacting 2
    -Number of cores which is used to write data during compaction.
    +Number of cores which are used to write data during compaction.
    carbon.compaction.level.threshold
    -4,3
    -This property is for minor compaction which decides how many segments to be merged.Example: if it is set as 2,3 then minor compaction will be triggered for every 2 segments. 3 is the number of level 1 compacted segment which is further compacted to new segment.
    +4, 3
    +This property is for minor compaction which decides how many segments to be merged. Example: if it is set as 2,3 then minor compaction will be triggered for every 2 segments. 3 is the number of level 1 compacted segment which is further compacted to new segment. Valid values are from 0-100.
    carbon.major.compaction.size 1024
    -Major compaction size can be configured using this parameter. Sum of the segments which is below this threshold will be merged. The value is in MB.
    +Major compaction size can be configured using this parameter. Sum of the segments which is below this threshold will be merged. This value is expressed in MB.
    carbon.horizontal.compaction.enable true
    -This property is used to turn ON/OFF horizontal compaction. After every DELETE and UPDATE statement, horizontal compaction may occur in case the delta (DELETE/ UPDATE) files becomes more than specified threshold. By default the horizontal compaction is Turned ON but can turn OFF the horizontal compaction by setting the value to false.
    +This property is used to turn ON/OFF horizontal compaction. After every DELETE and UPDATE statement, horizontal compaction may occur in case the delta (DELETE/ UPDATE) files becomes more than specified threshold.
    carbon.horizontal.UPDATE.compaction.threshold 1 This property specifies the threshold limit on number of UPDATE delta files within a segment. In case the number of delta files goes beyond the threshold, the UPDATE delta files within the segment becomes eligible for horizontal compaction and compacted into single UPDATE delta file.
    -By default the value is set to 1 and can be altered to values between 1 to 10000.
    +Values between 1 to 10000.
    carbon.horizontal.DELETE.compaction.threshold 1 This property specifies the threshold limit on number of DELETE delta files within a block of a segment. In case the number of delta files goes beyond the threshold, the DELETE delta files for the particular block of the segment becomes eligible for horizontal compaction and compacted into single DELETE delta file.
    -By default the value is set to 1 and can be altered to values between 1 to 10000.
    +Values between 1 to 10000.
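
To illustrate the carbon.compaction.level.threshold description, the sketch below models minor compaction as a simple counter: every N newly loaded segments merge into one level 1 segment, and every M level 1 segments merge into one further-compacted segment. It is a simplified model based only on the wording of the description, not CarbonData's actual compaction code. The function `simulate_minor_compaction` and its return layout are hypothetical, and the value 0 (which the stated range allows) is not modelled.

```python
def simulate_minor_compaction(threshold: str, loads: int) -> dict:
    """Count segments per level after `loads` data loads.

    `threshold` uses the "N,M" form of carbon.compaction.level.threshold.
    Hypothetical, simplified model: it tracks counts only, not segment data.
    """
    n, m = (int(value.strip()) for value in threshold.split(","))
    # Assumption of this sketch: only 1..100 is modelled; the meaning of 0
    # is not stated in the description, so it is rejected here.
    if not (1 <= n <= 100 and 1 <= m <= 100):
        raise ValueError("this sketch only models values from 1 to 100")

    uncompacted = level1 = level2 = 0
    for _ in range(loads):
        uncompacted += 1
        if uncompacted == n:
            # N new segments are merged into one level 1 segment.
            uncompacted = 0
            level1 += 1
        if level1 == m:
            # M level 1 segments are compacted further into a new segment.
            level1 = 0
            level2 += 1
    return {"uncompacted": uncompacted, "level1": level1, "level2": level2}


print(simulate_minor_compaction("2,3", 7))
```

With "2,3" and 7 loads, the model ends with one further-compacted segment and one segment still waiting for its next minor compaction.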
    -
      -
    1. Query Configuration
    2. -
    +
  • +
  • Query Configuration @@ -266,6 +301,8 @@ under the License.
    +
  • +

    Miscellaneous Configuration

    @@ -273,8 +310,8 @@ under the License.

      -
    1. Time format for CarbonData
    2. -
    +
  • Time format for CarbonData + @@ -291,9 +328,9 @@ under the License.
    -
      -
    1. Dataload Configuration
    2. -
    +
  • +
  • Dataload Configuration + @@ -311,17 +348,17 @@ under the License. - + - + - + @@ -360,9 +397,8 @@ under the License.
    carbon.lock.type LOCALLOCK
    -This configuration specifies the type of lock to be acquired during concurrent operations on table.There are following types of lock implementation: - LOCALLOCK: Lock is created on local file system as file. This lock is useful when only one spark driver (thrift server) runs on a machine and no other Carbon spark application is launched concurrently. - HDFSLOCK: Lock is created on HDFS file system as file. This lock is useful when multiple carbon spark applications are launched and no ZooKeeper is running on cluster and HDFS supports file based locking.
    +This configuration specifies the type of lock to be acquired during concurrent operations on table.There are following types of lock implementation: - LOCALLOCK: Lock is created on local file system as file. This lock is useful when only one spark driver (thrift server) runs on a machine and no other CarbonData spark application is launched concurrently. - HDFSLOCK: Lock is created on HDFS file system as file. This lock is useful when multiple CarbonData spark applications are launched and no ZooKeeper is running on cluster and HDFS supports file based locking.
    carbon.sort.intermediate.files.limit 20
    -Minimum no of intermediate files after which sort merged to be started.
    +Minimum number of intermediate files after which merged sort can started.
    carbon.block.meta.size.reserved.percentage 10
    -space reserved in percentage for writing block meta data in carbon data file.
    +Space reserved in percentage for writing block meta data in CarbonData file.
    carbon.csv.read.buffersize.byte
    -
      -
    1. Compaction Configuration
    2. -
    +
  • +
  • Compaction Configuration @@ -375,12 +411,12 @@ under the License. - + -
    carbon.numberof.preserve.segments 0
    -If the user wants to preserve some number of segments from being compacted then he can set this property.Example: carbon.numberof.preserve.segments=2 then 2 latest segments will always be excluded from the compaction. No segments will be preserved by default.
    +If the user wants to preserve some number of segments from being compacted then he can set this property. Example: carbon.numberof.preserve.segments=2 then 2 latest segments will always be excluded from the compaction. No segments will be preserved by default.
    carbon.allowed.compaction.days 0Compaction will merge the segments which are loaded with in the specific number of days configured.Example: if the configuration is 2, th