carbondata-commits mailing list archives

From jack...@apache.org
Subject carbondata git commit: [CARBONDATA-2790][BloomDataMap]Optimize default parameter for bloomfilter datamap
Date Wed, 01 Aug 2018 09:44:10 GMT
Repository: carbondata
Updated Branches:
  refs/heads/master c29aef880 -> 6351c3a07


[CARBONDATA-2790][BloomDataMap]Optimize default parameter for bloomfilter datamap

To give the bloomfilter datamap better query performance by default,
raise the default bloom_size from 32000 to 640000 and lower the default
bloom_fpp from 0.01 to 0.00001.
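
For a sense of what the new defaults cost in index size, here is a minimal sketch that applies the standard textbook Bloom filter sizing formulas to the old and new defaults. The class and method names are hypothetical and are not taken from CarbonData; the datamap's own sizing logic may differ in rounding or implementation, so treat the numbers as estimates only.

```java
// Hypothetical sketch: standard Bloom filter sizing, not CarbonData's actual code.
public class BloomSizingSketch {

    // Optimal bit count m = -n * ln(p) / (ln 2)^2 for n expected insertions
    // and a target false-positive probability p.
    public static long optimalNumOfBits(long expectedInsertions, double fpp) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-expectedInsertions * Math.log(fpp) / (ln2 * ln2));
    }

    // Optimal hash function count k = (m / n) * ln 2, rounded, at least 1.
    public static int optimalNumOfHashFunctions(long expectedInsertions, long numBits) {
        double bitsPerElement = (double) numBits / expectedInsertions;
        return Math.max(1, (int) Math.round(bitsPerElement * Math.log(2)));
    }

    public static void main(String[] args) {
        // Old defaults (32000, 0.01) and new defaults (32000 * 20, 0.00001).
        long[] sizes = {32000L, 32000L * 20};
        double[] fpps = {0.01d, 0.00001d};
        for (int i = 0; i < sizes.length; i++) {
            long bits = optimalNumOfBits(sizes[i], fpps[i]);
            int hashes = optimalNumOfHashFunctions(sizes[i], bits);
            System.out.printf("bloom_size=%d bloom_fpp=%s -> %d bits (%.1f KiB), %d hash functions%n",
                sizes[i], fpps[i], bits, bits / 8.0 / 1024.0, hashes);
        }
    }
}
```

Under these formulas the old defaults need roughly 9.6 bits per expected value and the new defaults roughly 24, so the uncompressed filter for each blocklet grows from tens of KiB to a little under 2 MiB. That extra size is the price of the lower false-positive rate.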

This closes #2567


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/6351c3a0
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/6351c3a0
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/6351c3a0

Branch: refs/heads/master
Commit: 6351c3a077c0fa47390c4b30b05de7d830b387d1
Parents: c29aef8
Author: xuchuanyin <xuchuanyin@hust.edu.cn>
Authored: Fri Jul 27 11:54:21 2018 +0800
Committer: Jacky Li <jacky.likun@qq.com>
Committed: Wed Aug 1 17:43:49 2018 +0800

----------------------------------------------------------------------
 .../datamap/bloom/BloomCoarseGrainDataMapFactory.java          | 6 +++---
 docs/datamap/bloomfilter-datamap-guide.md                      | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/carbondata/blob/6351c3a0/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java
----------------------------------------------------------------------
diff --git a/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java b/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java
index 652e1fc..80a86cc 100644
--- a/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java
+++ b/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java
@@ -69,15 +69,15 @@ public class BloomCoarseGrainDataMapFactory extends DataMapFactory<CoarseGrainDa
    * default size for bloom filter, cardinality of the column.
    */
   private static final int DEFAULT_BLOOM_FILTER_SIZE =
-      CarbonV3DataFormatConstants.NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT;
+      CarbonV3DataFormatConstants.NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT * 20;
   /**
    * property for fpp(false-positive-probability) of bloom filter
    */
   private static final String BLOOM_FPP = "bloom_fpp";
   /**
-   * default value for fpp of bloom filter is 1%
+   * default value for fpp of bloom filter is 0.001%
    */
-  private static final double DEFAULT_BLOOM_FILTER_FPP = 0.01d;
+  private static final double DEFAULT_BLOOM_FILTER_FPP = 0.00001d;
 
   /**
    * property for compressing bloom while saving to disk.

http://git-wip-us.apache.org/repos/asf/carbondata/blob/6351c3a0/docs/datamap/bloomfilter-datamap-guide.md
----------------------------------------------------------------------
diff --git a/docs/datamap/bloomfilter-datamap-guide.md b/docs/datamap/bloomfilter-datamap-guide.md
index 325a508..2dba3dc 100644
--- a/docs/datamap/bloomfilter-datamap-guide.md
+++ b/docs/datamap/bloomfilter-datamap-guide.md
@@ -83,8 +83,8 @@ User can create BloomFilter datamap using the Create DataMap DDL:
 | Property | Is Required | Default Value | Description |
 |-------------|----------|--------|---------|
 | INDEX_COLUMNS | YES |  | Carbondata will generate BloomFilter index on these columns. Queries on there columns are usually like 'COL = VAL'. |
-| BLOOM_SIZE | NO | 32000 | This value is internally used by BloomFilter as the number of expected insertions, it will affects the size of BloomFilter index. Since each blocklet has a BloomFilter here, so the value is the approximate records in a blocklet. In another word, the value 32000 * #noOfPagesInBlocklet. The value should be an integer. |
-| BLOOM_FPP | NO | 0.01 | This value is internally used by BloomFilter as the False-Positive Probability, it will affects the size of bloomfilter index as well as the number of hash functions for the BloomFilter. The value should be in range (0, 1). |
+| BLOOM_SIZE | NO | 640000 | This value is internally used by BloomFilter as the number of expected insertions, it will affects the size of BloomFilter index. Since each blocklet has a BloomFilter here, so the default value is the approximate distinct index values in a blocklet assuming that each blocklet contains 20 pages and each page contains 32000 records. The value should be an integer. |
+| BLOOM_FPP | NO | 0.00001 | This value is internally used by BloomFilter as the False-Positive Probability, it will affects the size of bloomfilter index as well as the number of hash functions for the BloomFilter. The value should be in range (0, 1). In one test scenario, a 96GB TPCH customer table with bloom_size=320000 and bloom_fpp=0.00001 will result in 18 false positive samples. |
 | BLOOM_COMPRESS | NO | true | Whether to compress the BloomFilter index files. |
 
 

