hawq-dev mailing list archives

From "Lin Wen (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HAWQ-1597) Implement Runtime Filter for Hash Join
Date Wed, 14 Mar 2018 15:14:39 GMT
Lin Wen created HAWQ-1597:

             Summary: Implement Runtime Filter for Hash Join
                 Key: HAWQ-1597
                 URL: https://issues.apache.org/jira/browse/HAWQ-1597
             Project: Apache HAWQ
          Issue Type: New Feature
          Components: Query Execution
            Reporter: Lin Wen
            Assignee: Lei Chang

A Bloom filter is a space-efficient probabilistic data structure, invented in 1970, that is
used to test whether an element is a member of a set. It can return false positives but never
false negatives: a negative answer means the element is definitely not in the set.
Nowadays, Bloom filters are widely used in OLAP and other data-intensive applications to filter
data quickly, and OLAP systems commonly apply them to hash joins. The basic idea is as follows:
when hash joining two tables, build a Bloom filter over the inner table's join keys during the
build phase, then push this filter down to the scan of the outer table. Outer tuples whose keys
fail the filter are discarded during the scan, so fewer tuples are returned to the hash join
node and probed against the hash table.
This can greatly improve hash join performance when the join is highly selective, i.e. when only
a small fraction of outer tuples have a matching key in the inner table.
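To make the proposal concrete, here is a minimal Python sketch of the idea, not HAWQ's actual
implementation (HAWQ's executor is written in C/C++). The names BloomFilter,
hash_join_with_runtime_filter, num_bits and num_hashes are hypothetical and chosen for
illustration only; the hash scheme (double hashing over a SHA-256 digest) is an assumed
choice, not something specified in this issue. The build phase fills both a hash table and a
Bloom filter from the inner keys; the filter is then applied while scanning the outer table, and
the exact hash-table lookup removes any remaining false positives.

```python
import hashlib


class BloomFilter:
    """Hypothetical minimal Bloom filter: an m-bit array probed by k hash positions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Assumed scheme: double hashing, deriving k bit positions from two
        # 64-bit slices of one SHA-256 digest of the key's repr().
        digest = hashlib.sha256(repr(key).encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # force odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key) -> bool:
        # False: key was definitely never added. True: key was probably added.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def hash_join_with_runtime_filter(inner, outer, inner_key, outer_key,
                                  num_bits=1024, num_hashes=3):
    """Inner equi-join; returns (joined rows, number of outer tuples that passed the filter)."""
    # Build phase: hash table plus Bloom filter over the inner join keys.
    table = {}
    bloom = BloomFilter(num_bits, num_hashes)
    for row in inner:
        table.setdefault(row[inner_key], []).append(row)
        bloom.add(row[inner_key])

    # Runtime filter applied inside the (simulated) outer scan: tuples whose
    # key is definitely absent from the inner table never reach the join.
    passed = [row for row in outer if bloom.might_contain(row[outer_key])]

    # Probe phase: the exact hash-table lookup discards Bloom false positives.
    joined = [{**o, **i} for o in passed for i in table.get(o[outer_key], [])]
    return joined, len(passed)


# Usage: a 10-row inner (dimension) table joined with a 10,000-row outer table
# in which only 1% of the outer tuples have a matching key.
inner = [{"id": k, "name": f"dim{k}"} for k in range(10)]
outer = [{"fk": k % 1000, "val": k} for k in range(10_000)]
rows, passed = hash_join_with_runtime_filter(inner, outer, "id", "fk")
print(len(rows), passed)
```

Here the join produces 100 rows, and because the filter has no false negatives and a low
false-positive rate at this size, only about 100 of the 10,000 outer tuples are handed to the
join instead of all of them. The benefit shrinks as the fraction of matching outer tuples
grows, which is why the gain is largest for highly selective joins.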

This message was sent by Atlassian JIRA
