drill-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [drill] weijietong commented on issue #1662: DRILL-6825: apply different hash algorithms to different data types
Date Mon, 04 Mar 2019 12:33:10 GMT
weijietong commented on issue #1662: DRILL-6825: apply different hash algorithms to different data types
URL: https://github.com/apache/drill/pull/1662#issuecomment-469236472
   The IntegerHashing method is also used in ClickHouse for integer types (see the intHash32
method in https://github.com/yandex/ClickHouse/blob/master/dbms/src/Common/HashTable/Hash.h).
ClickHouse chooses its hashing method carefully according to the data type and key width,
which is worth learning from. As you mentioned, Murmur3Hash does not perform well on short
integer keys, so it is better to use IntegerHash for integer keys.
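
   To illustrate what choosing a hash by key width can look like, here is a minimal sketch.
It is not ClickHouse's intHash32 and not the IntegerHashing code in this PR: the class and
method names are hypothetical, and the MurmurHash3 finalizers stand in for whatever mixer is
actually chosen.

```java
// Hypothetical sketch: dispatch to a cheap fixed-width mixer per key type
// instead of running a general byte-oriented hash over the key.
final class KeyWidthHashSketch {

    // MurmurHash3 32-bit finalizer (fmix32), used here as a stand-in mixer for int keys.
    static int hashInt(int key) {
        int h = key;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // MurmurHash3 64-bit finalizer (fmix64), used as a stand-in mixer for long keys.
    // The result is truncated to 32 bits to match a hash32-style contract.
    static int hashLong(long key) {
        long k = key;
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return (int) k;
    }
}
```

   Both mixers are a handful of shifts and multiplies with no loop over bytes, which is the
property that makes this kind of function attractive for short integer keys.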
   I had already read the Boost implementation discussion you mentioned. But I think it is
reasonable that Boost, as a base library, still keeps its current implementation.
   The reason for keeping the seed out of the hash32 function and introducing Boost's
hash_combine method is that I want to change the current hashing strategy later. For the
multi-key case, I plan to change the row-iterating model `hash32(hash32(hash32))` to a
column-combining model `hash32() hash_combine hash32() hash_combine hash32()`. The
row-iterating model has a data dependency between successive calls, which hurts CPU
pipeline performance.
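
   A rough sketch of the two models, under stated assumptions: the class and method names
are hypothetical, the per-column hash32 is a stand-in (the MurmurHash3 32-bit finalizer),
and the seeded variant simply XORs the seed into the key. Only the combine step follows
Boost's classic hash_combine formula.

```java
// Hypothetical sketch contrasting the two multi-key hashing models.
final class HashCombineSketch {

    // Stand-in per-column hash (MurmurHash3 fmix32); not the PR's implementation.
    static int hash32(int key) {
        int h = key;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Assumed seeded variant for the row-iterating model: the seed is mixed into the key.
    static int hash32(int key, int seed) {
        return hash32(key ^ seed);
    }

    // Boost-style hash_combine: seed ^= h + 0x9e3779b9 + (seed << 6) + (seed >> 2).
    static int hashCombine(int seed, int h) {
        return seed ^ (h + 0x9e3779b9 + (seed << 6) + (seed >>> 2));
    }

    // Row-iterating model: each hash32 call needs the previous result as its seed,
    // so the expensive hash calls form one serial dependency chain per row.
    static int rowIterate(int[] rowKeys) {
        int h = 0;
        for (int key : rowKeys) {
            h = hash32(key, h);
        }
        return h;
    }

    // Column-combining model: the hash32 calls within a column do not depend on
    // each other; only the cheap combine step reads the running value per row.
    static int[] columnCombine(int[][] columns, int rowCount) {
        int[] out = new int[rowCount];
        for (int[] column : columns) {
            for (int r = 0; r < rowCount; r++) {
                out[r] = hashCombine(out[r], hash32(column[r]));
            }
        }
        return out;
    }
}
```

   In the column-combining loop the seedless hash32 can be evaluated for a whole column
without waiting on earlier columns, which is the motivation for keeping the seed out of
hash32 in the first place.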
   Other hashing methods I know of are collected at https://github.com/benalexau/hash-bench,
a collection of Java hashing methods. The benchmark I ran showed that city_1_1 from
https://github.com/OpenHFT/Zero-Allocation-Hashing/blob/master/src/main/java/net/openhft/hashing/LongHashFunction.java
has the best performance at key widths of 32 and 64 bytes.
   I also wonder whether we can handle the join keys' data type implication at the project
node later, so that the HashJoin and Exchange nodes can also benefit from this PR.

