drill-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] Ben-Zvi edited a comment on issue #1662: DRILL-6825: apply different hash algorithms to different data types
Date Sat, 02 Mar 2019 07:12:52 GMT
Ben-Zvi edited a comment on issue #1662: DRILL-6825: apply different hash algorithms to different data types
URL: https://github.com/apache/drill/pull/1662#issuecomment-468894618
 
 
   Adding the hash32() method to the ValueVector is useful; however, picking algorithms just because they appear in a paper or are famous may not be good enough. At my previous employer I evaluated many hash functions by actually running stand-alone performance and distribution tests. One clear result back then was that Murmur performed well on long strings, but less well on shorter data.
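A stand-alone distribution test of the kind mentioned above can be as simple as hashing a key set into buckets and computing a chi-squared statistic against a uniform spread. The sketch below is illustrative only: the class name `HashDistributionTest`, modelling a 32-bit hash as an `IntUnaryOperator`, and the power-of-two bucketing are assumptions, not Drill code.

```java
import java.util.function.IntUnaryOperator;

/** Hypothetical harness: measures how evenly a 32-bit hash spreads keys over buckets. */
final class HashDistributionTest {

    /**
     * Hashes keys 0..numKeys-1, maps each hash to one of numBuckets buckets
     * (numBuckets must be a power of two) and returns the chi-squared statistic
     * against a uniform distribution. A well-behaved hash on random-looking keys
     * gives a value close to numBuckets - 1; much larger values mean clustering.
     */
    static double chiSquared(IntUnaryOperator hash, int numKeys, int numBuckets) {
        if (Integer.bitCount(numBuckets) != 1) {
            throw new IllegalArgumentException("numBuckets must be a power of two");
        }
        long[] counts = new long[numBuckets];
        for (int key = 0; key < numKeys; key++) {
            // Use the low bits, as a power-of-two hash table would.
            counts[hash.applyAsInt(key) & (numBuckets - 1)]++;
        }
        double expected = (double) numKeys / numBuckets;
        double chi2 = 0.0;
        for (long c : counts) {
            double diff = c - expected;
            chi2 += diff * diff / expected;
        }
        return chi2;
    }
}
```

For example, `chiSquared(k -> k, 1 << 16, 1024)` returns 0.0 because sequential keys fill every bucket exactly, while `chiSquared(k -> k * 1024, 1 << 16, 1024)` puts every key into bucket 0 and returns a huge value. The same function can look perfect or useless depending on the key pattern, so such tests should cover several patterns (sequential, strided, random, short and long strings).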
   
      How do the new hash functions in IntegerHashing compare with the existing one in HashHelper?
   
      The Boost implementation of hash_combine looks "fishy" (e.g., some bits are used more than others); see more critique at https://stackoverflow.com/questions/35985960/c-why-is-boosthash-combine-the-best-way-to-combine-hash-values
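To make the "some bits get more used than others" point concrete, here is a Java port of the classic hash_combine formula discussed in the linked thread, `seed ^= h + 0x9e3779b9 + (seed << 6) + (seed >> 2)`, plus a small avalanche check. This is a sketch for illustration: the 32-bit width, the use of `>>>` to mimic unsigned `size_t` shifts, and the class and method names are assumptions, and it is not the code in the PR.

```java
/** Java port of the classic Boost-style hash_combine mixing step (32-bit variant). */
final class BoostHashCombine {

    /** seed ^= h + 0x9e3779b9 + (seed << 6) + (seed >> 2), with an unsigned shift as for size_t. */
    static int hashCombine(int seed, int h) {
        return seed ^ (h + 0x9e3779b9 + (seed << 6) + (seed >>> 2));
    }

    /**
     * Avalanche check: for each bit of h, flip it and count how many output bits
     * change, averaged over the samples. An ideal mixer changes about 16 of 32
     * output bits for every input bit.
     */
    static double[] avalanche(int seed, int[] samples) {
        double[] changed = new double[32];
        for (int h : samples) {
            int base = hashCombine(seed, h);
            for (int bit = 0; bit < 32; bit++) {
                int flipped = hashCombine(seed, h ^ (1 << bit));
                changed[bit] += Integer.bitCount(base ^ flipped);
            }
        }
        for (int bit = 0; bit < 32; bit++) {
            changed[bit] /= samples.length;
        }
        return changed;
    }
}
```

Flipping the top bit of `h` only ever flips the top bit of the result, because `h` enters through a plain addition whose carry falls off the end of the word; `avalanche(seed, samples)[31]` is therefore always exactly 1.0, far from the ideal of about 16.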
 
   
     Why can't the seed be given directly to the hash function instead of being "combined" later?
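With a seeded hash function the seed is simply the initial hash state, so there is nothing to combine afterwards. As an illustration only (not what the PR or Drill's HashHelper does), here is MurmurHash3's 32-bit x86 variant specialised to a single `int` key, treated as its four little-endian bytes, with the seed passed in directly; the class and method names are hypothetical.

```java
/** Seeding a hash directly: MurmurHash3 (x86, 32-bit) specialised to one 4-byte int key. */
final class SeededMurmur3 {

    private static final int C1 = 0xcc9e2d51;
    private static final int C2 = 0x1b873593;

    static int hashInt(int value, int seed) {
        // Mix the single 4-byte block.
        int k = value * C1;
        k = Integer.rotateLeft(k, 15);
        k *= C2;

        // The seed is the initial state, so no separate combine step is needed.
        int h = seed ^ k;
        h = Integer.rotateLeft(h, 13);
        h = h * 5 + 0xe6546b64;

        // Finalisation: fold in the length (4 bytes), then apply fmix32.
        h ^= 4;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }
}
```

Every step after `seed ^ k` is invertible (rotations, multiplication by odd constants, XOR-shifts), so for a fixed key different seeds always yield different hashes, and for a fixed seed different keys always yield different hashes.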
   
      Another good hash function I used in the past (I don't recall its name) worked with a map of 256 prime numbers: starting with the seed, the code used each input byte as an index into the map, rotated the old value, XORed in the newly mapped value, and continued with the next byte.
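The table-driven scheme described above can be sketched roughly as follows. This is a hypothetical reconstruction: the original table contents and rotation distance are not known, so filling the table with the first 256 primes and rotating by 5 bits are assumptions, as are the class and method names.

```java
/**
 * Hypothetical reconstruction of a table-driven byte hash: a 256-entry table of
 * primes indexed by each input byte; rotate the running value, XOR in the entry.
 */
final class PrimeTableHash {

    /** First 256 primes (assumption: the original table's contents are unknown). */
    static final int[] TABLE = firstPrimes(256);

    /** Assumed rotation distance; the original amount is unknown. */
    private static final int ROTATION = 5;

    static int hash(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            // Rotate the old value, then XOR in the value mapped from this byte.
            h = Integer.rotateLeft(h, ROTATION) ^ TABLE[b & 0xFF];
        }
        return h;
    }

    /** Trial division by the primes found so far, up to sqrt(candidate). */
    private static int[] firstPrimes(int count) {
        int[] primes = new int[count];
        int found = 0;
        for (int candidate = 2; found < count; candidate++) {
            boolean isPrime = true;
            for (int i = 0; i < found && primes[i] * primes[i] <= candidate; i++) {
                if (candidate % primes[i] == 0) {
                    isPrime = false;
                    break;
                }
            }
            if (isPrime) {
                primes[found++] = candidate;
            }
        }
        return primes;
    }
}
```

Because of the rotation, the same byte contributes at different bit offsets depending on its position, so `{1, 2}` and `{2, 1}` hash differently. The first 256 primes are all below 2^11, so each table entry directly sets only low-order bits and the rotation is what spreads earlier bytes across the word; a different table or rotation could change the distribution, which is exactly what stand-alone tests would reveal.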
   
      That said, things may perform differently in Java.
      Also, do you know of any open-source hash functions we could simply import instead of writing the code in Drill?
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
