calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julian Hyde <jh...@apache.org>
Subject Re: hashCode functions using multiply method
Date Tue, 02 Feb 2016 19:36:44 GMT
Looking through Guava source code the default pattern if you have an object whose equals method
depends on, say, 3 fields is to call Objects.hashCode(field0, field1, field2). This calls
Arrays.hashCode(Object[]), which produces the standard hash code for Java lists.

This is simple and I assume — since Guava has been thoroughly analyzed and tested — performs
well. I think we should do the same in Calcite.

To be clear, we are not looking for high performance hash functions. I am trying to mitigate
against new code getting a bad hash function because the developer could not discern what
was the project’s standard practice. Using Objects.hashCode will be good enough unless proven
otherwise.

Julian


> On Jan 30, 2016, at 9:06 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> 
> 
> The performance of larger prime multipliers seems intuitively better, but it isn't necessarily
so.  If a better quick hash step is desired, a round of murmur would be much better. 
> 
> For instance, one of the constants in the murmur hash finalizer is even. Reading about
the weaknesses of murmur 2 is very instructive. The lesson is that picking numbers from thin
air isn't a great idea. 
> 
> Knuth also has some very good (if very old and dated) material on the topic. 
> 
> The first question is whether this matters. Is a significant chance of collision actually
even material?
> 
> If yes, using a high quality hash like murmur3 should be tested.  Especially if the hash
can be cached, the cost is often surprisingly low. 
> 
> Sent from my iPhone
> 
>> On Jan 29, 2016, at 12:25, Julian Hyde <jhyde@apache.org> wrote:
>> 
>> A larger prime would probably give better hashes (e.g. 524287 == 2 ^ 19 - 1) and
could still be computed using a single shift and subtract.
>> 
>> A few hash functions such as Util.hash(int, int) still use just shift and xor. They
should definitely be fixed. I have logged https://issues.apache.org/jira/browse/CALCITE-1071.


Mime
View raw message