Looking through Guava source code the default pattern if you have an object whose equals method
depends on, say, 3 fields is to call Objects.hashCode(field0, field1, field2). This calls
Arrays.hashCode(Object[]), which produces the standard hash code for Java lists.
This is simple and I assume — since Guava has been thoroughly analyzed and tested — performs
well. I think we should do the same in Calcite.
To be clear, we are not looking for high performance hash functions. I am trying to mitigate
against new code getting a bad hash function because the developer could not discern what
was the project’s standard practice. Using Objects.hashCode will be good enough unless proven
otherwise.
Julian
> On Jan 30, 2016, at 9:06 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
>
> The performance of larger prime multipliers seems intuitively better, but it isn't necessarily
so. If a better quick hash step is desired, a round of murmur would be much better.
>
> For instance, one of the constants in the murmur hash finalizer is even. Reading about
the weaknesses of murmur 2 is very instructive. The lesson is that picking numbers from thin
air isn't a great idea.
>
> Knuth also has some very good (if very old and dated) material on the topic.
>
> The first question is whether this matters. Is a significant chance of collision actually
even material?
>
> If yes, using a high quality hash like murmur3 should be tested. Especially if the hash
can be cached, the cost is often surprisingly low.
>
> Sent from my iPhone
>
>> On Jan 29, 2016, at 12:25, Julian Hyde <jhyde@apache.org> wrote:
>>
>> A larger prime would probably give better hashes (e.g. 524287 == 2 ^ 19 - 1) and
could still be computed using a single shift and subtract.
>>
>> A few hash functions such as Util.hash(int, int) still use just shift and xor. They
should definitely be fixed. I have logged https://issues.apache.org/jira/browse/CALCITE-1071.
|