And here is another test. I put the following functions into a separate shared library, so nobody can complain that the compiler is optimising away static constants.

```
#include <stdint.h>

uint64_t mul_index(uint64_t hash_code, uint64_t bits)
{
    const uint64_t gr = UINT64_C(11400714819323198485);

    // Do some hash code randomisation
    hash_code = (hash_code ^ (hash_code >> 30)) * UINT64_C(0xbf58476d1ce4e5b9);
    hash_code = (hash_code ^ (hash_code >> 27)) * UINT64_C(0x94d049bb133111eb);
    hash_code = hash_code ^ (hash_code >> 31);
    return (hash_code * gr) >> (64 - bits);
}

uint64_t mod_index(uint64_t hash_code, uint64_t table_size)
{
    return hash_code % table_size;
}
```

Measured time for a loop with 10,000,000 iterations of each function:

Intel Xeon E5620 2.4 GHz:
mul_index time = 91.76 msec
mod_index time = 157.84 msec

ARM Cortex-A72 1.5 GHz:
mul_index time = 170.44 msec
mod_index time = 112.83 msec

The modulus function is slower on the Intel Xeon (157.84 vs 91.76 msec) but faster on the ARM CPU. Feel free to run your own tests. On modern hardware, the speed of integer division is quite reasonable and not as horrible as some of you claim.

Not quite; the results speak for themselves. Insert 2 million integers into a hash table with 4 million buckets (only 50% load) and you get:

Modulus hashing using identity function: 0 collisions with most sequential numbers.

Multiplicative hashing using identity function: between 200K and 1 million collisions, depending on the sequence.

Multiplicative hashing using randomisation function: around 450K collisions.

The best you can get with multiplicative hashing is by applying a randomisation hash function before calculating the hash table index. This shuffles the bits into random order, but it still produces about 450K more collisions than the modulus function, and it carries the additional expense of the randomisation code: multiplications, bit shifts and xor operations. After all that, a single modulus instruction doesn't look so bad.

Yes, sometimes input data is quite random; sometimes you need to randomise it to avoid hash table attacks; and sometimes you need none of that: you already have a unique sequence of integers, and modulus by a prime number will give you the fewest collisions. The fact that this yields sequential index values is a desirable property and is exactly what produces so few collisions. (If you resolve collisions with something like linear probing, sequential indices are a poor fit; there are better probing schemes.) There is no optimal hash table design for all use cases; you need to take measurements and match them to a specific design. All I'm saying is that multiplicative hashing is no silver bullet, and in many situations involving sequential number sets, prime number modulus hashing is a much better choice.

I actually think the benchmark results are OK. Sure, at 2^22 = 4 million pointers (integers with stride 8) you start getting a few collisions. If you try the same test with 2^21 pointers, you get zero collisions.

If you try it with an actual hash function, you get more collisions: roughly the 10% that 1yk0s mentioned, which is what I saw with std::_Hash_impl::hash(). (Not officially supported, but it was an easy choice.)

I still think this is pretty good on average, and the performance benefit over integer modulo is still worth it. Going from 16 ms to 24 ms is not a small difference. (Plus, you don't quite measure the overhead of modulo correctly, because presumably you won't be using only one known prime number; you want to allow tables of different sizes. The compiler has a harder time optimising when there are multiple possible constants.)

The goal isn’t zero collisions. You won’t get that on any data that’s not regularly spaced. The goal is few collisions for all common patterns. And I mention one common pattern in the article where prime number modulo has problems: mostly sequential numbers. If your numbers are mostly sequential, meaning 1,2,3,4,…,10000, except occasionally you have other numbers in there like -20 or 2^22 or 2^22+4, that behaves really badly in prime number modulo, because the hash table will be densely packed and on collisions you have to search for a long time to find a free slot. Fibonacci hashing will spread these out, which gives you plenty of space to cheaply resolve hash collisions.

About the claim that the slower instructions are worth it because you get fewer collisions: Maybe. I’d like to see it on real data. Intuitively it shouldn’t be worth it because by far the most common case is that you immediately find the item you’re looking for in a hash table, so you want to optimize for that and use the fastest instructions possible on the happy path.

You have access to my code, so you can make whatever changes you like that inhibit compiler optimizations and see for yourself. I tried various things and did not notice much difference on my hardware. Using the identity function with a prime hash table size is the key method that gives low collision rates. Depending on the hardware, modulus arithmetic can be relatively quick. Other things happen in tandem and will cause CPU pipeline stalls no matter how quick your indexing function is. In my experience, for non-random integer keys that increase sequentially (file descriptors, pointer values, etc.), prime hash table sizes give the best performance due to very low collision rates.

Modulo will only have that (very nice!) property if you are hashing using the identity function (ie. no hash at all), which is a dangerous game to play.

Note that it’s not really the case that multiplicative hashing is doing particularly bad in this case; it’s doing fairly well evenly across almost all inputs. It’s just that the identity function + modulo does extremely well in this one specific case.

Your benchmark is misleading, by the way: you're doing the modulo by a constant, which means the compiler can rewrite it into multiplicative form. This is only really possible if your hash table has a fixed number of buckets; otherwise, for a non-constant divisor, you need either a real division, or something like libdivide, or a huge switch/case (like ska::flat_hash_map). Changing to a more universally usable variant will increase your (already big!) speed penalty and/or code size.

I could not have said it better myself, Steinar H. Gunderson. I just want to add one thing: if you have well-distributed values and a 50% load factor, the expected number of collisions is about 10.6% with respect to the number of buckets (equivalent to 21.3% with respect to the number of elements).

https://stackoverflow.com/questions/9104504/expected-number-of-hash-collisions#11362027

If you think you can do better you might be able to create a perfect hash function specifically for your data.

I think you're missing a point. Quite often input data is not completely random and follows a specific pattern. A good hash table should handle a variety of different input data efficiently. Using modulus arithmetic with a prime number gives you the fewest collisions across many different patterns; it just works for all the usual cases. There are many patterns where a power-of-2 hash table is quite sub-optimal due to a high number of collisions.

*Any* hash function will give large amounts of collision for “certain data sets”. For the very specific case of a super-dense range of pointers whose range happens to fit exactly into the hash table, prime modulo will do much better than a multiplicative hash. Likewise, if you have modulo with a prime p, hashing integers that are exactly p apart will give you catastrophic amounts of collisions. But if either is a very important case for you, perhaps you shouldn’t be using hashing in the first place.

It doesn't matter what hash function you use: when it comes to multiplicative hashing for mapping a hash code to a hash table index, you will get a large number of collisions for certain data sets. Here is a link to my code; plug in whatever hash function you like:

https://drive.google.com/file/d/1Qphu8JfZDZ9CBfIxk5jqAT8fqviPjBRv/view

Have a look at my answer on stackoverflow here:

Or use splitmix64:

```
uint64_t hash(uint64_t x)
{
    x = (x ^ (x >> 30)) * UINT64_C(0xbf58476d1ce4e5b9);
    x = (x ^ (x >> 27)) * UINT64_C(0x94d049bb133111eb);
    x = x ^ (x >> 31);
    return x;
}
```

That gives a well-distributed hash function. But then you'll say this is two multiplications, and you'd be correct. But even one multiplication is often sufficient, and then we have arrived exactly where Malte did.
