It’s sometimes annoying how often, when people have criticisms about “functional programming”, they really translate to just criticisms of Haskell. I too find Haskell’s backward “where” clause annoying, but it’s perfectly possible to do functional programming without such a syntactic construct, and indeed languages like OCaml don’t even have one.

See below for baking a cake in a functional language without any “backward” steps:

```ocaml
let cake =
  let dry_goods = whisked [flour; baking_soda; salt] in
  let cake_mixture =
    creamed ~until:`Fluffy [butter; white_sugar; brown_sugar]
    |> beat_mixed ~with_:eggs
    |> mixed ~with_:bananas
    |> mixed ~with_:(alternating [buttermilk; dry_goods])
    |> mixed ~with_:(chopped walnuts)
  in
  let oven = preheated ~at:(`Celsius 175) oven in
  let pan = prepared pan ~with_:[grease; flour] in
  bake ~pan ~oven ~min:30 cake_mixture
  |> remove_from_oven
  |> cooled
```

You mean: von Neumann architectures use instructions. Surely not all possible definitions of computers have an imperative nature; there isn’t anything inherently imperative about computers. But it is very natural for humans to manifest their will using imperatives, and programming is very much about imbuing your will into the nature around you. It’s not the only way to think, though.

---

Good luck trying to convince a layperson to bake that cake of yours.

---

And here is another test. I put the following functions into a separate shared library, so people can’t complain that the compiler is optimising away static constants.

```c
#include <stdint.h>

uint64_t mul_index(uint64_t hash_code, uint64_t bits)
{
    const uint64_t gr = UINT64_C(11400714819323198485);

    // Do some hash code randomisation
    hash_code = (hash_code ^ (hash_code >> 30)) * UINT64_C(0xbf58476d1ce4e5b9);
    hash_code = (hash_code ^ (hash_code >> 27)) * UINT64_C(0x94d049bb133111eb);
    hash_code = hash_code ^ (hash_code >> 31);
    return (hash_code * gr) >> (64 - bits);
}

uint64_t mod_index(uint64_t hash_code, uint64_t table_size)
{
    return hash_code % table_size;
}
```

Measured time for a loop with 10,000,000 iterations of each function:

Intel Xeon E5620 2.4GHz:
mul_index time = 91.76 msec
mod_index time = 157.84 msec

ARM Cortex-A72 1.5GHz:
mul_index time = 170.44 msec
mod_index time = 112.83 msec

The modulus function is slightly slower on the Intel Xeon, but faster on the ARM CPU. Feel free to run your own tests. On modern hardware, the speed of integer division is quite reasonable and not as horrible as some of you claim.

---

Not quite; the results speak for themselves. Insert 2 million integers into a hash table with 4 million buckets (only 50% load) and you get:

Modulus hashing using identity function: 0 collisions with most sequential numbers.

Multiplicative hashing using identity function: between 200K and 1 million collisions, depending on the sequence.

Multiplicative hashing using randomisation function: around 450K collisions.

The best you can get with multiplicative hashing is by applying a randomisation hashing function before calculating the hash table index. This shuffles the bits into random order, but that is still 450K more collisions than with a modulus function, and it comes at the additional expense of the randomisation code: multiplications, bit shifts and xor operations. After all that, a single modulus instruction doesn’t look that bad.

Yes, sometimes input data is quite random, sometimes you need to randomise it to avoid hash table attacks, etc. And sometimes you don’t need any of that: you already have a unique sequence of integers, and modulus by a prime number will give you the least number of collisions. The fact that this gives sequential index values is a desirable property and is exactly what results in so few collisions. Granted, if you use something like linear probing, sequential indexes cluster badly, but that is not a very smart way of resolving collisions; there are better ways. There is no optimal hash table design for all use cases; you need to take measurements and match them to a specific design. All I’m saying is that multiplicative hashing is no silver bullet, and on many occasions involving sequential number sets, prime number modulus hashing is a much better choice.

---

I actually think the benchmark results are OK. Sure, at 2^22 = 4 million pointers (integers with stride 8) you start getting a few collisions. If you try the same test with 2^21 pointers, you get zero collisions.

If you try it with an actual hash function, you get more collisions: roughly the 10% that 1yk0s mentioned, when I tried with std::_Hash_impl::hash(). (Not officially supported, but it was an easy choice.)

I still think on average this is pretty good, and the performance benefits over integer modulo are still worth it. Going from 16ms to 24ms is not a small difference. (Plus, you don’t quite measure the overhead of modulo correctly, because presumably you won’t be using only one known prime number; you want to allow tables of different sizes, and the compiler has a harder time optimizing when there are multiple possible constants.)

The goal isn’t zero collisions. You won’t get that on any data that’s not regularly spaced. The goal is few collisions for all common patterns. And I mention one common pattern in the article where prime number modulo has problems: mostly sequential numbers. If your numbers are mostly sequential, meaning 1,2,3,4,…,10000, except occasionally you have other numbers in there like -20 or 2^22 or 2^22+4, that behaves really badly in prime number modulo, because the hash table will be densely packed and on collisions you have to search for a long time to find a free slot. Fibonacci hashing will spread these out, which gives you plenty of space to cheaply resolve hash collisions.

About the claim that the slower instructions are worth it because you get fewer collisions: Maybe. I’d like to see it on real data. Intuitively it shouldn’t be worth it because by far the most common case is that you immediately find the item you’re looking for in a hash table, so you want to optimize for that and use the fastest instructions possible on the happy path.

---

You have access to my code, so you can make whatever changes you like to inhibit compiler optimizations and see for yourself. I tried various things and didn’t notice much difference on my hardware. Using the identity function with a prime hash table size is the key method that gives low collision rates. Depending on the hardware, modulus arithmetic can be relatively quick, and there are other things happening in tandem that will cause CPU pipeline stalls no matter how quick your indexing function is. In my experience, for non-random integer keys that increase sequentially (file descriptors, pointer values, etc.), prime hash table sizes give the best performance due to very low collision rates.

---

Modulo will only have that (very nice!) property if you are hashing using the identity function (i.e. no hash at all), which is a dangerous game to play.

Note that it’s not really the case that multiplicative hashing is doing particularly badly here; it’s doing fairly well across almost all inputs. It’s just that the identity function + modulo does extremely well in this one specific case.

---

Your benchmark is misleading, by the way: you’re doing the modulo by a constant, which means the compiler can rewrite it into multiplicative form. This is only really possible if your hash table has a fixed number of buckets; otherwise, for a non-constant divisor, you will need either a real division, or something like libdivide, and/or a huge switch/case (like ska_flat_map). Changing that to a more universally usable variant is going to increase your (already big!) speed penalty and/or code size.
