PS: It makes no sense to use xxHash as a mixing function: 1) it is not a permutation, so it can introduce collisions, and 2) it is a byte-at-a-time hash function designed for variable-length strings, not for fixed-width integers.

PPS: Robin Hood hashing isn’t quite the final word on linear probing. There are a couple of other linear-probing variants I’m currently benchmarking, and one of them is clearly superior to Robin Hood (and predates it by over 20 years)…

In that case it is very obviously slower than a multiply + shift, but there might be “good” primes for which you can calculate the division (mod 2^64) quite fast.

So to run through an example, let’s say

one key/value pair in the map is 16 bytes in size,

there are 48 items currently in the map,

and the load_factor is 0.75.

Meaning the table is 75% full and the internal allocation has enough space for 64 items. The map adds one byte of overhead per entry, arriving at 17 bytes per entry. So in this case memory usage would be (48 * (16 + 1)) / 0.75 = 1088 bytes.

That should be the size of the heap allocation. There will be some more memory for the map itself. I’m not sure exactly how much right now, but it should be less than 100 bytes. You can find out by just taking sizeof(ska::bytell_hash_map) with your key and value pair. Also note that the table will re-allocate before it’s 100% full. By default it re-allocates when it’s 93% full. It does this because when the table gets too full, performance suffers. And sometimes it has to re-allocate before that if there are too many hash collisions. That should be rare, but it might be a problem if your performance depends on the table not resizing to be bigger than the L2 cache. Just keep an eye out for how often this happens to you.

Oh and for my other hash table, flat_hash_map, the memory usage should be size() * (sizeof(T) + alignof(T)) / load_factor(). That one only has a max_load_factor() of 0.5 by default, so it re-allocates when it’s only half full. You can increase that, but the table does get slower when you do. Increasing it to 0.75 should still be OK, but 0.9 might be too high. Once you get past a certain load factor, things get very slow very quickly. I picked 0.5 as the default because that gives stable performance: there isn’t too much of a difference between a table that’s just about to re-allocate and a table that’s almost empty. At 0.75 there will be more of a difference, and at 0.9 there will be a big difference between an empty table and a table that’s about to re-allocate.

First of all, thanks for not only posting your hash tables but also for discussing and explaining them in detail.

We are currently evaluating several hash tables in our research database Hyrise. One operation uses radix clustering to partition data in such a way that each hash table fits into the L2 cache.

Currently, we are struggling to estimate the memory consumption of the bytell hash table. We know the number of elements, the data type, and so on. Is there a rule of thumb for the size of your hash maps?

Best

Martin

Which hash function are you using to do the comparison between the hash tables? Could the performance difference in the benchmark be affected by the hash function choice?

I’m saying that because I too implemented the hash table they describe in the CppCon presentation, but then I went on to spend a few weeks trying out hash functions, only to realize that that topic can be more complex than the hash table itself.

Note that successive truncations of 2 +1/(2 +1/(2 +…)) are 2, 5/2, 12/5, 29/12, 70/29, 169/70, 408/169, … so it’s the denominators showing up here that are apt to give trouble for 2^64 *(sqrt(2) -1).

You’re using hash(x) = (x*k)%N with k = (r*M)%M for some irrational r, M = 2^64, and N some smaller power of two. If it weren’t for rounding, hash(x) would be (x*r*M)%N and, for some rational approximation m/n to r, hash(i*n) is close to (i*m*M)%N, which is zero; the “close to” difference is (i*(n*r -m)*M)%N plus some rounding errors. If m/n is a good approximation, this difference is small and you get the zeros you saw with larger Fibonacci numbers. At least, that’s what I think is happening …

One way to consider rational approximation of an irrational is to split the irrational into a whole number and a fractional part, then apply the same method to approximate the fractional part; go a few steps, approximate by a whole number, then unroll back to where you started. Thus, for pi we get pi = 3.1415…, 1/(pi – 3) = 7.0625…, 1/(1/(pi-3) -7) = 15.99…; let’s call that 16, so 1/16 = 1/(pi-3) -7 and pi = 3 +1/(7 +1/16) = 3 +16/(16*7 +1) = 3 +16/113, good to three parts in ten million; and I only went a few steps in.

One can slightly improve this by, rather than a whole-and-fractional split, using the nearest whole number and a +/- fractional error, so that the error is never more than a half (where 15.99… would have had error .99…) and the whole number is never (after the first step) less than two. This approach truncates a “continued fraction” approximation to the irrational, pi = 3 +1/(7 +1/(16 -1/(294 -1/(3 -1/…)))), in order to obtain a good rational approximation.

You can do the same with the golden ratio, of course. It’s the solution to x*x = x +1, so I can tell you right away that x = 1 +1/x = 1 +1/(1 +1/x) = 1 +1/(1 +1/(1 +1/(1 +1/…))). That’s using the simple whole+fraction split; if we allow negatives, we have x -1 = 1/x, so x/(x -1) = x*x = x +1; with q = 1/(2 -x) = 1/(1 -1/x) = x/(x -1) = x +1, we get 1/(3 -q) = 1/(2 -x) = q, whence q = 3 -1/q = 3 -1/(3 -1/(3 -…)) and x = 2 -1/q = 2 -1/(3 -1/(3 -1/(3 -1/…))) and it’s threes all the way down.

This means you never get a nice big number in the sequence (like our sudden leap to 294 in pi’s continued fraction), for which the 1/… you’ve got to add to it or subtract from it is a small fractional difference, making that a good place to truncate. So rational approximations to the golden ratio improve painfully slowly – which is a PITA if you want a good rational approximation, but a blessing when you’re trying to pick an irrational to use in your multiplicative hashing.

When we truncate 1 +1/(1 +1/…) before successive +s, we get 1/1, 2/1, 3/2, 5/3, 8/5, 13/8, …, the ratios of successive Fibonacci numbers. Truncating 2 -1/(3 -1/(3 -…)) before successive -s, we get 2/1, 5/3, 13/8, 34/21, …, selecting every second entry from the previous sequence; allowing subtraction doubles our speed of refinement, but it’s still painfully slow.

So can we find something more pathological than the golden ratio ? Well, the only credible candidate is 2 +1/(2 +1/(2 +1/…)) = z = 2 +1/z so 1 = z*z -2*z = (z -1)^2 -1 and z -1 = sqrt(2) so z = 1 +sqrt(2) = 1/(sqrt(2) -1). So maybe give that a try – obviously, you’ll be dividing by it, using r = sqrt(2) -1 in my analysis above. I’d be interested to know how well it fares as a hash ;^)
