we essentially get new random numbers in the range [0,16) for each rotation. The second number needs to fall in any of the 15 free slots out of 16 total, the third needs to fall in any of the 14 free slots out of 16 total, and so on.

If n is the number of items with random hashes in the range [0,m) and m is the number of slots, the probability of no collision is (m-1)/m*(m-2)/m*…*(m-n+1)/m, which is approximately exp(-n*(n-1)/(2*m)). The collision probability scales inversely with the number of slots but quadratically with the number of items.
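As a quick sanity check of the formula above (a Python sketch; the function name is mine), we can compare the exact product against the exponential approximation for the 8-items-into-16-slots case from this thread:

```python
import math

def p_all_distinct(n, m):
    """Exact probability that n uniform hashes into m slots are all distinct."""
    p = 1.0
    for k in range(n):
        p *= (m - k) / m   # the k-th item must avoid the k occupied slots
    return p

n, m = 8, 16
exact = p_all_distinct(n, m)                # ~0.1208
approx = math.exp(-n * (n - 1) / (2 * m))   # ~0.1738
```

The gap between the two numbers shows the exp(…) form is an approximation that is only tight when n is much smaller than m.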

In your scheme you could forgo the rotations and just try different odd integer multipliers; this would give you a larger space to search. But it would still not get you far. You should look at more advanced perfect hashing schemes. Or, if you just have 8 values, use a flat array; it’s faster than you think, since testing 8 values is very quick. And for more than that, there’s nothing wrong with a proper hash table.
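The odd-multiplier suggestion could be sketched like this (Python; the key set, table size, and retry bound are my own choices): search random odd 64-bit multipliers until one maps the keys into distinct slots.

```python
import random

MASK64 = (1 << 64) - 1

def find_odd_multiplier(keys, bits, tries=100000):
    """Search for an odd multiplier whose top `bits` product bits separate `keys`."""
    for _ in range(tries):
        mult = random.getrandbits(64) | 1   # odd multipliers are invertible mod 2^64
        slots = {((k * mult) & MASK64) >> (64 - bits) for k in keys}
        if len(slots) == len(keys):         # all keys landed in distinct slots
            return mult
    return None

random.seed(1)                              # for reproducibility
keys = [144 * k for k in range(1, 9)]       # 8 keys
mult = find_odd_multiplier(keys, 4)         # 16 slots
```

With roughly a 12% chance per try (by the birthday computation above), a few dozen tries usually suffice, so the search space is far larger than the 64 rotation distances.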

Hi! Thanks for the correction; I should have been more precise 🙂 Since the comments don’t nest, I’m actually replying to your last message. As I understand it, if the inputs are rotationally symmetric (is that what you mean?), rotation of course fails to adequately break the symmetry. As an extreme example, if the inputs are the 64 integers with a single “1” bit in each of the 64 positions, the rotation step is totally useless. For this particular case, we’d have to enlarge the table size to 1024 (if working with powers of 2) for those keys to map into distinct slots, which is super wasteful (16x) compared to a minimal perfect hash function (the base-2 logarithm will do for this instance). Fortunately for my purpose (hashing pointers to static variables or long-lived objects allocated in some pool), the memory allocator is not my adversary, hopefully! The keys are clumped in some range but otherwise without “bad patterns”. In the end I tend to get away with it without enlarging the table by too much ^^
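For that extreme one-hot case, the minimal perfect hash alluded to is just the base-2 logarithm (a Python sketch of the point):

```python
# The 64 one-hot keys 1 << i cannot be usefully separated by rotation
# (they are rotations of each other), but log2 maps them straight onto
# the 64 distinct slots 0..63 with no wasted space.
keys = [1 << i for i in range(64)]
slots = [k.bit_length() - 1 for k in keys]   # log2 of a power of two
```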

Even so, I can only use the rotation trick for small tables (typically < 16 slots); anything larger would require a real perfect hash function or another data structure for efficient lookup.

But back to Malte’s original post, where the purpose is to map a real hash code into slots. I don’t know whether an additional rotation step could help alleviate some of the “bad patterns”, such as multiples of a Fibonacci number. I guess it may not be worth the overhead except in some very specific cases.

P.S. When you wrote “The probability for success is 14750191/16777216”, what do you mean by “probability”, and what is the sample space? I’m afraid I’m not on the same page with you. How did you derive these numbers? Thanks!

For each rotation. For 64 rotations, if they were independent, the probability of failure would be 1.806797088433464E-59. We can construct a counterexample by making some rotations degenerate: randomly choose n and then shift it into the different byte positions.

`n=rand(256);`

`{n<<(8*0), n<<(8*1), n<<(8*2), n<<(8*3), ..., n<<(8*7)}`

This will mean only 8 out of the 64 rotations can be independent, lowering the chance of success to 99.8%.

You are not guaranteed to find such a rotation at all. The probability for success is 14750191/16777216 .
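If I follow the derivation correctly, that fraction can be reproduced as the probability that 8 uniform hashes into 16 slots produce at least one collision (a Python sketch using exact rationals; the variable names are mine):

```python
from fractions import Fraction

m, n = 16, 8
p_distinct = Fraction(1)
for k in range(n):
    p_distinct *= Fraction(m - k, m)   # the k-th key avoids the k occupied slots

p_collision = 1 - p_distinct           # == 14750191/16777216, about 87.9%
```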

The header’s date says 2002, but before 2016 they used a prime factor, which turned out not so well-conditioned. The word “prime” stuck in the macro names, though, hence the comments and compatibility macros defined at the beginning. So indeed we keep rediscovering the technique, and it’s right now in use whenever Linux runs ^^

Curiously, they use the modular factor for (1 – φ) instead of φ as the multiplier, and the code comment suggests that (1 – φ) “is very slightly easier to multiply by”. I guess the reason is that the value is the smaller one, and the most significant nibble of the (1 – φ) factor is 0x6 instead of 0x9, therefore one bit shorter than the φ factor when written out as binary. Maybe this could have made it “slightly easier to multiply by” on platforms that lack native hardware support for 32/64-bit multiplication? Or maybe it’s just that the 64-bit factor ends in 0xB, so that they could spell out the literal as “0x…Bull” in the C code 🙂
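One small check of that observation (the constant names follow Linux’s include/linux/hash.h; treat the exact hex values here as my transcription): the (1 − φ) factors are the exact negations of the φ factors modulo the word size, so multiplying by one gives the mod-2^w negation of multiplying by the other, and the two distribute equally well.

```python
PHI_32, INV_32 = 0x9E3779B9, 0x61C88647                  # ~2^32 * phi' and its negation
PHI_64, INV_64 = 0x9E3779B97F4A7C15, 0x61C8864680B583EB  # 64-bit counterparts

MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

assert (PHI_32 + INV_32) & MASK32 == 0   # negations mod 2^32
assert (PHI_64 + INV_64) & MASK64 == 0   # negations mod 2^64

# Hence (x * INV_64) mod 2^64 == (-(x * PHI_64)) mod 2^64 for any x.
```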

I personally find the multiplicative hash by φ or (1 – φ) very effective when you want to hash pointer addresses into a small number of slots. Address values are not arbitrary, for the most significant bits tend to be fixed by the segment, and the least significant bits tend to be multiples of some common factors dictated by alignment requirements. The informative bits are likely in the middle, but there they could still be subject to patterns created by particularities of the memory allocator. By scrambling with a multiplier and taking the highest bits, these patterning effects tend to be reduced.
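A toy illustration of that point (Python; the base address, alignment, and pool size are made up): 16-byte-aligned pointers clumped in one region yield only a couple of distinct values if we take low bits directly, while the top bits of a Fibonacci multiply spread them out.

```python
PHI64 = 0x9E3779B97F4A7C15            # ~2^64 / phi
MASK = (1 << 64) - 1

# Hypothetical pool of 16-byte-aligned addresses in one region.
addrs = [0x7F3A20000000 + 16 * i for i in range(32)]

low_slots = {a & 31 for a in addrs}                       # low 5 bits
fib_slots = {((a * PHI64) & MASK) >> 59 for a in addrs}   # top 5 bits
```

The alignment pattern leaks straight through the low bits (only two values ever occur), whereas the multiplied-and-shifted slots occupy many more of the 32 buckets.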

There’s another potentially useful technique if we want the hash to be perfect or near-perfect for a small, fixed collection of input values: rotate first, then hash by the golden ratio. The “ideal” number of bits to rotate by can be figured out by trial and error at setup time, provided that the input collection is small and the initial table leaves enough empty slots for some slack. For instance, if we begin with 8 unique inputs and make the initial table size 16, I think we’re guaranteed to find a rotation distance between 0 and 63 that puts the 8 inputs into distinct slots among the 16 total. I imagine this as rotating the inputs so that their informative bits are juggled out of the places not adequately scrambled by the Fibonacci hash and into its sweet spots, without losing any bits. To reuse one of your examples, the multiples of 144 (a Fibonacci number) by the numbers in [1..8] all map to 7 if we just take the highest 3 bits of the hash, but if we rotate the inputs by 3 bits first, the outputs nicely distribute to unique values in [0..7].
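The 144 example can be checked concretely (a Python sketch; I’m assuming a 64-bit word, a right rotation, and the usual 2^64/φ multiplier):

```python
PHI64 = 0x9E3779B97F4A7C15            # ~2^64 / phi
MASK = (1 << 64) - 1

def rotr(x, r):
    """Rotate a 64-bit value right by r bits."""
    return ((x >> r) | (x << (64 - r))) & MASK

def fib_hash(x, bits):
    """Fibonacci hash: multiply, then keep the top `bits` bits."""
    return ((x * PHI64) & MASK) >> (64 - bits)

keys = [144 * k for k in range(1, 9)]
plain = [fib_hash(k, 3) for k in keys]              # all collide in slot 7
rotated = [fib_hash(rotr(k, 3), 3) for k in keys]   # slots 0..7, all distinct
```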

If the hardware supports it, the rotation compiles down to a single instruction. Compared with shift-and-xor, it saves one arithmetic operation but adds one load. I find it helpful for one specific job: building a small lookup table or “frozenset” for fixed constant pointers. Once built, for any non-zero input it answers the question “is this input among the elements of the set?” in constant time with no branching. If the input is also guaranteed to be a member and we want to do a lookup, the keys don’t need to be stored.
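That membership structure could be sketched like this (Python; the names are mine, and I’m reusing the 144-multiples and the 3-bit right rotation from the previous paragraph, which make the hash perfect for an 8-slot table):

```python
PHI64 = 0x9E3779B97F4A7C15
MASK = (1 << 64) - 1

def rotr(x, r):
    return ((x >> r) | (x << (64 - r))) & MASK

def slot_of(x, rot, bits):
    return ((rotr(x, rot) * PHI64) & MASK) >> (64 - bits)

def build_frozen_set(keys, rot, bits):
    """Store each (non-zero) key at its slot; assumes `rot` makes the hash perfect."""
    table = [0] * (1 << bits)
    for k in keys:
        assert table[slot_of(k, rot, bits)] == 0, "rotation is not perfect for these keys"
        table[slot_of(k, rot, bits)] = k
    return table

def contains(table, x, rot, bits):
    # One rotate, one multiply, one shift, one load, one compare: no branches.
    return table[slot_of(x, rot, bits)] == x

keys = [144 * k for k in range(1, 9)]
table = build_frozen_set(keys, rot=3, bits=3)   # minimal: 8 keys, 8 slots
```

Because the lookup compares the stored key against the query, a non-member can never produce a false positive; the zero sentinel only requires that 0 itself is not a member.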

Anyway, thank you so much for the post! I hope you’re doing well!

It’s much better to over-vaccinate a few than to have a great number of people incorrectly skip the vaccine. It’s also a much easier policy to apply and fulfill in terms of allocating shots to districts (you can just do it by population).
