A new fast hash table in response to Google’s new fast hash table

Hi, I wrote my new favorite hash table. This came about because last year I wrote the fastest hash table (I still make that claim) and this year one of the organizers of the C++Now conference asked me to give a talk. My problem was that Google had also announced a new fast hash table last year, and I wasn’t sure if mine would compare well against theirs.

The main benefit of Google’s hash table over mine was that Google’s has less memory overhead: it has a higher max_load_factor (meaning the table can get fuller before it grows to a bigger array) and it has only 1 byte of overhead per entry, whereas the overhead of my table depended on the alignment of your data (if your data is 8-byte aligned, you’ll have 8 bytes of overhead).
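To make that overhead comparison concrete, here is a minimal sketch (not the actual layout of either table, and assuming a typical 64-bit platform) of why per-entry overhead can depend on alignment: if you store one metadata byte inline next to an 8-byte-aligned value, padding turns that 1 byte into 8 bytes of overhead, whereas keeping the metadata bytes in a separate array stays at 1 byte per entry.

```cpp
#include <cstdint>
#include <cstdio>

struct SlotWithInlineMetadata
{
    std::int8_t metadata;  // 1 byte of bookkeeping stored next to the value
    std::int64_t value;    // 8-byte-aligned payload forces 7 bytes of padding
};

int main()
{
    // On typical platforms this prints 16: 8 bytes of value plus
    // 8 bytes of overhead (1 metadata byte + 7 padding bytes).
    std::printf("slot with inline metadata: %zu bytes\n",
                sizeof(SlotWithInlineMetadata));
    // A layout that keeps metadata in a separate parallel array pays
    // only sizeof(value) + 1 byte per entry, regardless of alignment.
    std::printf("value alone:               %zu bytes\n", sizeof(std::int64_t));
}
```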

So I spent months working on that conference talk, trying to find something that would be a good response to Google’s hash table. Surprisingly enough, I ended up with a chaining hash table that is almost as fast as my hash table from last year, while having even less memory overhead than Google’s hash table, and it has the really nice property of stable performance: every hash table has some performance pitfalls, but this one has fewer than most and will run into them less often. The result is a hash table that’s really easy to recommend.
