Probably Dance

I can program and like games

Month: December, 2016

random_seed_seq: A small utility to properly seed random number generators in C++

Some libraries are so small they’re almost not worth sharing. But the C++ standard has a giant hole in that it doesn’t provide an easy way to quickly generate truly random numbers: It has std::mt19937_64 which quickly generates pseudo-random numbers, and it has std::random_device, which slowly generates truly random numbers.

The easiest way to quickly generate truly random numbers is to use a std::random_device to seed a std::mt19937_64. That way we pay a one-time cost of using random device to generate a seed, and then have quick random numbers after that. Except that the standard doesn’t provide a way to do that. In fact it’s more dangerous than that: It provides an easy wrong way to do it (use a std::random_device to generate a single int and use that single int as the seed) and it provides a slow, slightly wrong way to do it. (use a std::random_device to fill a std::seed_seq and use that as the seed) There’s a proposal to fix this, (that link also contains reasons for why the existing methods are wrong) but I’ve actually been using a tiny class for this:

Read the rest of this entry »

I Wrote a Faster Sorting Algorithm

These days it’s a pretty bold claim if you say that you invented a sorting algorithm that’s 30% faster than state of the art. Unfortunately I have to make a far bolder claim: I wrote a sorting algorithm that’s twice as fast as std::sort for many inputs. And except when I specifically construct cases that hit my worst case, it is never slower than std::sort. (and even when I hit those worst cases, I detect them and automatically fall back to std::sort)

Why is that an unfortunate claim? Because I’ll probably have a hard time convincing you that I did speed up sorting by a factor of two. But this should turn out to be quite a lengthy blog post, and all the code is open source for you to try out on whatever your domain is. So I might either convince you with lots of arguments and measurements, or you can just try the algorithm yourself.

Following up from my last blog post, this is of course a version of radix sort. Meaning its complexity is lower than O(n log n). I made two contributions:

  1. I optimized the inner loop of in-place radix sort. I started off with the Wikipedia implementation of American Flag Sort and made some non-obvious improvements. This makes radix sort much faster than std::sort, even for a relatively small collections. (starting at 128 elements)
  2. I generalized in-place radix sort to work on arbitrary sized ints, floats, tuples, structs, vectors, arrays, strings etc. I can sort anything that is reachable with random access operators like operator[] or std::get. If you have custom structs, you just have to provide a function that can extract the key that you want to sort on. This is a trivial function which is less complicated than the comparison operator that you would have to write for std::sort.

If you just want to try the algorithm, jump ahead to the section “Source Code and Usage.”

Read the rest of this entry »

Investigating Radix Sort

I recently learned how radix sort works, and in hindsight it’s weird that I never really learned about it before, and that it doesn’t seem to be widely used. In this blog post I claim that std::sort should use radix sort for large arrays, and I will provide a simple implementation that does that.

But first an explanation of what radix sort is: Radix sort is a O(n) sorting algorithm working on integer keys. I’ll explain below how it works, but the claim that there’s an O(n) searching algorithm was surprising to me the first time that I heard it. I always thought there were proofs that sorting had to be O(n log n). Turns out sorting has to be O(n log n) if you use the comparison operator to sort. Radix sort does not use the comparison operator, and because of that it can be faster.

The other reason why I never looked into radix sort is that it only works on integer keys. Which is a huge limitation. Or so I thought. Turns out all this means is that your struct has to be able to provide something that acts somewhat like an integer. Radix sort can be extended to floats, pairs, tuples and std::array. So if your struct can provide for example a std::pair<bool, float> and use that as a sort key, you can sort it using radix sort.

Read the rest of this entry »