Probably Dance

I can program and like games

Category: Programming

Faster Sorting Algorithm Part 2

This is a follow-up to my previous blog post about writing a faster sorting algorithm. I’m using this as a chance to go into detail on topics that I was asked about in the comments: I’ll clear up some misunderstandings and go into future work that needs to happen with this algorithm.

Somebody was nice enough to link my blog post on Hacker News and Reddit. While I didn’t submit it myself, I still read most of the comments on those websites. For some reason the comments I got on my own website were much better than the comments on either of those sites. But there seem to be some common misunderstandings underlying the bad comments, so I’ll try to clear them up.

Read the rest of this entry »

random_seed_seq: A small utility to properly seed random number generators in C++

Some libraries are so small they’re almost not worth sharing. But the C++ standard has a giant hole in that it doesn’t provide an easy way to quickly generate truly random numbers: It has std::mt19937_64, which quickly generates pseudo-random numbers, and it has std::random_device, which slowly generates truly random numbers.

The easiest way to quickly generate truly random numbers is to use a std::random_device to seed a std::mt19937_64. That way we pay the one-time cost of using random_device to generate a seed, and have quick random numbers after that. Except that the standard doesn’t provide a way to do that. In fact it’s more dangerous than that: It provides an easy, wrong way to do it (use a std::random_device to generate a single int and use that single int as the seed) and it provides a slow, slightly wrong way to do it (use a std::random_device to fill a std::seed_seq and use that as the seed). There’s a proposal to fix this (that link also contains the reasons why the existing methods are wrong), but I’ve actually been using a tiny class for this:
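Here’s a minimal sketch of such a class (my reconstruction, assuming a seed_seq-like interface; the full post contains the actual implementation). It implements the one part of the seed sequence interface that the engine constructor actually uses, and pulls every value straight from std::random_device:

#include <algorithm>
#include <functional>
#include <random>

struct random_seed_seq
{
    // the engine constructor calls generate() to fill its entire state;
    // unlike std::seed_seq, every value comes straight from random_device
    template<typename It>
    void generate(It begin, It end)
    {
        std::generate(begin, end, std::ref(device));
    }

    std::random_device device;
};

int main()
{
    random_seed_seq seed_source;
    std::mt19937_64 engine(seed_source); // fully seeded; fast from here on
}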

Read the rest of this entry »

I Wrote a Faster Sorting Algorithm

These days it’s a pretty bold claim if you say that you invented a sorting algorithm that’s 30% faster than the state of the art. Unfortunately I have to make a far bolder claim: I wrote a sorting algorithm that’s twice as fast as std::sort for many inputs. And except when I specifically construct cases that hit my worst case, it is never slower than std::sort. (And even when I do hit those worst cases, I detect them and automatically fall back to std::sort.)

Why is that an unfortunate claim? Because I’ll probably have a hard time convincing you that I did speed up sorting by a factor of two. But this should turn out to be quite a lengthy blog post, and all the code is open source for you to try out on whatever your domain is. So I might either convince you with lots of arguments and measurements, or you can just try the algorithm yourself.

Following up from my last blog post, this is of course a version of radix sort. Meaning its complexity is lower than O(n log n). I made two contributions:

  1. I optimized the inner loop of in-place radix sort. I started off with the Wikipedia implementation of American Flag Sort and made some non-obvious improvements. This makes radix sort much faster than std::sort, even for relatively small collections (starting at 128 elements).
  2. I generalized in-place radix sort to work on arbitrarily sized ints, floats, tuples, structs, vectors, arrays, strings etc. I can sort anything that is reachable with random access operators like operator[] or std::get. If you have custom structs, you just have to provide a function that extracts the key you want to sort on. That’s a trivial function, less complicated than the comparison operator you would have to write for std::sort.

If you just want to try the algorithm, jump ahead to the section “Source Code and Usage.”
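To give a flavor of the interface, usage looks something like this (a sketch: ska_sort is the name used in the full post, but the header and details here are illustrative):

#include <cstdint>
#include <string>
#include <vector>

#include "ska_sort.hpp" // illustrative path; see "Source Code and Usage"

struct Player
{
    std::string name;
    std::uint64_t score;
};

int main()
{
    std::vector<Player> players = {{"carol", 225}, {"alice", 300}, {"bob", 150}};
    // no comparison function: just extract the key you want to sort on
    ska_sort(players.begin(), players.end(),
             [](const Player & p) { return p.score; });
}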

Read the rest of this entry »

Investigating Radix Sort

I recently learned how radix sort works, and in hindsight it’s weird that I never really learned about it before, and that it doesn’t seem to be widely used. In this blog post I claim that std::sort should use radix sort for large arrays, and I will provide a simple implementation that does that.

But first an explanation of what radix sort is: Radix sort is an O(n) sorting algorithm that works on integer keys. I’ll explain below how it works, but the claim that there’s an O(n) sorting algorithm was surprising to me the first time I heard it. I always thought there were proofs that sorting had to be O(n log n). It turns out sorting has to be O(n log n) only if you use the comparison operator to sort. Radix sort does not use the comparison operator, and because of that it can be faster.
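To make that concrete, here is a minimal sketch of least-significant-digit radix sort on 32-bit keys, one byte at a time (illustrative only; not the implementation from this post):

#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

void radix_sort_u32(std::vector<std::uint32_t> & data)
{
    std::vector<std::uint32_t> buffer(data.size());
    for (int shift = 0; shift < 32; shift += 8)
    {
        // count how many keys fall into each of the 256 buckets
        std::array<std::size_t, 256> count{};
        for (std::uint32_t v : data)
            ++count[(v >> shift) & 0xff];
        // exclusive prefix sum: turn counts into bucket start offsets
        std::size_t total = 0;
        for (std::size_t & c : count)
        {
            std::size_t old = total;
            total += c;
            c = old;
        }
        // stable scatter into the buffer, then swap for the next pass
        for (std::uint32_t v : data)
            buffer[count[(v >> shift) & 0xff]++] = v;
        data.swap(buffer);
    }
}

Every element is touched a constant number of times (four counting passes and four copies here), which is where the O(n) comes from.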

The other reason why I never looked into radix sort is that it only works on integer keys. Which is a huge limitation. Or so I thought. It turns out all this means is that your struct has to be able to provide something that acts somewhat like an integer. Radix sort can be extended to floats, pairs, tuples and std::array. So if your struct can provide, for example, a std::pair<bool, float> to use as a sort key, you can sort it using radix sort.
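For floats, the classic trick (a sketch of the general technique, not necessarily this post’s code) is to reinterpret the IEEE-754 bits as an unsigned integer whose order matches the float order:

#include <cstdint>
#include <cstring>

std::uint32_t float_to_key(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    // negative floats: flip all bits (their bit patterns sort backwards);
    // positive floats: just set the sign bit so they sort above negatives
    return (bits & 0x80000000u) ? ~bits : (bits | 0x80000000u);
}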

Read the rest of this entry »

Lessons Learned from Shenzhen I/O

Shenzhen I/O is a brilliant game. In case you haven’t heard of it, it’s a game about programming micro-controllers. It distills programming down to the fun parts, removing the inertia, self-inflicted complexity, overhead, uncertainty and drag of real programming. It’s just about coming up with clever tiny algorithms and micro-optimizing the heck out of them. It’s great alone, but it’s even better if you have a friend who’s playing at the same time. Competing on the leaderboards for puzzles is enormous fun. Here are a couple of lessons from playing that game:

1. There is no optimal code. There is only code that’s faster than the code that you’re comparing to

Shenzhen I/O shows you a histogram of all the scores that other people have reached. If my solution fell on the right of the bell curve, I would optimize it until it was on the left. After a lot of work I would usually arrive at an “optimal” solution that put me in the best bracket on the histogram. Those solutions were always far from optimal.

Read the rest of this entry »

C++11 Completed RAII, Making Composition Easier

The addition of move semantics in C++11 is not just a performance and safety improvement. It’s also the feature that completed RAII. And as of C++11 I believe that RAII is absolutely necessary to make object composition easy in the language.

To illustrate, let’s look at how objects were composed before C++11, what problems we ran into, and how everything just works automatically since C++11. Let’s build an example with three objects:

struct Expensive
{
    std::vector<float> vec;
};
struct Group
{
    Group();
    Group(const Group &);
    Group & operator=(const Group &);
    ~Group();
    int i;
    float f;
    std::vector<Expensive *> e;
};
struct World
{
    World();
    World(const World &);
    World & operator=(const World &);
    ~World();
    std::vector<Group *> c;
};

That’s roughly what composition looked like before C++11. It was OK to have a vector of floats, but you’d never have a vector of more expensive objects, because any time that vector re-allocated you’d have a very expensive operation on your hands. So instead you’d use a vector of pointers. Let’s implement all those functions:
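To give a flavor of what that implies, here is my sketch of the usual boilerplate (not necessarily the code from the full post):

Group::Group()
    : i(0), f(0.0f)
{
}
Group::Group(const Group & other)
    : i(other.i), f(other.f)
{
    // deep copy: each pointer owns its Expensive, so clone every one
    e.reserve(other.e.size());
    for (std::size_t n = 0; n < other.e.size(); ++n)
        e.push_back(new Expensive(*other.e[n]));
}
Group::~Group()
{
    // and free them all again
    for (std::size_t n = 0; n < e.size(); ++n)
        delete e[n];
}
// operator= is more of the same (copy and swap), and World repeats
// the whole dance for its vector of Group pointers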

Read the rest of this entry »

Neural Networks Are Impressively Good At Compression

I’m trying to get into neural networks. There have been a couple of big breakthroughs in the field in recent years, and suddenly my side project of messing around with programming languages seemed short-sighted. It almost seems like we’ll have real AI soon, and I want to be working on that. While making my first couple of steps into the field it’s hard to keep that enthusiasm. A lot of the field is still kind of hand-wavy: when you want to find out why something is done the way it’s done, the only answer you can get is “because it works like this, and it doesn’t work if we change it.”

At least that’s my first impression. I’m still just dipping my toes in. But there is one thing I am very impressed with: how much data neural networks can express in how few connections.

Read the rest of this entry »

Functional Programming Is Not Popular Because It Is Weird

I’ve seen people be genuinely puzzled about why functional programming is not more popular. For example I’m currently reading “Out of the Tar Pit”, where, after arguing for functional programming, the authors say:

Still, the fact remains that such arguments have been insufficient to result in widespread adoption of functional programming. We must therefore conclude that the main weakness of functional programming is the flip side of its main strength – namely that problems arise when (as is often the case) the system to be built must maintain state of some kind.

I think the reason for the lack of popularity is much simpler: Writing functional code is often backwards and can feel more like solving puzzles than like explaining a process to the computer. In functional languages I often know what I want to say, but it feels like I have to solve a puzzle in order to express it to the language. Functional programming is just a bit too weird.

Read the rest of this entry »

Quickly Loading Things From Disk

Loading things from disk is a surprisingly unsolved problem in C++. If you have a class that you want to save to JSON or binary and load back later, you’ll probably end up writing custom parsing code. Solutions like Protobuf or Flatbuffers generate custom classes that are… how should I put it… bad code. They force you to work in a parallel type system that’s not quite as powerful as C++ and that doesn’t allow you to write custom member functions. You pretty much have to wrap anything that they generate. Boost serialization is a better solution in that it allows you to load and save existing classes, which allows you to keep your code quality higher, but it’s unable to write human-readable files. The default implementation that it provides is also surprisingly slow. I blame the fact that it loads using std::istream.

Based on frustrations with existing solutions I decided to write something that can

  1. read and write existing, non-POD classes. No custom build step should be required
  2. write human-readable formats like JSON
  3. read data very quickly (I care less about how fast it writes stuff. I work in an environment where there is a separate compile step for all data)

I came up with something that is faster than any of the libraries mentioned above.
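The kind of interface I have in mind for point 1 looks roughly like this (an illustrative sketch, not necessarily the interface of the library from the full post): one member function describes the class, and both the JSON writer and the fast binary reader are driven by it:

#include <string>
#include <vector>

struct GameObject
{
    std::string name;
    std::vector<int> inventory;

    // the same description drives writing JSON and the fast binary
    // read path: no generated classes, no custom build step
    template<typename Archive>
    void serialize(Archive & archive)
    {
        archive(name, inventory);
    }
};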

Read the rest of this entry »

A surprisingly useful little class: TwoWayPointer

In graph-like data structures (explicit or implicit) there is a memory safety problem when you remove a node: you have to clean up all the connections pointing to it. If your graph structure is explicit this is easy, but otherwise it can be annoying to find all the pointers pointing at a node. A while ago I thought that you could solve this by introducing a two way pointer, which has to be the same size as a normal pointer, but which knows how to disconnect itself on the other side, so that you don’t have to write any clean-up code.

The reason why it turned out surprisingly useful is that if both ends of the connection know about each other, I can make them keep the connection alive even if one of them is moved around. So if one of them lives in a std::vector and the vector reallocates, the TwoWayPointer can handle that case and just change the pointer on the other end to point to the new location. That can dramatically simplify code.
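A minimal sketch of the idea (my reconstruction, not the class from the post): a single pointer member keeps it the same size as a normal pointer, and the move operations re-link the far end:

struct TwoWayPointer
{
    TwoWayPointer() : other(nullptr) {}
    // declaring the moves implicitly suppresses the (unsafe) copies
    TwoWayPointer(TwoWayPointer && o) noexcept
        : other(o.other)
    {
        o.other = nullptr;
        if (other)
            other->other = this; // tell the far end about our new address
    }
    TwoWayPointer & operator=(TwoWayPointer && o) noexcept
    {
        disconnect();
        other = o.other;
        o.other = nullptr;
        if (other)
            other->other = this;
        return *this;
    }
    ~TwoWayPointer() { disconnect(); }

    void connect(TwoWayPointer & o)
    {
        disconnect();
        o.disconnect();
        other = &o;
        o.other = this;
    }
    void disconnect()
    {
        // clearing both sides means no dangling pointer is left behind
        if (other)
        {
            other->other = nullptr;
            other = nullptr;
        }
    }

    TwoWayPointer * other;
};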

One example is that I’m now using it in my signal/slot class, which I’ve also included in this blog post. It allowed me to simplify the signal/slot implementation a lot. Compare it to your favourite signal implementation if you want: there is usually a lot of code overhead associated with memory safety, and that all went away in this implementation.

Read the rest of this entry »