Ideas for a Programming Language Part 4: Reactive Programming

by Malte Skarupke

Reactive programming has been around for a long time. It’s a paradigm oriented around data flows. The usual example is a spreadsheet: if cell B refers to cell A, then cell B updates automatically when I change the value in cell A. If ten cells refer to cell A, then all ten of those cells change automatically. Contrast that with C++, where if I compute ten variables from variable X and then change X, I have to manually write code to make sure the other ten variables get updated.
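
To make the C++ side of that concrete, here is a tiny made-up example of what manual state propagation looks like: every derived value has to be updated by hand, and nothing stops you from forgetting one.

// a made-up example of manual state propagation
struct spreadsheet_like
{
    int a = 0;
    int b = 0; // supposed to always be a + 1
    int c = 0; // supposed to always be a * 2

    void set_a(int new_value)
    {
        a = new_value;
        // manual propagation: forget one of these and the
        // "spreadsheet" is silently out of date
        b = a + 1;
        c = a * 2;
    }
};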

What’s changed recently is that there are a few interesting approaches that make this paradigm more widely useful. Functional Reactive Programming (FRP) is a useful approach which is spreading, with popular libraries available for pretty much all languages. For an illustration of FRP and how it can be used in an imperative style I recommend reading Deprecating the Observer Pattern.

What really convinced me, though, that a lot more code should be reactive is a JavaScript library called Knockout, which makes it so easy to write reactive code that you’d be stupid to ever write manual state propagation code.

The first time I saw the method that Knockout is using was actually not in a library, but in a build system called Sham. Sham is like make, but without Makefiles. The way it knows what your dependencies are and when it has to rebuild things is that it intercepts all calls to fopen() in your process and remembers every file that you opened. The next time you run the same build, it checks whether any of those files changed and only rebuilds if they have. And just like that you get all dependency tracking for free. You automatically get more correct rebuild behavior than make gives you. It’s a brilliant little trick. (it took me a while to convince myself that it’s correct to just use the list of files that were opened last time as the dependencies for the next time, but it does always work)
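
The core of the trick is small enough to sketch. The code below is not Sham’s actual implementation, just my guess at the rebuild check, assuming it compares timestamps: take the list of files that were opened during the last build and rebuild if any of them changed or disappeared.

#include <filesystem>
#include <string>
#include <vector>

// not Sham's code, just an illustration of the rebuild decision
bool needs_rebuild(const std::vector<std::string> & files_opened_last_time,
                   const std::filesystem::path & output)
{
    namespace fs = std::filesystem;
    if (!fs::exists(output))
        return true;
    auto output_time = fs::last_write_time(output);
    for (const std::string & file : files_opened_last_time)
    {
        // a dependency that vanished or is newer than the output means rebuild
        if (!fs::exists(file) || fs::last_write_time(file) > output_time)
            return true;
    }
    return false;
}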

Knockout figured out that you can do the same thing in code. Before talking about it though, I’ll make you watch this introductory video to Knockout. Now think of how you would have written the code in there in your favorite UI framework: how many event handlers you would need, how many update functions, how much code to turn things on and off. There would probably be hundreds of lines of code that just push state around. Take the point where he limits the friend list to five entries: he can write normal JavaScript code to express this, and thanks to the dependency tracking that code gets re-evaluated only when it has to be. You could do the same thing with events and callbacks, but it would be dozens of lines of code spread across several functions.

I would like to always work like this: I just write the behavior that I want, and somebody else figures out when my code has to run, when its inputs changed, who I have to notify when I change my outputs, and all that. I also want that dependency graph. I want it visible in the IDE, walkable, with links to navigate to the functions. The graph of which variable depends on which other variable exists implicitly in your code, but you can’t see it. If you want to know how a variable changes, you’ll probably have to walk over to the person who wrote that code. (or if that person doesn’t work at your company any more, you’ll have to read or step through the code) With Knockout’s dependency tracking you could make the graph of your state propagation visible.
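
To show why I find this so appealing, here is a toy sketch of Knockout-style dependency tracking in C++. This is my own simplification, not Knockout’s API and not production code (no unsubscription, not thread safe): a computed value records itself as “currently evaluating”, every observable that gets read while it runs adds it as a subscriber, and writing an observable re-runs its subscribers.

#include <functional>
#include <set>
#include <utility>

struct computed_base
{
    virtual void reevaluate() = 0;
    virtual ~computed_base() = default;
};

// the computation whose dependencies we are currently recording, if any
static computed_base * currently_evaluating = nullptr;

template<typename T>
struct observable
{
    T get() const
    {
        // whoever is evaluating right now depends on this observable
        if (currently_evaluating)
            subscribers.insert(currently_evaluating);
        return value;
    }
    void set(T new_value)
    {
        value = std::move(new_value);
        // copy so that re-evaluation can add subscribers while we notify
        std::set<computed_base *> to_notify = subscribers;
        for (computed_base * subscriber : to_notify)
            subscriber->reevaluate();
    }

private:
    T value{};
    mutable std::set<computed_base *> subscribers;
};

template<typename T>
struct computed : computed_base
{
    explicit computed(std::function<T ()> update)
        : update_function(std::move(update))
    {
        reevaluate();
    }
    void reevaluate() override
    {
        // dependency tracking: just run the function and see which
        // observables it reads along the way
        computed_base * previous = currently_evaluating;
        currently_evaluating = this;
        value = update_function();
        currently_evaluating = previous;
    }
    T get() const { return value; }

private:
    std::function<T ()> update_function;
    T value{};
};

With that in place you never write propagation code by hand:

// somewhere in a function:
observable<float> x;
computed<float> doubled([&x]{ return x.get() * 2.0f; });
x.set(3.0f); // doubled.get() is now 6.0f, and nobody wrote an update handler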

There are a couple of problems though: 1. There is a bit of overhead for all of this dependency tracking. It’s surprisingly little overhead, but it’s not free. I’d say you can run this in 90% of your code, but you want to keep it out of your inner loops. And 2. If you implement this in C++, the dependent observables all become lambda functions that you have to store in a std::function. So now if you have a

struct Foo
{
    float some_computed_value;
};

you instead have to store it as a

struct Foo
{
    float some_computed_value;
    std::function<float ()> update_function;
};

because the update function for the dependent observable has to be stored somewhere. Except that that update function will never change. In fact it’s known at compile time. So if you had compiler support, it could just go away. So with compiler support this idea not only can become a normal part of the language, it would also have less overhead than a library based solution. I’m not the only person to have this idea. Sean Parent talks about the same thing at the end of this talk which is about std::future. (slides here, because you can’t read them in the video) It’s interesting that he comes to the same conclusion from a very different angle. He comes at it from multithreading and from chaining together futures. Where he wants the graph for which tasks in his task system will launch which other tasks or wait for other tasks. Which is really the same thing as the dependency graph from Knockout.

I think the idea is strong. The compiler should know which variables depend on which other variables. And Knockout has shown that in cases where the compiler can’t know, you can actually track all of this at runtime. And when you implement it in C++, it’s surprising how little overhead there is for this. I think this will change how we program. After only briefly working with Knockout I already groan every time that I have to push some state from one object to some other object.

So this will be a core piece of my language. And I can think of at least ten problems that I’m going to run into that I haven’t quite solved yet, but I think it’s worth it. I think it will simplify coding a lot while also making our code less buggy and faster at the same time.


Mostly unrelated to this, here is a different topic: I have a feeling that we are writing code one level of indirection away from the level where we should be writing it.

I have started working on this language, but I haven’t started to write a compiler yet. Instead I am investigating two related topics: Clojure’s transducers and Haskell’s lenses. They are strongly related to C++ iterators and ranges in that they try to separate the algorithms that we perform on data from the data structures themselves. I think they both invented nearly the same thing, but I have to play around with them more to be sure. In the meantime I’ll talk about something that I noticed while working on this: a lot of code looks like this:

// add 5 to every element of vec, appending the results to ovec
std::transform(vec.begin(), vec.end(), std::back_inserter(ovec), [](int a){ return a + 5; });
// then remove every element that is greater than 10
ovec.erase(std::remove_if(ovec.begin(), ovec.end(), [](int a){ return a > 10; }), ovec.end());

or

// illustrative only: plain std::future has no .then(); this is the continuation
// style of boost::future or std::experimental::future
std::async(&foo).then([](std::future<std::string> value)
{
    return bar(value.get());
}).then([&lines](std::future<std::string> value)
{
    std::string s = value.get();
    boost::char_separator<char> sep("\n\r");
    boost::tokenizer<boost::char_separator<char>> line_separator(s, sep);
    lines.insert(lines.end(), line_separator.begin(), line_separator.end());
});

These are trivialized examples, but you get the idea. The transducers talk made me realize that these are actually the same thing. But what I want to complain about is that the actual behavior here lives in the lambda on the right hand side. The actual behavior is “a + 5”, or “a > 10”, or calling the function foo() followed by the function bar() followed by splitting on newline characters. Everything around that is just overhead for dealing with the concrete containers we are using. I would like the “a + 5” and the “a > 10” to be the main code and the containers to be the detail. Right now the containers are the main statements, and what the code actually does is a sub-statement.
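
Just to make concrete what I mean by the behavior being the main code, here is a sketch for the container case, written with C++20’s std::views (a range-v3 style pipeline looks the same). The “a + 5” and the “a > 10” are what you see first and the container handling recedes into the library, though this still does nothing for the futures version:

#include <ranges>
#include <vector>

// a sketch, not a finished answer: the behavior is the main code,
// the container is the detail
std::vector<int> transform_and_filter(const std::vector<int> & vec)
{
    std::vector<int> ovec;
    for (int value : vec
                   | std::views::transform([](int a) { return a + 5; })
                   | std::views::filter([](int a) { return !(a > 10); }))
    {
        ovec.push_back(value);
    }
    return ovec;
}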

I know a few people who, if I made this complaint to them, would laugh kindly and tell me in the nicest words how silly I am for making this so complicated. If you just use a for loop, this becomes normal code:

for (size_t i = 0, size = vec.size(); i < size; ++i)
{
    int tmp = vec[i];
    tmp += 5;
    if (tmp <= 10) // keep what the remove_if above keeps
        ovec.push_back(tmp);
}

This is what I’m looking for. But there are two problems with this: The first problem is that it’s a special case. If all we want to do is iterate over every element in a container, we can write normal code like this. But it’s terribly difficult to write reusable code with for loops. I often see people re-implement std::remove_if() or std::rotate() because you start with a for loop, and it’s not clear how to transition from a for loop to using a reusable algorithm. In fact you almost certainly end up with something more complicated and slower than std::remove_if() or std::rotate() because a for loop is mostly good for iterating over every element. It’s not ideal for writing even slightly more complicated algorithms.

And obviously the for loop doesn’t apply to chaining futures.

So I think we’re writing code at the wrong level. There are so many details here that it’s difficult to read what the code actually does. I don’t quite know yet what the right level looks like. But what I want is that if you write single-threaded code in a normal function, where you just call X first then Y and maybe do a few if/elses, it should look the same as if you did the same operations on an array, and the same as if you did them on futures. I think transducers and lenses promise to solve this problem. I have to spend more time with them. I have a hunch that one of these with some heavy syntactic sugar might be what I’m looking for.
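
As a final illustration of how far apart these currently are, here is the same two-step behavior, call foo then bar, written three ways. The signatures of foo and bar are guessed for the sketch, and the futures version again assumes a future type that has .then(), like boost::future or std::experimental::future:

#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// guessed building blocks for the comparison
std::string foo(const std::string & input);
std::string bar(const std::string & input);

// 1. plain single-threaded code: the behavior is all you see
std::string plain(const std::string & input)
{
    return bar(foo(input));
}

// 2. the same behavior over a container: the container is the main statement
std::vector<std::string> over_container(const std::vector<std::string> & inputs)
{
    std::vector<std::string> results;
    std::transform(inputs.begin(), inputs.end(), std::back_inserter(results),
                   [](const std::string & s) { return bar(foo(s)); });
    return results;
}

// 3. the same behavior on a future: the chaining machinery is the main statement
//    (future_type stands in for any future with .then(), which std::future lacks)
template<typename future_type>
auto over_future(future_type input)
{
    return input.then([](future_type value) { return bar(foo(value.get())); });
}

Ideally all three would read like the first one, and the array or the future would just be a detail about where the values live.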