Ideas for a Programming Language Part 4: Reactive Programming
by Malte Skarupke
Reactive programming has been around for a long time. It’s a paradigm oriented around data flows. The usual example is a spreadsheet: If cell B refers to cell A, then cell B updates automatically when I change the value in cell A. If ten cells refer to cell A, then all those ten cells change automatically. This is as opposed to C++, where if I compute ten variables using variable X, then if I change variable X, I have to manually write code to make sure that the other ten variables get updated.
What’s changed recently is that there are a few interesting approaches that make this paradigm more widely useful. Functional Reactive Programming (FRP) is a useful approach which is spreading, with popular libraries available for pretty much all languages. For an illustration of FRP and how it can be used in an imperative style I recommend reading Deprecating the Observer Pattern.
What really convinced me though that a lot more code should be reactive is a javascript library called Knockout. Which makes it so easy to write reactive code, you’d be stupid to ever write manual state propagation code.
The first time that I saw the method that Knockout is using was actually not in a library, but in a buildsystem called Sham. Sham is like make but without Makefiles. The way it knows what your dependencies are and when it has to rebuild things is that it intercepts all calls to fopen() in your process, and it remembers all files that you opened. The next time that you run the same build, it checks if any of those files changed and it only rebuilds if they have. And just like that you get all dependency tracking for free. You automatically get much more correct behavior about rebuilding than make has. It’s a brilliant little trick. (it took me a while to convince myself that it is correct behavior to just use the list of files that were opened the last time as dependencies for the next time, but it does always work)
Knockout figured out that you can do the same thing in code. Before talking about it though I’ll make you watch this introductory video to Knockout. Now think of how you would have written the code in there in your favorite UI framework. How many event handlers you would have and update functions and code to turn things on and off. There would probably be hundreds of lines of code that just push the state around. For example the point where he limits the friend list to five entries: He can write normal JavaScript code to explain this, and thanks to the dependency tracking the code gets re-evaluated only when it has to. You can do the same thing with events and callbacks, but it would be dozens of lines of code across several functions.
I would like to always work like this. Where I just write the behavior that I want and then somebody else figures out when my code has to run and when its input changed and who I have to notify when I change my outputs and all that. I also want that dependency graph. I want it visible in the IDE, walkable, with links to navigate to the functions. That graph of what variable depends on what other variable exists implicitly in your code, but you can’t see it. If you want to know how a variable changes, you’ll probably have to walk over to the person who wrote that code. (or if that person doesn’t work at your company any more, you’ll have to read or step through the code) With Knockouts dependency tracking you could make the graph of your state propagation visible.
There are a few problems though: 1. There is a bit of overhead for all of this dependency tracking. It’s suprisingly little overhead, but it’s not free. I’d say you can run this in 90% of your code, but you want to keep it out of your inner loops. And 2. If you implement this in C++, the dependent observables all become lambda functions that you have to store in a std::function. So now if you have a
struct Foo { float some_computed_value; };
you instead have to store it as a
struct Foo { float some_computed_value; std::function<float ()> update_function; };
because the update function for the dependent observable has to be stored somewhere. Except that that update function will never change. In fact it’s known at compile time. So if you had compiler support, it could just go away. So with compiler support this idea not only can become a normal part of the language, it would also have less overhead than a library based solution. I’m not the only person to have this idea. Sean Parent talks about the same thing at the end of this talk which is about std::future. (slides here, because you can’t read them in the video) It’s interesting that he comes to the same conclusion from a very different angle. He comes at it from multithreading and from chaining together futures. Where he wants the graph for which tasks in his task system will launch which other tasks or wait for other tasks. Which is really the same thing as the dependency graph from Knockout.
I think the idea is strong. The compiler should know which variables depend on which other variables. And Knockout has shown that in cases where the compiler can’t know, you can actually track all of this at runtime. And when you implement it in C++, it’s surprising how little overhead there is for this. I think this will change how we program. After only briefly working with Knockout I already groan every time that I have to push some state from one object to some other object.
So this will be a core piece of my language. And I can think of at least ten problems that I’m going to run into that I haven’t quite solved yet, but I think it’s worth it. I think it will simplify coding a lot while also making our code less buggy and faster at the same time.
Mostly unrelated to this, here is a different topic: I have a feeling that we are writing code at one level of indirection away from the level where we should be writing it.
I have started working on this language, but I haven’t started to write a compiler yet. Instead I am investigating two related topics: Clojures Transducers and Haskells Lenses. They are strongly related to C++ iterators and ranges in that they are trying to separate the algorithms that we perform on data from the data structures itself. I think they both invented nearly the same thing, but I have to play around more with them to be sure. Instead I’ll talk about something that I noticed while working on this: A lot of code looks like this:
std::transform(vec.begin(), vec.end(), std::back_inserter(ovec), [](int a){ return a + 5; }); ovec.erase(std::remove_if(ovec.begin(), ovec.end(), [](int a){ return a > 10; }), ovec.end());
or
std::async(&foo).then([](std::future<std::string> value) { return bar(value.get()); }).then([&lines](std::future<std::string> value) { std::string s = value.get(); boost::char_separator<char> sep("\n\r"); boost::tokenizer<boost::char_separator<char>> line_separator(s, sep); lines.insert(lines.end(), line_separator.begin(), line_separator.end()); });
Where these are trivialized examples, but you get the idea. The transducers talk made me realize that these are actually the same thing. But what I want to complain about is that the actual behavior here is in that lambda on the right hand side. The actual behavior here is “a + 5” or “a > 10” or calling the function foo() followed by the function bar() followed by comparing against newline characters. But there is all that stuff around what I’m actually doing which is just overhead for dealing with the concrete containers that we are using. I would like the “a + 5” and the “a > 10” to be the main code, and I want the containers to be the detail. Where right now the containers are the main statements, and what this code actually does is a sub statement.
I know a few people where if I made this complaint to them, they would laugh kindly, and tell me in the nicest words how silly I am for making this so complicated. If you just use a for loop, this becomes normal code:
for (size_t i = 0, size = vec.size(); i < size; ++i) { int tmp = vec[i]; tmp += 5; if (tmp > 10) ovec.push_back(tmp); }
This is what I’m looking for. But there are two problems with this: The first problem is that it’s a special case. If all we want to do is iterate over every element in a container, we can write normal code like this. But it’s terribly difficult to write reusable code with for loops. I often see people re-implement std::remove_if() or std::rotate() because you start with a for loop, and it’s not clear how to transition from a for loop to using a reusable algorithm. In fact you almost certainly end up with something more complicated and slower than std::remove_if() or std::rotate() because a for loop is mostly good for iterating over every element. It’s not ideal for writing even slightly more complicated algorithms.
And obviously the for loop doesn’t apply to chaining futures.
So I think we’re writing code at the wrong level. There are so many details here that it’s difficult to read what the code actually does. I don’t quite know yet what the right level looks like. But what I want is that if you write single threaded code in a normal function where you just call X first then Y and maybe do a few if/elses, that should look the same as if you did the same operations on an array, and it should look the same as if you did the same operations on futures. I think Transducers and Lenses promise to solve this problem. I have to spend more time with them. I have a hunch that one of these with some heavy syntactic sugar might be what I’m looking for.
I’ve been mildly interested in reactive programming since 2003 (Cocoa Bindings). These days I wonder, though: if I were writing a UI framework from scratch, would I go with something like that? Or would I go pure IMGUI, without any derived state at all? Instead of tracking dependencies, just redo everything every “frame” (until hitting something that is sufficiently expensive, at which point maybe it falls to pieces?).
There are lots of languages that have some kind of syntax sugar for specific kinds of containers: for-each loops, Python list comprehensions. Haven’t watched those videos, but I guess the Clojure and Haskell ideas are trying to generalize this across many kinds of containers? I’ll admit, I don’t know what that brings to the table over standard “map” and “filter”.
My initial reaction to both of these is that first I want to see the debugger that makes it dead simple to understand, and *then* I’ll pay attention to the language that makes it easy to write. Tooling is so important.
Reactive programming is independent of the UI approach. It seems more applicable to classical state based UI, because, well, there is state in those which you have to reason about which reactive programming can make easier. But even in immediate mode UI your variables have to come from somewhere, and you have to write the results somewhere and make things happen based on what users click on. And that’s where dependency tracking makes your life easier.
Updating Everything: That is one way of dealing with state. Somewhere in my backlog of unfinished blog posts I have one called a Survey of State, in which I realized that I know four ways of dealing with state propagation: Update Loops, Observer Pattern, Reactive Programming and Buildsystems. All of these have their own strength and weaknesses. Update Loops for example work really well for animation, buildsystems work well if there can be partial states and partial updates.
Dependency Tracking fits somewhere in there. When I have used it, I have been impressed with how it simplified code. In games the dominant mode of propagating state has been to update everything. I think dependency tracking might change that a little. It’s just a question if I can make it fast enough. But I would predict that it might actually speed up code, simply because once it’s simple to only update when necessary, you may be surprised how much code doesn’t have to run every frame.
Transducers and Lenses: The quick idea of Transducers is that they abstract the idea of iteration. So if I want to add five to every element in a container, or want to add five to the result of a stream of futures, or add five to the result of a parsing operation, the code doesn’t care. You write your algorithm first, and you plug the kind of iteration that you want to do in as the last step.
Lenses also abstract the idea of iteration, but in a different way. Lenses make it so that if you want to iterate over every string in a container, or over the first character of every string in that same container, or over the length of the strings in that same container, the code looks almost the same. And you can go as deep as you want to. So I could for example iterate over the digits of the lengths of the strings in the container.
And as for what I was talking about in the post above: I want a system where syntactic sugar for operating on lists or similar is not necessary. I want my code to look like normal code where I’m just doing operations on variables or calling functions, and the code shouldn’t look any different if I’m doing that on stack local variables, or to every element of a container or to futures.
It seems to me that dependency tracking is far more useful with retained-mode UI than with anything else (which is why it’s the #1 example when it’s brought up). I generally think that having lots of stored state which is dependent on some other state is not ideal (even if the update happens automatically—you’ve just traded a perf hit on read for a perf hit on write).
It’s interesting working with DCC tools where the dominant form of interaction with anything is a dependency graph, though. It’s really powerful (in specific situations) if you stick with it (in absolutely all situations).
“Transducer” still sounds like a fancy word for a fairly standard functional programming concept. I have a function that adds five, which in Haskell is just (+ 5), and I can use any higher-order function I like to do something with it. Lenses sound like compassable lazy sequences in general…
I don’t think we need complier support for the std::function case if the function in truly statically bound. But, we don’t be able to use std function, we’ll need something else.
For the boilerplate around the std algorithms, what I think you want are ranges. Ranges are more than a pair of iterators; check out Eric Neibler’s range-v3 library.
For not using std::function: I’ll have to think about it. There probably is a way. But std::function does make your life a lot easier. It’s the same problem that Sean Parent ran into: The .then() function on futures almost always takes a parameter that is known at compile time and will never change. It’s a total waste to build up the linked list used internally by std::future with all its heap allocations, only to run through it once and to then tear it down again. But good luck trying to replace that with compile time constants.
And ranges are one level of indirection below transducers. I’m not yet sure if that matters. I don’t have a good picture for what comes out of that level of indirection.
But fundamentally none of these help with the boilerplate. They make it less, but I’m bothered by it being there at all.