Sunday, October 16, 2016

Why modern C++ is good for bare metal


For those of you who don't know me, I am passionate about efficiency, static checking and generic library design. TMP (Template Meta Programming) is usually my tool of choice and I am active in that community as well (see my <a href="http://metaporky.blogspot.com/">metaporky</a> blog). In my day job, and also for fun, I write programs which run on tiny chips (these days a Cortex-M0 with 6K of RAM is the floor for new designs; there is little reason to go below that). In this blog I hope to express my thoughts on where I think this industry should go and the direction I am trying to push it in (see our www.embo.io conference for example).

From bit tricks to C++

When explaining why I chose C++ for bare metal I often encounter the claim "everything you can do in C++ you can also do in C, just much faster". Although the first half may be true, and if all C programmers were all-knowing experts the second half would also be true, this line of thinking misses the bigger picture. One could also make a second claim: "everything you can do in C you can also do in assembler, just much faster". The second claim is arguably more defensible than the first, as there are typically assembler instructions which cannot be expressed in C. To play devil's advocate, there are also binary bit patterns which cannot be expressed in assembler: in code obfuscation it is common practice to execute a string of code and then execute the 'same' piece of code offset by one byte, causing the instructions to be interpreted in a completely different manner. This chain of argument is recursive and would probably end with us doping our own silicon.

If we look at the evolution leading up to C++ we can find a recurring pattern. Going from bits to assembler we notice that bits are far from our mental model, but if we name different bit patterns we come closer to our mental model. Naming patterns narrows the set of expressible patterns, but I can still express all the things I need to, so it's actually a good thing (I can't address a work register that does not exist, for example). Virtually all the things I cannot express in assembler would actually be errors if I were to use them.

Going from assembler to C we are again naming common patterns, like for loops, if statements and comparison operators. These patterns are even closer to our mental model and they are, to some extent, hardware agnostic. Not mapping directly to hardware is a mixed blessing: on the one hand we achieve at least some level of portability, beginners don't need to know how 'i==4' is implemented, and we have much less to think about. On the other hand, in the beginning at least, the compiler did not know all the bit tricks we do. For the first half of my career people told me things like "if you are multiplying by a power of 2, just shift". I later learned that this compiler optimization was implemented around the year of my birth, and that it is actually a bad idea to try to help the compiler because it may break the "canonicalization" step in the optimization process.

With the advent of hardware agnostic code we are expressing ourselves at a much higher level of abstraction, allowing for optimization. At the end of the day no assembler programmer in their right mind will inline all the functions which they expect to use in only one spot and then 'deinline' them as they start to be used elsewhere. The optimizer, however, is happy to do that for us (if we write our code right). It's not just about readability though: as optimizers get better there is the potential for them to surpass even highly skilled programmers, as the optimizer arguably embodies the cumulative expertise of all the experts who ever worked on it. This 'encapsulation of expertise' is not to be overlooked.

Going from C to 'C with classes', an early form of what is now C++, we again start naming common patterns. The vast majority of C programs contain code which consists of a struct together with a group of related functions:

struct thing {
 //data
};
void doSomething(thing*, int); 
void doOtherThing(thing*, int, int);
int getSomething(thing*);
etc.

C with classes essentially named this pattern a "class" and provided a code generator for it. We also see this pattern:

int f_int(int);
double f_double(double);

This we call function overloading, and the code generator does that for us too. So C with classes was essentially just a more powerful macro library; what's not to like?

And then there was virtual

In C we also sometimes see the pattern:
struct dynamic_interface {
 void (*f1)(int);
 void (*f2)(int, int);
 int (*f3)();
};
in order to express type erasure. That is not the whole of the pattern, but it looks ungodly enough as it is. For this pattern the code generator of C with classes, which at that point had become C++, provides the 'virtual' keyword. From my perspective as someone who was a hippie kid at the time and by no means following these developments live, it is the (mis)use of virtual that screwed everything up. The problem is that rather than just using virtual for type erasure, it also became a tool for generic programming. Using indirect function calls in order to implement the open-closed principle or policy based class design is one of the things which went fundamentally wrong in C++ and is the root of much of the valid efficiency criticism. Thank god we let Alexander Stepanov do most of the STL or we would have become another Java.

At the end of the day the vast majority of virtual function calls are fixed at compile time, yet performing this devirtualization in all cases is incredibly hard for the optimizer, and to this day it is not done well. Think of a vector of widgets, for example: if I know that they are actually all 'ListItem' widgets and that there are 10 of them, I could turn that vector of pointers to heap allocated objects into a std::array of 10 stack allocated ListItem widgets. The optimizer is a long way from being able to make that optimization.

More recently Sean Parent has taught us how to make type erasure wrappers using his "concept based polymorphism", so we can separate the indirect function calls from the statically linkable classes themselves. This allows the user to decide when the polymorphism should be static and when it should be dynamic. At the same time the auto keyword allows us to factory construct policy based classes without the user needing to type out the ungodly resulting type. We are well on our way to purging the unneeded virtual function calls.

Now I can make my own code generator!

Although its roots are older, the real power of TMP arrived in C++11. Variadic templates and template aliases have allowed us to speed up compile times by two orders of magnitude while increasing readability at the same time. The advent of modern TMP is, in my opinion, on much the same scale as the step from C to C with classes: now I can write my own code generator! This allows me as a bare metal expert to encapsulate my expertise for others to use without having to write a domain specific compiler.

Stay tuned for some embedded specific patterns in the next posts.