This post is about how to write generic code in C. This is a problem I often have- I write tools in C which are fairly simple, but often would benefit from more advanced techniques as they get larger. I use C because it is the language we use for flight code, it is a simple language that I can expect people to know, and it is easier to control the layout of memory then other languages. I can write a useful GUI program in LabWindows in hours, talk to hardware, and process telemetry from other systems. Its by no means my favorite language, but its often the right tool for the work that I do.

This is a language that does not include a lot of means for abstraction, and we will have to pull some tricks to reduce redundancy and express more complex concepts. These things do not feel native to the language, and I feel like I am usually using a fairly primitive set of building blocks to create my programs. The techniques we will look at are manual and its easier to get wrong in C then other languages, but still worth thinking about.

So- lets look at what we can do to get some polymorphism in C when we are not ready or able to reach for other languages.

Some Context

The C language is fairly low level and explict. Sometimes it is not low level enough, and often it is much too low level. In flight code (any code that runs on the embedded systems for space or aero applications) we keep a small subset of C and stray very little into more advanced techniques in order to keep the code as understandable as possible. For me then, these techniques are things I would use when no one is looking too closely at my code, or I'm writing something for myself.

The C Language

The main constructions in C are procedures, structs, unions, enums, global values, and proprocessor directives. I'm grouping the preprocessor into C as they are part of C programs in practice, even if they are a separate stage of compilation before the C language itself. C's type system has primitive types (int, char, etc), the array type constructor, user types in the form of structs and unions, type alias in typedefs, and pointer types.

Of these, pointer types give the most means for abstraction- they let you talk about data in an uniform way (through an address) that does not depend on the structure of the data, as the pointer is always the same form regardless of what it points to. In addition, function pointers get you the bare minimum to treat computation as a subject that one can control in C. There is no way to create functions at run time such as through composition, but at least we have first order functions.

The Cast

Pre-Processor Magic

One way to get a form of polymorphism in your code is to essentially expand it automatically to monomorphic code- in other words, create an entirely separate copy for each type you are interested in manipulating. This can be done with the preprocessor, allowing entire data structures and their interfaces to be generated per type. This reminds me in logic of expanding your inference rules so that there is technically a separate rule for each proposition, rather then using quantification and saying that the rules apply for all propositions.

This has the advantage of type safety, since the code is generated for a particular type, and it may have some performance advantages when the implementation can make use of properties of the type like its size to generate specific code like memory layouts.

One disadvantage is complexity- both in the implementation which must be written mostly within the pre-processor, and in the user code which must generate a great deal of code that can't be read directly. Even reading the generated code is not as good as having simple code to read in the first place. I imagine this code is hard to write and test, although I've never done it myself.

An example of this approach is in sglib where the author create type safe data structures and some higher order functionality like sorting with user-defined functions, in C.

One other note here is that I've seen this technique done manually as well- duplicate data structures and functions with almost no difference all through codebases. This is bad programming practice and is wasteful in time for programming, testing, and reviewing, but in some contexts its hard to avoid.

void *

Another possibility is to simply drop into the world of untyped data, making everything a pointer to void. This means that you take responsibility for the types of your data. You can do this by always using void pointers to a particular type with a section of code, such as in a data structure like a tree where all nodes point to data of the same type, or you can do a manual kind of sum type and make an enum with all the types you want to use, tagging your pointers with a value of this enum to distinguish what it points to. This amounts to carving out a universe of types from the C type system, and when I've done this usually I only allow basic types like uint8, int16, uint32, etc, and then a generic buffer or C string type.

The advantage of this techinque is that your code works on many data types- you can store information like the size of the data along with it, and allocate, deallocate, move, and manipulate your data without knowing what it is. This is almost a parametric polymorphism in the sense that if you truely do not know what your data contains, so you can only perform operations on it that work on all data. Nothing stops you from doing otherwise, but we are in C and we have to accept this responsibility.

This is the technique I go for most often. I don't enjoy it, but it comes in useful too often to ignore.

I know of no way to enforce constraints in this case, unless you consider the next technique a kind of constraint.

One example of this style can be found here. This is also the style used in LabWindows for its generic data structures in the programmer's toolbox.

Row Polymorphism

This is perhaps the most interesting technique, at least to me. In this case you define a series of structures, some of which contain others as their first field. Doing this means that you can upcast a struct into one of the ones it contains, losing information about what it contains. This gives it a similar feel to subtyping (although this does not hold up formally), where you can go own the tree of subtypes and get more information, and up the tree to lose information when you want to express something more generically.

I've wanted to explore this in more detail, as it opens up a lot of possibiilty in the restricted world of C. You do have to be careful with memory layout- you can no longer assume you know the size of a structure based on its type. You are also restricted in the sense that you can extend in only one way per struct- you can extend with new fields, but the order of extensions matters without some kind of dictionary or table lookup.

One example of this technique is the Linux kernel. Linux apparently uses this technique to embed structs in other struct, along with macros for getting out to the containing structure.

Another example might be Cello, which is a very different implementation then the one used in Linux, and does use a lookup to dispatch functions. I hope to write a bit more about this in the future, but its definitely worth checking out, even if only for fun.

_Generic

In C11 there was a feature added to the C language called _Generic. This is not a general technique for generic programming- all it allows you to do is to select a function based on a type. You can then write macros that select the right function but give a single interface which is nice, but to my knowledge there is no way to use this more generally- you have to know beforehand which types you are concerned with. Its still a fine feature, its just got a more limited application then the techniques above.

Conclusion

I tend to stick to void* style generic programming in C because it is the most straightforward in my opinion and the easiest to review for. I'm more interested in sticking to conventions and keeping code simple then powerful and generic. I am even willing to trade the type safety of macros for code that uses the simplest subset of C possible, but that is mostly my training and my application domain talking. I treat C more carefully than other languages because my C code must be higher quality then my other code.

Also,for reference, here is another treatment of these same topics.