C Enums
I would like to continue the trend of writing up short articles on how I use C. This time the subject is enums.
I feel compelled to write out things I've learned about C over the years, and to explain how I currently use it. This has been hard-won knowledge, and a lot of unlearning of things I was taught in school. I try to make tradeoffs towards robustness, restriction, and discipline over other concerns, while not being so restrictive that programs are too time consuming to deliver. Its a fine line and a hard road. Its an always changing, hopefully improving process, but an imperfect one.
This is a report more then a tutorial- I'm not saying that things should be done this way, only that this is a way to do them. The best case is that there are a few things to learn from this post that can be incorporated into your style.
Speaking of style, I will use my organization's current style, but this is orthogonal to the concerns at hand in almost all cases.
On to Enums
I like enums, and I use them more them my colleagues. This probably stems from my Haskell days, but I feel that they are a vital means of modeling intent.
Unlike in many language, in C and enum is just an integer. It can have any value in a large range, and you cannot trust that it has one of the valid values. This makes C enums tricker then other languages, which is on par with C in general.
For an early example, lets say:
typedef enum MC_STATE_ENUM { MC_STATE_IDLE = 1, MC_STATE_OPERATIONAL = 2, MC_STATE_SAFE = 3, } MC_STATE_ENUM;
There are several things to unpack here, so we will come back to this definition a couple of times.
I use enums any time there are a known number of options to choose from: return codes from a function, packet IDs, flags, states.
I prefer enums to a series of similarly named #defines for several reasons-
- The enum groups the values together, indicating their relation more clearly then "the #defines are near each other".
- It can provides the enums exhustiveness checking in switch statements (although with a caveat we will discuss later).
- The enum type is much clearer to see in declarations, such as function arguments, then 'int' with some documentation saying that it is supposed to be only certain values.
- Once an enum is checked to be within the valid set of values, it is a "proper" enum and can be used as if it can only take on a certain set of values. I check for invalid values anyway, but using an int provides none of this sense of a restricted type.
With this said, the disadvantage of enums is that they are "closed"- you cannot add variants outside of the definition. For some cases, such as errno values in POSIX systems, you don't know all possible values so a series of #defines makes sense.
Naming
I use a certain naming convention for enums, but I know there are many other that are equally or more valid than mine.
The convention is that the enum name and variants are all caps, the name ends in "_ENUM", but the variants omit this, the variants repeat the rest of the enum's name, the enum starts with a prefix that identifies the containing module, the enum is always typedefed, the enum name is repeated twice, and the variants are given explicit values.
The typedef issue is simply due to the fact that C has different namespaces for structs, enum, and other type. By typedefing the enum, you do not have to refer to it as "enum MC_STATE_ENUM" and can simply say "MC_STATE_ENUM". I could see an argument that it is better to be explicit and state the kind of type you have, but I do not follow this and instead allow the naming conventions to identify different types. This is admittedly ambigious in some siutations (function vs structs), but it is the convention that I currently follow.
Repeating the name is a very minor thing, but technically it allows the enum to be written with the 'enum' qualifier, and it seems to help Doxygen document types (I do this with structs too).
Restrictions
While I like enums, there are some things to avoid when using them.
Explicit Values
I always list out explicit values rather than allow the compiler to assign values.
This is mostly important for data that is serialized or sent over a network, but it also helps if you are debugging or printing value out, and need to know which enum variant you ahve.
I could understand not following this restriction- it is manual and potentially error prone, and many enums simply need to have different values rather than any specific values. However, I've simply gotten in the habit of being explicit with all enums rather than trying to decide when to be explicit and when to leave values implicit.
Memory Layout
Do not use enums in types where the memory layout is significant. Enums are native ints, although this can be change in some compilers using flags and attributes. Either way, you should not trust the size of an enum, and its always preferable to use types from "stdint.h" instead to control the exact size of your fields.
The disadvantage of this is that the enums will be cast into or out of unsigned integer types, which means a lot of checking to make sure the enum contains only defined values first. This is life in C- you end up doing a lot of checking, all the time, in order to create robust code.
Extra Values
The next restriction on enums is to make sure they only contain values defined in their variants. This is an invariant of enums that should be maintained, and violating this invariant creates an implicit assumption that must be carried through any code it touchs. Its better to not add an assumption and just ensure enums only have defined values.
I've mentioned this a couple of times, but I wanted to reiterate it because it can be tempting to allow other values. For example, if there are user defined values outside of your defined range, its tempting to allow these, and determine this by checking for your own values. There is also a potential temptation to use enum variants as bit masks (1, 2, 4, 8, etc) and OR them into each other to create flags. However, I suggest avoiding these situations- in the former just use a (known size) integer type, check if it has your defined values, and cast it to your enum if it does (leaving it an its integer type otherwise). In the second case, its fine to use the variants as masks, but mask them into an unsigned integer, not into something of the enum type.
Number of Variants Trick
There is one trick people sometimes use with enums- if the enum contains values starting at 0 and counting upwards, than an additional variant can be added at the end to indicate the number of options.
C has no reflection capabilities, so there is no way to ask for the values of an enum, or how many variants there are. I believe people use macro tricks for this in some cases, but for robustness and adherence to safety standards, I do not use complex macros.
WIth this said, adding an extra variant becomes a nice way to indicate the number of variants in the enum. Its a bit inconvienent, as there is now a variant that must be checked for but will not occur as a normal option, but it can be worth it.
An example would be:
typedef enum MC_STATE_ENUM { MC_STATE_IDLE = 0, MC_STATE_OPERATIONAL = 1, MC_STATE_SAFE = 2, MC_STATE_NUM_STATES, } MC_STATE_ENUM;
Note that this requires MC_STATE_IDLE to start at 0 instead of 1 for this to work. You can't really add an INVALID case at 0, unless you want the NUM_STATES to count an invalid variant. Its tricky, which is why this is a trick.
Overall, I do not use this trick very much. It only works on some enums, and is only useful in some cases. However, its good to know about, and I expect some people use it much more extensively.
The Value of 0
The use of 0 is an important thing to consider when defining an enum. Whichever variants gets the value 0 will occur often as a 'default' or the result of a memset, intended or not, so it should be designed with this in mind. For example, you could make it an 'invalid' value, indicating that something is wrong because the enum must be filled out with another variant before it is useful.
typedef enum MC_STATE_ENUM { MC_STATE_INVALID = 0, MC_STATE_IDLE = 1, MC_STATE_OPERATIONAL = 2, MC_STATE_SAFE = 3, } MC_STATE_ENUM;
You could also make it the 'okay' value, indicating the absence of errors. This is convienent, and you can initial memory containing such an enum without having to take any special actions. This ease is at odds with correctness however- if you have a 0 valued enum you never know if it was intentially 0 or accidentally 0 and never filled out with a real value. For this reason, I use 1 for my 'okay' value and do not use 0 unless there are other reasons to do so.
I believe that this is at odds with a lot of other people's advice, so you will have to decide yourself. Restricting the use of 0 values correctness (allowing a mistake to be caught) over convenience (assuming no other concerns are in play), and this is my preference for the C that I write.
Checking
You will often deconstruct an enum using a switch- this is a fundmental, literally defining, property of enums (enums as natural numbers in the type system style).
MC_STATE_ENUM state = MC_STATE_IDLE; switch (state) { case MC_STATE_IDLE: { /* idle action */ break; } case MC_STATE_OPERATIONAL: { /* operational action */ break; } case MC_STATE_SAFE: { /* safe action */ break; } default: result = SOME_ERROR_CODE; break; }
Always break at the end of cases, with the possible exception that a case may be empty. This includes the default case.
Try to include a 'default' case, and do something sensible like return an error code. The compile may warn you that it cannot be reached, but it is wrong. I've seen these cases get hit- its the classic 'this is impossible' path that turns out to be possible.
Cases may have '{}', which allows you to define variables within the case. I would like to say that these should always be included, but in practice I find myself occasionally omitting them in small case expressions when I'm trying to keep more of the switch statement on the screen at once. Note that the '{' and ']' should line up with the case in whatever way you usually line up curly braces in other constructs.
I always use switch statements instead of multi-way 'if's unless I need to check other properties, or I'm only checking a small subset of variants for a specific reason.
Compile Flags
I suggest adding -Wswitch-enum to your builds, and if you don't already use -Wall, then also add -Wswitch.
-Wswitch checks that all enum variants are covered, except if there is a default which covers the remaining cases. The addition of -Wswitch-enum ensures that all variants have a branch even if a default case is present. This may mean that you need to add some cases for unused variants that fall through into the default- this seems like a small price to pay to me for being explicit and getting extra checks.
Conclusion
Hopefully there is something interesting in here. C does not have a large number of features, so understanding each one is a major part of becoming effective in the language.