Lost at C: Anonymous Unions
This is the first installment of a new series, "Lost at C", where I go over obscure language features, historical mishaps, and other interesting parts of the C programming language.
What are unions?
A union is a data structure that can store many different data types, but only one at any given time. Let me explain:
union Number {
int i;
float f;
double d;
};
Here we define a union called Number. It can store an int, a float, or a double. The size of Number is at least the size of its largest member, which here is the double.
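If you want to check the size claim yourself, here is a small sketch. The helper name number_members_overlap is made up for this example, and static_assert needs a C11 (or newer) compiler.

```c
#include <assert.h>
#include <stddef.h>

union Number {
    int i;
    float f;
    double d;
};

/* The union has to be big enough for its largest member (the double here).
   The compiler may add padding, so ">=" is the safe comparison. */
static_assert(sizeof(union Number) >= sizeof(double),
              "union Number must be able to hold a double");

/* Hypothetical helper: returns 1 if every member starts at the same
   address as the union itself, which is what "sharing storage" means. */
int number_members_overlap(void) {
    union Number n;
    return (void *)&n.i == (void *)&n
        && (void *)&n.f == (void *)&n
        && (void *)&n.d == (void *)&n;
}
```

Since all members begin at the same address, writing one of them overwrites the bytes of the others.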
Actually using it is pretty simple:
union Number i = { .i = 1234 };
union Number f = { .f = 3.14f };
union Number d = { .d = 1.414 };
printf("%i\n", i.i);
printf("%f\n", f.f);
printf("%lf\n", d.d);
Compiling and running, we get the following result:
1234
3.140000
1.414000
Now, what happens if we read a different field than the one we stored?
printf("%i\n", d.i);
printf("%f\n", i.f);
printf("%lf\n", f.d);
Compiling and running (the exact values depend on your platform and compiler):
1992864825
0.000000
0.000000
Was that what you were expecting?
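Those odd numbers aren't random: every member shares the same bytes, so reading a different member just reinterprets whatever bits were stored last. Here is a rough sketch that makes this visible. The names FloatBits and float_to_bits are invented for this example, and it assumes float is a 32-bit IEEE 754 value, which is common but not guaranteed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: float occupies 32 bits on this platform. */
static_assert(sizeof(float) == sizeof(uint32_t), "expected a 32-bit float");

/* Hypothetical union: a float and an integer sharing the same 4 bytes. */
union FloatBits {
    float f;
    uint32_t bits;
};

/* Store a float, then read the very same bytes back as an integer. */
uint32_t float_to_bits(float value) {
    union FloatBits fb = { .f = value };
    return fb.bits;
}
```

Under that IEEE 754 assumption, float_to_bits(1.0f) returns 0x3F800000: the bit pattern of 1.0f, not the integer 1.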
Basically, to use a union correctly, we need a way to track which member currently holds a value, so that we read the right one:
struct NewNumber {
int state;
union {
int i;
float f;
double d;
} num;
};
Now instead of using Number, we can use NewNumber, which adds a state field. We set state whenever we store a value, then check it before reading from the num union:
struct NewNumber age = {
.state = 0,
.num = { .i = 1234 }
};
// call like so:
if (age.state == 0) printf("%i\n", age.num.i);
if (age.state == 1) printf("%f\n", age.num.f);
if (age.state == 2) printf("%lf\n", age.num.d);
Here we are using a simple int to describe the data being stored in the union. In practice, we probably want to use an enum to better signify which state is which.
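As a sketch of what the enum version might look like: the names NumberKind, TaggedNumber, and tagged_number_format below are all made up for illustration. The function writes into a buffer instead of printing, so the result is easy to inspect.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical tag: one enumerator per union member. */
enum NumberKind { NUMBER_INT, NUMBER_FLOAT, NUMBER_DOUBLE };

struct TaggedNumber {
    enum NumberKind kind;
    union {
        int i;
        float f;
        double d;
    } num;
};

/* Formats the active member into buf. Returns what snprintf returns,
   or -1 if kind holds a value that is not a known enumerator. */
int tagged_number_format(const struct TaggedNumber *n, char *buf, size_t size) {
    assert(n != NULL && buf != NULL);
    switch (n->kind) {
    case NUMBER_INT:    return snprintf(buf, size, "%i", n->num.i);
    case NUMBER_FLOAT:  return snprintf(buf, size, "%f", n->num.f);
    case NUMBER_DOUBLE: return snprintf(buf, size, "%f", n->num.d);
    }
    return -1;
}
```

A switch over an enum has the extra benefit that many compilers can warn when a case is missing.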
Note that the declaration of the union is slightly different as well. There are many different ways of declaring a union:
// version 1
union X { /* members */ };
// version 2
union X { /* members */ } Y;
// version 3
union { /* members */ } Y;
// version 4
union { /* members */ };
1. This is what we used originally. It creates a union with a type of X, and variables can be declared via union X x = ...
2. Same as 1, except we also create an instance of the union, and call it Y.
3. This union is used primarily in structs, since we don't need the type, just access to the instance Y.
4. This is the anonymous union, which we will explain below!
The anonymous union
One of the annoying things about our NewNumber struct is that we have to access the number values via the .num field, which feels clumsy. If only there were a way to have the fields inside the union just become part of the struct. Well, that is where anonymous unions come in!
struct FinalNumber {
int state;
union {
int i;
float f;
double d;
};
};
Now we can use the number like so:
struct FinalNumber fn = {
.state = 0,
.i = 1234
};
printf("%i\n", fn.i);
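To convince yourself that the union's fields really do become members of the struct, here is a small sketch. Anonymous unions are standard as of C11, so this assumes a C11 or newer compiler. The setter final_number_set_double is a made-up helper, and the state value 2 for double follows the convention used earlier.

```c
#include <assert.h>
#include <stddef.h>

struct FinalNumber {
    int state;
    union {
        int i;
        float f;
        double d;
    };
};

/* i, f, and d are addressed as direct members of the struct,
   and all three live at the same offset inside it. */
static_assert(offsetof(struct FinalNumber, i) == offsetof(struct FinalNumber, f),
              "i and f should share storage");
static_assert(offsetof(struct FinalNumber, f) == offsetof(struct FinalNumber, d),
              "f and d should share storage");

/* Hypothetical setter: keeps the tag and the value in sync. */
void final_number_set_double(struct FinalNumber *fn, double value) {
    fn->state = 2;   /* 2 means "double", as in the earlier example */
    fn->d = value;
}
```

Going through a setter like this makes it harder to update the value and forget the state.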
What is this good for?
This is really good for cutting down on the size of structs that have many states but can only be in one at a time. For example, a Node struct, which has a union of pointers to metadata for the different Node types:
struct Node {
enum NodeType type;
union {
struct VariableNode *var;
struct DefinitionNode *def;
struct ExpressionNode *expr;
};
};
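To put a rough number on the savings, here is a sketch comparing Node with a version that keeps one pointer per type. WideNode and the NodeType enumerators are invented for this comparison, since the original does not list them, and the exact sizes depend on your platform.

```c
#include <assert.h>

/* Forward declarations are enough, since we only store pointers. */
struct VariableNode;
struct DefinitionNode;
struct ExpressionNode;

/* Hypothetical enumerators for illustration. */
enum NodeType { NODE_VARIABLE, NODE_DEFINITION, NODE_EXPRESSION };

struct Node {
    enum NodeType type;
    union {
        struct VariableNode *var;
        struct DefinitionNode *def;
        struct ExpressionNode *expr;
    };
};

/* The same data without the union: one slot per pointer. */
struct WideNode {
    enum NodeType type;
    struct VariableNode *var;
    struct DefinitionNode *def;
    struct ExpressionNode *expr;
};

/* The union version stores one pointer instead of three. */
static_assert(sizeof(struct Node) < sizeof(struct WideNode),
              "the union version should be smaller");
```

On a typical 64-bit platform that is likely 16 bytes versus 32, which adds up quickly when a tree has many nodes.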
The cons
One of the big drawbacks of using a union is that you can get into weird issues if you don't properly check the state before using the data in the union. For example:
struct Person {
union {
const char *name;
char *dynamic_name;
};
unsigned age;
};
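The person has a name, which is a const char *, so the characters it points to must not be modified through it. But you can also reach the same pointer via dynamic_name, which is not const. This means that if you write to (or free) a name through dynamic_name when it was set via name, for example to a string literal, you will (probably) run into some trouble, since modifying a string literal is undefined behavior.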
Note that this can still be useful: it lets you declare people whose names aren't heap allocated, without needing strdup. Still, you need a way to know whether the name was heap allocated or not.
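One way to keep track is a flag stored next to the union. This is only a sketch: the name_is_heap field and the person_* functions are made up for this example and are not part of the original struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct Person {
    union {
        const char *name;
        char *dynamic_name;
    };
    bool name_is_heap;   /* hypothetical flag: true if we own the memory */
    unsigned age;
};

/* Borrow a string that outlives the Person, e.g. a string literal. */
struct Person person_from_literal(const char *name, unsigned age) {
    assert(name != NULL);
    struct Person p = { .name = name, .name_is_heap = false, .age = age };
    return p;
}

/* Copy the name into heap memory owned by the Person.
   If allocation fails, the name is left as NULL and not marked as owned. */
struct Person person_from_copy(const char *name, unsigned age) {
    assert(name != NULL);
    struct Person p = { .name_is_heap = false, .age = age };
    size_t len = strlen(name) + 1;
    char *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, name, len);
        p.dynamic_name = copy;
        p.name_is_heap = true;
    }
    return p;
}

/* Only free the name if we allocated it. */
void person_free(struct Person *p) {
    if (p->name_is_heap) {
        free(p->dynamic_name);
    }
    p->name = NULL;
    p->name_is_heap = false;
}
```

With the flag in place, code that wants to modify the name can check name_is_heap first and leave borrowed strings alone.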
Final Thoughts
All in all, anonymous unions (and unions in general) are a really great language feature when used properly, but can cause some major headaches if used incorrectly.