N2796: Consistent, Warningless, and Intuitive Initialization with {}

1. Changelog

1.1. Revision 1 - August 14th, 2021

Clarify union and aggregate initialization, using motivation from Clang and -ftrivial-auto-var-init=pattern (thanks, Hubert Tong!).
Edits and fixes to the wording (thanks, Robert Seacord!).
Focus on using static storage duration initialization rules, except for unions.

1.2. Revision 0 - May 15th, 2021

Initial release! 🎉

2. Introduction & Motivation

The use of "= { 0 }" to initialize structures, unions, and arrays is a long-standing pillar of C. But it has also long confused developers. Some initialize arrays of integers with "= { 1 }", expecting every element to receive the value 1 (it does not). Others get warnings on some implementations for complicated structures and designated initializers that do not initialize every element.
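As a minimal sketch of the "= { 1 }" misconception (the helper name count_ones_after_brace_one is hypothetical, chosen only for this illustration): only the first element receives the value 1, and the remaining elements are zero-initialized.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the elements equal to 1 in an array
   that was initialized with "= { 1 }". */
int count_ones_after_brace_one(void) {
	int values[4] = { 1 }; /* only values[0] is explicitly initialized */
	int ones = 0;
	for (size_t i = 0; i < sizeof(values) / sizeof(values[0]); ++i) {
		if (values[i] == 1) {
			++ones;
		}
	}
	/* The remaining elements are zero-initialized, not set to 1. */
	assert(values[1] == 0 && values[2] == 0 && values[3] == 0);
	return ones;
}
```

Calling count_ones_after_brace_one() from any test function should return 1, not 4.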

Furthermore, this has created great confusion about how initializers are supposed to work. Is the 0 the special element that initializes everything to be 0? Or is it the braces with the 0? What about nested structures? How come "struct my_struct_with_nested_struct ms = { 0 };" is okay, but "struct my_struct_with_nested_struct ms2 = { 0, 0 };" starts producing warnings about not initializing elements correctly? This confusion leaves people with very poor ideas about how exactly they need to zero-initialize a structure, and results in folks turning off sometimes-helpful warnings^[1] or running into other issues. It also leads people to fall back to memset(&ms, 0, sizeof(ms)) or similar patterns, rather than relying on a clear initialization pattern that is guaranteed for all structures.

This is also a longstanding compatibility risk with C++, where authors of shared header code rely on "= {}", thinking it is viable C code, only to find out that it is not allowed. This is the case with GCC, where developers as prominent as the Chief Security Maintainer for Alpine, the largest musl-based distro, as recently as April 6th, 2021, say things like:

today i learned. gcc allows this, i’ve used it for years!

Indeed, the use is so ubiquitous that most compilers allow it as an extension, and do so quietly until higher warning levels or pedantic checks are turned on in the compiler or static analyzer! Thankfully for this proposal, every compiler deploying this extension applies the same initialization behavior: they perform (almost) the same initialization as static storage duration initialization for every active sub-object/element of the scalar/struct/union (exceptions detailed further below).

3. Design

As hinted at in the last paragraph of the motivation, there is no special design to be engaging in here. Accepting = {} is not only a part of C++, but is existing extension practice in almost every single C compiler that has a shared C/C++ mode, and many other solely-C compilers as an extension (due to its prolific use in many projects). Providing {} as an initializer has the unique benefit of being unambiguous. For example, consider the following nested structures:

struct core {
	int a;
	double b;
};

struct inner {
	struct core c;
};

struct outer {
	struct inner d;
	int e;
};

With this proposal, this code...

int main () {
	struct outer o0 = { 0 };
	struct outer o1 = { 0, 1 }; // warnings about brace elision confusion, but compiles
	// ^ "did I 0-initialize inner, and then give "e" the 1 value?"
	return 0;
}

can instead be written like this code:

int main () {
	struct outer o0 = { }; // completely empty
	struct outer o1 = { { }, 1 };
	// ^ much less ambiguous about what "1" is meant to fill in here
	// without "do I need the '0'?" ambiguity
	return 0;
}
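To make the difference concrete, here is a small sketch that repeats the structures above and returns the initialized objects so their members can be inspected. The function names are hypothetical, and the second function assumes a compiler that accepts "{ }" (through this proposal or as an existing extension). Note that with brace elision the two values in "{ 0, 1 }" fill d.c.a and d.c.b in order, so "e" is not the member that receives the 1.

```c
#include <assert.h>

struct core {
	int a;
	double b;
};

struct inner {
	struct core c;
};

struct outer {
	struct inner d;
	int e;
};

/* Brace elision: the values fill d.c.a and d.c.b, in that order.
   "e" has no explicit initializer and is therefore zero. */
struct outer make_with_elided_braces(void) {
	struct outer o = { 0, 1 };
	return o;
}

/* Empty braces consume "d" as a whole, so the 1 can only belong to "e".
   Assumes the compiler accepts "{ }" (this proposal, or an extension). */
struct outer make_with_empty_braces(void) {
	struct outer o = { { }, 1 };
	return o;
}
```

Some compilers warn about missing braces on the first form when warnings are enabled, which is the kind of diagnostic the introduction describes.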

3.1. Consistent "static storage duration initialization"

Almost every single compiler which was surveyed, that implements this extension, agrees that "= { }" should be the same as "= { 0 }", just without the confusing 0 value within the braces (with one notable exception, below). It performs what the C standard calls _static initialization_ / _static storage duration initialization_. Therefore, the wording (and, with minor parsing updates, implementation) burden is minimal since we are not introducing a new class of initialization to the language, just extending an already-in-use syntax.
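One way to picture this equivalence is to compare an automatic object initialized with "{ }" against an object with static storage duration that has no initializer at all. The sketch below does so member by member; the struct record type and the function name are hypothetical, and the code assumes a compiler that accepts "{ }".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct record {
	int id;
	double weight;
	const char *name;
	int tags[3];
};

/* Static storage duration: zero-initialized without any initializer. */
static struct record static_record;

/* Hypothetical check: compares an automatic object initialized with "{ }"
   against the static object above, one member at a time. */
bool empty_init_matches_static(void) {
	struct record automatic = { };
	bool same = automatic.id == static_record.id
		&& automatic.weight == static_record.weight
		&& automatic.name == static_record.name;
	for (size_t i = 0; i < 3; ++i) {
		same = same && automatic.tags[i] == static_record.tags[i];
	}
	return same;
}
```

The comparison is deliberately member-wise rather than a memcmp of the whole object, so it does not depend on how a given implementation treats padding.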

We note that there are cases where this may differ. These are listed in the sub-sections below, though we note that these departures from what = { 0 } does are mostly beneficial and ways to guarantee even greater stability than the C Standard currently offers us.

3.1.1. Decimal Floating Point

Decimal Floating Point (DFP) types do not have exactly the same semantics for { 0 } and { }. In particular, { } is a "more strict" version of initialization that writes all bits to 0. In contrast, { 0 } produces a "fuzzy" zero value: it sets the nominal value to 0 along with a quantum exponent of 0 (which may not be represented perfectly by all bits 0).

This is taken care of with additional wording that highlights the proper behavior for scalar types (which DFP types are considered to be) for { }, which makes it clear that the object is initialized properly to a 0 value.

3.1.2. Compiler Extensions + Union Aliasing

Some compilers, such as Clang, have special compilation modes, such as -ftrivial-auto-var-init=pattern, in which they can write bits that are not equivalent to the "static storage duration initialization" of a type. This creates a difference between what { 0 } and what { } do in those modes. For example, consider the following code:

struct A {
  union {
    char x;
    char y[1024];
  } u;
};

void foo();
int main(void) {
  struct A a = { 0 };
  if (a.u.y[1023]) {
    foo();
  }
}

Without compiler options that change the unspecified / indeterminate initialization pattern, Clang will trivially initialize the bytes of y with 0, because the values of y in a.u are unspecified. With -ftrivial-auto-var-init=pattern or other non-zero initializer options, these unspecified values become non-zero values and result in foo() being called. Nominally, reading these values from the union is unspecified behavior, so on one hand we can simply handwave this away as "who cares?". Indeed, the Standard cannot specify unspecified behavior, even if it is technically legal to read values from y despite them never being written to (it is not explicitly undefined behavior or a constraint violation, just unspecified).

On the other hand, we have a noticeable difference here:

struct A {
  union {
    char x;
    char y[1024];
  } u;
};

void foo();
int main(void) {
  struct A a = { };
  if (a.u.y[1023]) {
    foo();
  }
}

Using this initialization syntax, even with different -ftrivial-auto-var-init={whatever} flags, the behavior is stable: the entire union is zero-written. In this case, foo() is never called. This proposal replicates this behavior as it has better reliability and security semantics. Note that a user can always fall back to using = { 0 } if leaving other non-overlapping values in a union is undesirable. This does mean that one can, technically, tell if something was initialized with either = { 0 } or = { }, which somewhat contradicts the premise of the paper (that = { 0 } and = { } are identical). But, we find this to be a worthwhile departure from = { 0 }'s semantics.
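The stable behavior described here can be checked with a small sketch that scans every byte of y after "= { }". The helper name is hypothetical, and the expected result of zero non-zero bytes assumes an implementation that follows this proposal's behavior for unions (or the existing extension behavior described above).

```c
#include <assert.h>
#include <stddef.h>

struct A {
  union {
    char x;
    char y[1024];
  } u;
};

/* Hypothetical helper: counts the non-zero bytes of a.u.y after "= { }".
   Under this proposal the whole union is zero-written, so the expected
   count is 0 regardless of the auto-init pattern in use. */
size_t nonzero_bytes_after_empty_init(void) {
  struct A a = { };
  size_t nonzero = 0;
  for (size_t i = 0; i < sizeof(a.u.y); ++i) {
    if (a.u.y[i] != 0) {
      ++nonzero;
    }
  }
  return nonzero;
}
```

Replacing "{ }" with "{ 0 }" in the sketch gives a function whose result may depend on the compilation mode, which is the observable difference the paragraph above accepts.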

4. Wording

The following wording is relative to [N2596].

4.1. Modify §6.7.9 paragraph 1’s grammar

initializer:
{ }
{ _initializer-list_ }

4.2. Modify §6.7.9 paragraph 1 to include a new sentence

An empty brace pair ({ }) is called an empty initializer and is referred to as empty initialization.

4.3. Add to §6.7.9 paragraph 3

The type of the entity to be initialized shall be an array of unknown size or a complete object type that is not a variable length array type. An array of unknown size shall not be initialized by an empty initializer.

4.4. Modify §6.7.9 paragraph 11

~~The initializer for a scalar shall be a single expression, optionally enclosed in braces. The~~ The initializer for a scalar shall be a single expression, optionally enclosed in braces, or it shall be an empty initializer. If the initializer is the empty initializer, the initial value is the same as the initialization of a static storage duration object. Otherwise, the initial value of the object is that of the expression (after conversion); …

4.5. Modify §6.7.9 paragraph 22

If an array of unknown size is initialized, its size is determined by the largest indexed element with an explicit initializer. The array type is completed at the end of its initializer list.
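As an illustration of how these rules interact, the sketch below uses two hypothetical helpers. The first deduces an array's size from its largest explicitly initialized index. The second uses the empty initializer on an array of known size, and notes in a comment the form that the constraint in 4.3 forbids. It assumes a compiler that accepts "{ }".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: element count of an array whose size is deduced
   from the largest explicitly initialized index (here, index 4). */
size_t deduced_length(void) {
	int sparse[] = { [4] = 1 }; /* indices 0..3 are zero-initialized */
	assert(sparse[0] == 0 && sparse[4] == 1);
	return sizeof(sparse) / sizeof(sparse[0]);
}

/* Hypothetical helper: an array of known size may use the empty initializer. */
int sum_of_empty_initialized(void) {
	int fixed[8] = { };
	/* int unknown[] = { };  not allowed by 4.3: no element fixes the size */
	int sum = 0;
	for (size_t i = 0; i < sizeof(fixed) / sizeof(fixed[0]); ++i) {
		sum += fixed[i];
	}
	return sum;
}
```

Some compilers accept the commented-out form as an extension, so the absence of a diagnostic there should not be read as conformance to this wording.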

4.6. Add a new paragraph after §6.7.9 paragraph 13

If the initializer is the empty initializer, then it is initialized as follows:

— if it is an aggregate type, every member is initialized (recursively) according to the rules for empty initializers, and any padding bits are initialized to zero;
— if it is a union type, then the member with the largest size is initialized (recursively) according to the rules for empty initialization, and any padding bits are initialized to zero;

4.7. Additional Change (Optional) - lift some VLA restrictions

4.7.1. Modify §6.7.9 paragraph 3

The type of the entity to be initialized shall be an array of unknown size or a complete object type ~~that is not a variable length array type~~ . An entity of variable length array type shall not be initialized except by an empty initializer.
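To show what this optional change would replace, here is a sketch of the usual workaround for zeroing a variable length array, with the proposed spelling left in a comment. The function name is hypothetical, and the code assumes n > 0 and an implementation that supports VLAs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: sums a zeroed variable length array of n ints.
   Assumes n > 0 and an implementation that supports VLAs. */
int sum_of_zeroed_vla(size_t n) {
	int values[n];
	/* Without the optional change, a VLA cannot have an initializer,
	   so the zeroing has to be a separate step. */
	memset(values, 0, sizeof(values));
	/* With the optional change, the declaration and the memset could
	   be written as one line:  int values[n] = { };  */
	int sum = 0;
	for (size_t i = 0; i < n; ++i) {
		sum += values[i];
	}
	return sum;
}
```

The memset form is kept as the executable path because not every compiler accepts an initializer on a VLA today.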

5. Acknowledgements

Thank you to the C community for the push to write this paper! Thank you to Joseph Myers, Hubert Tong, and Martin Uecker for wording improvements and suggestions.

Published Proposal, 2021-08-14
