N2900: Consistent, Warningless, and Intuitive Initialization with {}

1. Changelog

1.1. Revision 2 - January 1st, 2022

Use first-member-initialization behavior consistent with the August/September 2021 Virtual Meeting guidance from WG21.
- Opinion Poll: Would WG14 like to adopt something along the lines of N2727 into C23? 19-1-2 (Yes-No-Abstain). So, clear direction.
- Feedback: "I voted no although I’m in favor in general. I want that union case to be the first member. I also want more clarity."
Lifting the restriction on VLAs is no longer optional, as the Committee’s opinion poll gave direction to put it in:
- Opinion Poll: Would WG14 like something along the lines of lifting the restrictions from VLA’s to be initialize-able by an empty initializer as specified in N2727? 15-0-6 (Yes-No-Abstain). So clear direction.
- No additional vote feedback.

1.2. Revision 1 - August 14th, 2021

Clarify union and aggregate initialization, using motivation from Clang and -ftrivial-auto-var-init=pattern (thanks, Hubert Tong!).
Edits and fixes to the wording (thanks, Robert Seacord!).
Focus on using static storage duration initialization rules, except for unions.

1.3. Revision 0 - May 15th, 2021

Initial release! 🎉

2. Introduction & Motivation

The use of "= { 0 }" to initialize structures, unions, and arrays is a long-standing pillar of C. But, for a long time it has caused some confusion amongst developers. Whether it was initializing arrays of integers and using "= { 1 }" and thinking it would initialize all integers with the value 1 (and being wrong), or getting warnings on some implementations for complicated structures and designated initializers that did not initialize every element, the usage of "{ 0 }" has caused quite a bit of confusion.
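The "= { 1 }" misconception described above can be checked directly. The following is a minimal sketch in standard C; the function name count_ones is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how many elements of an array initialized with { 1 } hold 1.
   Only the first element receives the 1; the remaining elements are
   initialized the same way static storage duration objects are, i.e. to 0. */
size_t count_ones(void) {
    int values[4] = { 1 };
    size_t ones = 0;
    for (size_t i = 0; i < sizeof values / sizeof values[0]; ++i) {
        if (values[i] == 1) {
            ++ones;
        }
    }
    /* values is { 1, 0, 0, 0 }, not { 1, 1, 1, 1 } */
    assert(values[0] == 1 && values[3] == 0);
    return ones;
}
```

Calling count_ones() yields 1, which is the surprise that trips up developers who expect "{ 1 }" to fill the whole array.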

Furthermore, this has created great confusion about how initializers are supposed to work. Is the 0 the special element that initializes everything to be 0? Or is it the braces with the 0? What about nested structures? How come "struct my_struct_with_nested_struct ms = { 0 };" is okay, but "struct my_struct_with_nested_struct ms2 = { 0, 0 };" starts producing warnings about not initializing elements correctly? This confusion leads to people having very poor ideas about how exactly they need to zero-initialize a structure and results in folks either turning off sometimes helpful warnings^[1] or other issues. It also leads people to do things like fall back to using memset(&ms, 0, sizeof(ms)) or similar patterns rather than just guaranteeing a clear initialization pattern for all structures.
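To make the two zeroing patterns from the paragraph above concrete, here is a small sketch comparing "= { 0 }" with the memset fallback on a nested structure. The types point and segment and the function zeroing_patterns_agree are hypothetical names chosen for this example. Only integer members are used, since all-bits-zero is guaranteed to represent 0 for integer types but not necessarily for pointers or floating-point types.

```c
#include <assert.h>
#include <string.h>

struct point { int x; int y; };
struct segment { struct point begin; struct point end; int id; };

/* Returns 1 when a { 0 }-initialized object and a memset-zeroed object
   have the same member values, 0 otherwise. */
int zeroing_patterns_agree(void) {
    /* Brace elision: the single 0 initializes begin.x; every other
       member is initialized as if it had static storage duration. */
    struct segment braced = { 0 };

    /* The fallback pattern: write zero bytes over the whole object. */
    struct segment cleared;
    memset(&cleared, 0, sizeof cleared);

    return braced.begin.x == cleared.begin.x
        && braced.begin.y == cleared.begin.y
        && braced.end.x == cleared.end.x
        && braced.end.y == cleared.end.y
        && braced.id == cleared.id;
}
```

Both patterns agree here, but the memset form needs a separate statement and cannot be used to initialize a const object.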

This is also a longstanding compatibility risk with C++, where shared header code that relies on "= {}", thinking it is viable C code, finds out that it’s not allowed. This is the case with GCC, where developers as prominent as the Chief Security Maintainer for Alpine, the largest musl-based distro, as recently as April 6th, 2021, say things like:

today i learned. gcc allows this, i’ve used it for years!

Indeed, the use is so ubiquitous that most compilers allow it as an extension and do so quietly until higher warning levels or pedantic checks are turned on in the compiler or static analyzer! Thankfully for this proposal, every compiler deploying this extension applies the same initialization behavior: it performs (almost) identical behavior to static storage duration initialization for every active sub-object/element of the scalar/struct/union (exceptions detailed further below).

3. Design

As hinted at in the last paragraph of the motivation, there is no special design to be engaging in here. Accepting = {} is not only a part of C++, but is existing extension practice in almost every single C compiler that has a shared C/C++ mode, and many other solely-C compilers as an extension (due to its prolific use in many projects). Providing {} as an initializer has the unique benefit of being unambiguous. For example, consider the following nested structures:

struct core {
	int a;
	double b;
};

struct inner {
	struct core c;
};

struct outer {
	struct inner d;
	int e;
};

With this proposal, this code...

int main () {
	struct outer o0 = { 0 };
	struct outer o1 = { 0, 1 }; // warnings about brace elision confusion, but compiles
	// ^ "did I 0-initialize inner, and then give "e" the 1 value?"
	return 0;
}

can instead be written like this code:

int main () {
	struct outer o0 = { }; // completely empty
	struct outer o1 = { { }, 1 };
	// ^ much less ambiguous about what "1" is meant to fill in here
	// without "do I need the '0'?" ambiguity
	return 0;
}

3.1. Consistent "static storage duration initialization"

Almost every single compiler which was surveyed, that implements this extension, agrees that "= { }" should be the same as "= { 0 }", just without the confusing 0 value within the braces (with one notable exception, below). It performs what the C standard calls static initialization / static storage duration initialization. Therefore, the wording (and, with minor parsing updates, implementation) burden is minimal since we are not introducing a new class of initialization to the language, just extending an already-in-use syntax.
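The equivalence described above can be sketched by comparing an automatic object initialized with "{ }" against a static object with no initializer. This reuses the core, inner, and outer structures from the Design section; the names static_object and empty_matches_static are hypothetical, and the sketch assumes a compiler that accepts "{ }" for structures, either as C23 or as an extension.

```c
#include <assert.h>

struct core { int a; double b; };
struct inner { struct core c; };
struct outer { struct inner d; int e; };

/* No initializer: static storage duration objects are initialized
   to zero values by the existing rules. */
static struct outer static_object;

/* Returns 1 when an automatic object initialized with { } has the same
   member values as the static object above, 0 otherwise.
   Assumption: the compiler accepts { } here (C23 or an extension). */
int empty_matches_static(void) {
    struct outer automatic_object = { };
    return automatic_object.d.c.a == static_object.d.c.a
        && automatic_object.d.c.b == static_object.d.c.b
        && automatic_object.e == static_object.e;
}
```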

We note that there are cases where this may differ. These are listed in the sub-sections below, though we note that these departures from what = { 0 } does are mostly beneficial and ways to guarantee even greater stability than the C Standard currently offers us.

3.2. Union Initialization

In earlier versions of this proposal, unions had an exception in their {} initialization stating that static initialization would be performed for the largest member. This drew a lot of concern from WG14 during the August/September 2021 Virtual Meeting. This leaves two potential options to match Clang behavior (written about in the Appendix):

- specify that the largest member undergoes zero-initialization, and then the first member is statically initialized;
- or, specify that the first member is statically initialized and the rest of the values are left in an unspecified manner, as is consistent with = { 0 } initialization.

Clang’s extension behavior works with both of these options, since "unspecified" is a strict superset of "zero-initialize the largest member". Padding bits (bits that exist outside the representation of any members in the union) are still zero-initialized as normal. We provide wording for both alternatives and leave it up to the Committee to choose a given behavior.
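What both options guarantee in common can be sketched as follows. The union type value and the function first_member_after_empty_init are hypothetical names for this example, and the sketch assumes a compiler that accepts "{ }" for unions (C23 or an extension). Only the first member is inspected, because the bytes of the larger member beyond it may be unspecified under the second option.

```c
#include <assert.h>

union value {
    int first;         /* first named member */
    char storage[64];  /* largest member */
};

/* Returns the first member of a union initialized with { }.
   Under either option above the first member is statically
   initialized, so the result is 0. The bytes of storage past
   sizeof(int) are deliberately not read: whether they are zero
   depends on which option is chosen. */
int first_member_after_empty_init(void) {
    union value v = { };
    return v.first;
}
```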

4. Wording

The following wording is relative to [N2731].

4.1. Modify §6.7.9 paragraph 1’s grammar

initializer:
{ }
{ _initializer-list_ }

4.2. Modify §6.7.9 paragraph 1 to include a new sentence

An empty brace pair ({ }) is called an empty initializer and is referred to as empty initialization.

4.3. Add to §6.7.9 paragraph 3

The type of the entity to be initialized shall be an array of unknown size or a complete object type that is not a variable length array type. An array of unknown size shall not be initialized by an empty initializer.
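The constraint above exists because an array of unknown size takes its size from its initializer list, and an empty initializer has no elements to count. A short sketch, with the hypothetical function name zero_elements and assuming a compiler that accepts "{ }" for arrays of known size (C23 or an extension):

```c
#include <assert.h>
#include <stddef.h>

/* Returns how many elements of a fixed-size array initialized
   with { } hold the value 0. */
size_t zero_elements(void) {
    int sized[8] = { };  /* complete array type: permitted */

    /* An array of unknown size has nothing to deduce its size from,
       so the line below would be a constraint violation:
       int unsized[] = { };
    */

    size_t zeros = 0;
    for (size_t i = 0; i < sizeof sized / sizeof sized[0]; ++i) {
        if (sized[i] == 0) {
            ++zeros;
        }
    }
    return zeros;
}
```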

4.4. Modify §6.7.9 paragraph 10

If an object that has automatic storage duration is initialized with an empty initializer, its value is the same as the initialization of a static storage duration object. Otherwise, ~~If~~ if an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static or thread storage duration is not initialized explicitly, or is initialized with an empty initializer, then: …

4.5. OPTIONAL CHANGE 0: Largest-Then-First Initialization - Modify §6.7.9 paragraph 10, last bullet point

…

— if it is a union and the initializer is the empty initializer, the largest member is initialized (recursively) according to these rules, then the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;
— if it is a union and the initializer is not an empty initializer, the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;
…

4.6. Modify §6.7.9 paragraph 11

~~The initializer for a scalar shall be a single expression, optionally enclosed in braces. The~~ The initializer for a scalar shall be a single expression, optionally enclosed in braces, or it shall be an empty initializer. If the initializer is the empty initializer, the initial value is the same as the initialization of a static storage duration object. Otherwise, the initial value of the object is that of the expression (after conversion); …
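As an illustration of the scalar case, the sketch below uses the empty initializer only when the compiler reports C23 support and otherwise falls back to the long-valid braced single expression. The function name default_scalar is hypothetical, and treating __STDC_VERSION__ >= 202311L as the signal for "{ }" support on scalars is an assumption of this example.

```c
#include <assert.h>

/* Returns a scalar that was initialized without a meaningful value
   being spelled out by the programmer. */
int default_scalar(void) {
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
    int value = { };    /* empty initializer: same value as a static object */
#else
    int value = { 0 };  /* a single expression, optionally enclosed in braces */
#endif
    return value;
}
```

Either branch produces 0, which a caller could confirm with assert(default_scalar() == 0).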

4.7. Lift Empty-Initializer for VLAs Restriction

4.7.1. Modify §6.7.9 paragraph 3

The type of the entity to be initialized shall be an array of unknown size or a complete object type ~~that is not a variable length array type~~. An entity of variable length array type shall not be initialized except by an empty initializer.
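A sketch of what this wording permits, next to the pattern it replaces. The function name sum_of_zeroed_vla is hypothetical. The example assumes the implementation supports VLAs at all, and it treats __STDC_VERSION__ >= 202311L as the signal that "{ }" is accepted on a VLA; older modes take the memset path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sums the elements of a zeroed variable length array of n ints.
   n must be greater than 0. */
long sum_of_zeroed_vla(size_t n) {
    assert(n > 0);
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
    int scratch[n] = { };  /* the only initializer a VLA may have */
#else
    int scratch[n];        /* no initializer is allowed on a VLA here */
    memset(scratch, 0, sizeof scratch);
#endif
    long sum = 0;
    for (size_t i = 0; i < n; ++i) {
        sum += scratch[i];
    }
    return sum;
}
```

With the empty initializer, the declaration and the zeroing happen in one place, so there is no window in which the array holds indeterminate values.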

5. Acknowledgements

Thank you to the C community for the push to write this paper! Thank you to Joseph Myers, Hubert Tong, and Martin Uecker for wording improvements and suggestions.

6. Appendix

The appendix is a collection of historical references to old paper material or points that are potentially relevant but ultimately not required for understanding the full motivation and wording of the proposal.

6.1. Decimal Floating Point

Originally, this section in a previous revision of the paper was concerned about Decimal Floating Point initialization. But, this was clarified as a bug / potentially wrong interaction and a paper was brought forward to fix it, which has already been voted into the C Standard. The old text is reproduced for historical reasons, just below:

Decimal Floating Point (DFP) types do not use the exact same semantics between { 0 } and { }. In particular, { } is a "more strict" version of initialization that writes all bits to 0. In contrast, { 0 } produces a "fuzzy" zero value that includes setting the nominal value to 0 along with a quantum exponent of 0 (which may not be represented perfectly by all bits 0).

This is taken care of with additional wording that highlights the proper behavior for scalar types (which DFP types are considered) for { }, which makes it clear it is initialized properly to a 0 value.

6.2. Compiler Extensions + Union Aliasing

Some compilers, such as Clang, have special compilation modes, such as -ftrivial-auto-var-init=pattern, in which they can write bits that are not equivalent to the "static storage duration initialization" of a type. This creates a difference between what { 0 } and what { } do in those modes. For example, consider the following code:

struct A {
  union {
    char x;
    char y[1024];
  } u;
};

void foo();
int main(void) {
  struct A a = { 0 };
  if (a.u.y[1023]) {
    foo();
  }
}

Without compiler options that change the unspecified / indeterminate initialization pattern, Clang will trivially initialize the rest of y in the union with 0, because the values of y in a.u are unspecified. With -ftrivial-auto-var-init=pattern or other non-zero initializer options, these unspecified values become non-zero values and result in foo() being called. Nominally, reading these values from a union is unspecified behavior, so on one hand we can simply handwave this away as "who cares?". Indeed, the Standard cannot specify unspecified behavior, even if it is technically legal to read values from y despite them never being written to (it is not explicitly undefined behavior or a constraint violation, just unspecified).

On the other hand, we have a noticeable difference here:

struct A {
  union {
    char x;
    char y[1024];
  } u;
};

void foo();
int main(void) {
  struct A a = { };
  if (a.u.y[1023]) {
    foo();
  }
}

Using this initialization syntax, even with different -ftrivial-auto-var-init={whatever} flags, the behavior is stable: the entire union is zero-written. In this case, foo() is never called. This proposal’s previous iterations replicated this behavior, as it was thought to provide better reliability and security semantics. Note that a user can always fall back to using = { 0 } if leaving other non-overlapping values in a union is undesirable. This does mean that one can, technically, tell if something was initialized with either = { 0 } or = { }, which somewhat contradicts the premise of the paper (that = { 0 } and = { } are identical).

Ultimately, the Committee was not in favor of the largest-member-initialized behavior for an empty initializer { }.

Published Proposal, 2022-01-01