1. Changelog
1.1. Revision 1 - August 14th, 2021
- Clarify union and aggregate initialization, using motivation from Clang and `-ftrivial-auto-var-init=pattern` (thanks, Hubert Tong!).
- Edits and fixes to the wording (thanks, Robert Seacord!).
- Focus on using static storage duration initialization rules, except for unions.
1.2. Revision 0 - May 15th, 2021
- Initial release! 🎉
2. Introduction & Motivation
The use of `{ 0 }` to initialize structures, unions, and arrays is a long-standing pillar of C. For just as long, however, it has confused developers. Some initialize an array of integers with `{ 1 }` and expect every element to become `1`, which is wrong. Others get warnings on some implementations for complicated structures and designated initializers that do not initialize every element. In both cases, the usage of `{ 0 }` has caused quite a bit of confusion.
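The `{ 1 }` pitfall can be shown with a minimal sketch. Only the first element receives the explicit value. Every remaining element is zero-initialized, just as it would be for an object with static storage duration. The names `element_after_brace_one` and `ARR_LEN` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define ARR_LEN 4

/* Hypothetical helper: returns element i of an array initialized with { 1 }. */
int element_after_brace_one(size_t i) {
    int arr[ARR_LEN] = { 1 }; /* arr[0] == 1; arr[1] through arr[3] are 0, not 1 */
    assert(i < ARR_LEN);
    return arr[i];
}
```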
Furthermore, this has created great confusion about how initializers are supposed to work:

- Is the `0` the special element that initializes everything to `0`?
- Or is it the braces together with the `0`?
- What about nested structures?
- Why is `{ 0 }` okay, while `{ 0, 1 }` starts producing warnings about not initializing elements correctly?

This confusion gives people very poor ideas about how exactly they need to zero-initialize a structure. As a result, some turn off sometimes-helpful warnings[1], and others run into different issues. It also leads people to fall back to `memset` or similar patterns, rather than relying on one clear initialization pattern for all structures.
This is also a longstanding compatibility risk with C++. Authors of shared header code that relies on `{}` believe it is viable C code, then find out that it is not allowed. GCC accepts `{}` as an extension, so even developers as prominent as the Chief Security Maintainer for Alpine, the largest musl-based distro, said as recently as April 6th, 2021:
today i learned. gcc allows this, i’ve used it for years!
Indeed, the use is so ubiquitous that most compilers allow it as an extension. They do so quietly, until higher warning levels or pedantic checks are enabled in most compilers and static analyzers! Thankfully for this proposal, every compiler deploying this extension applies the same initialization behavior. Each performs (almost) identical static storage duration initialization for every active sub-object or element of the scalar, aggregate, or union (exceptions detailed further below).
3. Design
As hinted at in the last paragraph of the motivation, there is no special design to engage in here. Accepting `{}` is not only part of C++. It is also existing extension practice in almost every C compiler that has a shared C/C++ mode, and in many C-only compilers, due to its prolific use in many projects. Providing `{}` as an initializer has the unique benefit of being unambiguous. For example, consider the following nested structures:
```c
struct core { int a; double b; };
struct inner { struct core c; };
struct outer { struct inner d; int e; };
```
With this proposal, this code...
```c
int main(void) {
    struct outer o0 = { 0 };
    struct outer o1 = { 0, 1 }; // warnings about brace elision confusion, but compiles
    // ^ "did I 0-initialize inner, and then give "e" the 1 value?"
    return 0;
}
```
can instead be written like this code:
```c
int main(void) {
    struct outer o0 = { };        // completely empty
    struct outer o1 = { { }, 1 }; // ^ much less ambiguous about what "1" is meant to fill in here
                                  // without "do I need the '0'?" ambiguity
    return 0;
}
```
3.1. Consistent "static storage duration initialization"
Almost every surveyed compiler that implements this extension agrees that `{}` should be the same as `{ 0 }`, just without the confusing `0` value within the braces (with the exceptions described below). It performs what the C standard calls _static initialization_ / _static storage duration initialization_. Therefore, the wording burden is minimal, and so is the implementation burden apart from minor parsing updates. We are not introducing a new class of initialization to the language, just extending an already-in-use syntax.
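"The same as static storage duration initialization" can be checked with a minimal sketch. It assumes a compiler that accepts `{}` for structures. The type `struct record` and the helper functions are hypothetical. The comparison goes member by member rather than with `memcmp`, so that padding bytes do not affect the result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type used only for this illustration. */
struct record {
    int id;
    double weight;
    const char *name;
    int tags[3];
};

/* File-scope object: static storage duration, so zero/null-initialized. */
static struct record static_zero;

/* Member-by-member comparison against the static object (ignores padding). */
int matches_static_init(const struct record *r) {
    assert(r != NULL);
    if (r->id != static_zero.id || r->weight != static_zero.weight ||
        r->name != static_zero.name)
        return 0;
    for (size_t i = 0; i < sizeof r->tags / sizeof r->tags[0]; ++i)
        if (r->tags[i] != static_zero.tags[i])
            return 0;
    return 1;
}

struct record make_with_empty_braces(void) {
    struct record r = { };   /* the proposed spelling */
    return r;
}

struct record make_with_brace_zero(void) {
    struct record r = { 0 }; /* the traditional spelling */
    return r;
}
```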
We note that there are cases where this may differ. They are listed in the sub-sections below. These departures from what `{ 0 }` does are mostly beneficial, and they guarantee even greater stability than the C Standard currently offers us.
3.1.1. Decimal Floating Point
Decimal floating point (DFP) types do not get exactly the same semantics from `{}` and `{ 0 }`. In particular, `{}` is a "more strict" form of initialization that writes all bits to 0. In contrast, `{ 0 }` produces a "fuzzy" zero value: the nominal value is set to 0 along with a quantum exponent of 0, which may not be represented by all bits being 0.
Additional wording handles this. It spells out the proper behavior of `{}` for scalar types, which include the DFP types, and makes it clear that such objects are properly initialized to a zero value.
3.1.2. Compiler Extensions + Union Aliasing
Some compilers, such as Clang, have special compilation modes in which they write bits that are not equivalent to the "static storage duration initialization" of a type. One example is a build with `-ftrivial-auto-var-init=pattern`. In those modes, `{ 0 }` and `{}` behave differently. For example, consider the following code:
```c
struct A {
    union {
        char x;
        char y[1024];
    } u;
};

void foo(void);

int main(void) {
    struct A a = { 0 };
    if (a.u.y[1023]) {
        foo();
    }
}
```
Without compiler options that change the initialization pattern for unspecified or indeterminate values, Clang will trivially initialize the entire union in `a` with zeroes. It is free to do so because the values of the elements of `y` after the first are unspecified. With `-ftrivial-auto-var-init=pattern` or other non-zero initialization options, those unspecified values become non-zero, and `foo` gets called.

Nominally, reading these values from the union is unspecified behavior, so on one hand we can simply handwave this away as "who cares?". Indeed, the Standard cannot specify unspecified behavior. This holds even though it is technically legal to read values from `a.u.y` that were never written: doing so is not explicitly undefined behavior or a constraint violation, just unspecified.
On the other hand, we have a noticeable difference here:
```c
struct A {
    union {
        char x;
        char y[1024];
    } u;
};

void foo(void);

int main(void) {
    struct A a = { };
    if (a.u.y[1023]) {
        foo();
    }
}
```
With this initialization syntax, the behavior is stable even under different `-ftrivial-auto-var-init` flags: the entire union is zero-written, so `foo` is never called. This proposal replicates this behavior because it has better reliability and security semantics. A user who would rather leave the bytes of a union outside its first member unwritten can always fall back to `{ 0 }`.

This does mean that one can, technically, tell whether something was initialized with `{}` or with `{ 0 }`. That somewhat contradicts the premise of the paper, which is that `{}` and `{ 0 }` are identical. But we find this to be a worthwhile departure from `{ 0 }`'s semantics.
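The following sketch makes the `struct A` example self-contained. It assumes a compiler that zero-writes the whole union for `{}`, which is the behavior this proposal adopts. The counter `foo_calls` is hypothetical and stands in for whatever side effect `foo` has. The helper `union_byte_after_empty_init` is also hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct A {
    union {
        char x;
        char y[1024];
    } u;
};

static int foo_calls = 0; /* hypothetical: records whether foo() ran */

static void foo(void) { ++foo_calls; }

/* Mirrors the example: with {}, y[1023] reads as 0, so foo() is not called. */
int run_empty_init_example(void) {
    struct A a = { };
    if (a.u.y[1023]) {
        foo();
    }
    return foo_calls;
}

/* Hypothetical helper: reads any byte of y after empty initialization. */
int union_byte_after_empty_init(size_t i) {
    struct A a = { };
    assert(i < sizeof a.u.y);
    return a.u.y[i];
}
```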
4. Wording
The following wording is relative to [N2596].
4.1. Modify §6.7.9 paragraph 1’s grammar
initializer:
- `{ }`
- `{` _initializer-list_ `}`
4.2. Modify §6.7.9 paragraph 1 to include a new sentence
An empty brace pair (`{ }`) is called an empty initializer and is referred to as empty initialization.
4.3. Add to §6.7.9 paragraph 3
The type of the entity to be initialized shall be an array of unknown size or a complete object type that is not a variable length array type. An array of unknown size shall not be initialized by an empty initializer.
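The effect of the new constraint sentence can be illustrated with a short sketch. It assumes a compiler that accepts `{}` for arrays. The function names are hypothetical. The rejected declaration appears only in a comment, since under the proposed constraint it would not compile.

```c
#include <assert.h>
#include <stddef.h>

/* Permitted: the array has a known size, so {} zero-initializes every element. */
int known_size_element(size_t i) {
    int ok[3] = { };
    assert(i < sizeof ok / sizeof ok[0]);
    return ok[i];
}

/* A non-empty initializer determines the size of an array of unknown size. */
size_t deduced_length(void) {
    int arr[] = { 1, 2, 3 };
    return sizeof arr / sizeof arr[0];
}

/* Rejected by the proposed constraint: nothing would determine the size.
 *     int bad[] = { };
 */
```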
4.4. Modify §6.7.9 paragraph 11
The initializer for a scalar shall be a single expression, optionally enclosed in braces, or it shall be an empty initializer. If the initializer is the empty initializer, the initial value is the same as the initialization of a static storage duration object. Otherwise, the initial value of the object is that of the expression (after conversion); …
4.5. Modify §6.7.9 paragraph 22
If an array of unknown size is initialized, its size is determined by the largest indexed element with an explicit initializer. The array type is completed at the end of its initializer list.
4.6. Add a new paragraph after §6.7.9 paragraph 13
If the initializer is the empty initializer, then it is initialized as follows:
- if it is an aggregate type, every member is initialized (recursively) according to the rules for empty initializers, and any padding bits are initialized to zero;
- if it is a union type, then the member with the largest size is initialized (recursively) according to the rules for empty initialization, and any padding bits are initialized to zero;
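The recursive rules can be sketched with a nested type. Here a structure holds an array, a pointer, and a union whose first member is smaller than its largest member. The sketch assumes a compiler that accepts `{}` and zero-writes the whole object, as the surveyed implementations do. The types `union value` and `struct node` and the helper functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical union: first member is 1 byte, largest member is 4 long longs. */
union value {
    char tag;
    long long words[4];
};

/* Hypothetical aggregate nesting an array, a pointer, and the union. */
struct node {
    int id;
    double scores[2];
    struct node *next;
    union value v;
};

/* Returns 1 if every member reachable in n holds its zero or null value. */
int fully_empty_initialized(const struct node *n) {
    assert(n != NULL);
    if (n->id != 0 || n->next != NULL)
        return 0;
    for (size_t i = 0; i < 2; ++i)
        if (n->scores[i] != 0.0)
            return 0;
    /* Under the proposed union rule, the largest member (words) is initialized. */
    for (size_t i = 0; i < 4; ++i)
        if (n->v.words[i] != 0)
            return 0;
    return 1;
}

struct node make_node(void) {
    struct node n = { };
    return n;
}
```

The check on `words` reflects the "largest member" rule above. A rule that initialized only the first named member would guarantee nothing about the bytes beyond `tag`.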
4.7. Additional Change (Optional) - lift some VLA restrictions
4.7.1. Modify §6.7.9 paragraph 3
The type of the entity to be initialized shall be an array of unknown size or a complete object type that is not a variable length array type. An entity of variable length array type shall not be initialized except by an empty initializer.
5. Acknowledgements
Thank you to the C community for the push to write this paper! Thank you to Joseph Myers, Hubert Tong, and Martin Uecker for wording improvements and suggestions.