1. Changelog
1.1. Revision 4 - April 12th, 2022
-
Switch from _Bool to bool after the latest additions to the C Standard.
-
Vastly improve the wording after feedback, make sure it does not conflict with the Improved Normal Enumerations paper.
-
Clarify the use of the processing of the integers for the ones of underlying type.
-
Directly specify the use of integer constant expressions and their interaction with enumerations in § 4.1 Unsigned, Wrap Around, and Overflow Semantics.
-
Explain rationale for blocking parsing issues in § 4.3 Variables, Declarations, and Parsing (Oh my!).
-
Be clear about the type of the enumeration constants in § 4.4 Type of Enumeration Constants.
1.2. Revision 3 - January 1st, 2022
-
Change of paper primary author to JeanHeyd and Shepherd: thank you, Clive Pygott, for your studious shepherding of this issue for over 4 years!
-
Address feedback and comments from March/April 2021 Virtual Meeting.
-
Address direct feedback from Joseph Myers and Robert Seacord (thank you for the effort!).
-
Allow _Bool as an underlying type. (This matches C++ and C extensions.)
1.3. Revision 2 - October 4th, 2020
-
Prepare for changes to C23, address some minor feedback comments from the August 2020 Virtual Meeting.
-
Support for forward declarations of both fixed underlying type enumerations and enumerations without fixed underlying type.
-
Clarify that _Bool should probably not be supported as an underlying type.
1.4. Revision 1 - June 28th, 2020
-
Address main comment from 2016 meeting: clumsy concrete syntax for enum-type-specifier was overly restrictive (e.g., wouldn’t allow the use of a typedef). Use type-specifier term more clearly.
-
Change syntax to allow for attributes.
1.5. Revision 0 - February 17th, 2016
-
Initial release 🎉!
2. Introduction and Motivation
C normally tries to pick int for its enumerations, but it’s entirely unspecified what the type for the enum will end up being. Its constants (and the initializers for those constants) are always treated as ints, which is not very helpful for individuals who want to use things like enumerations in their bitfields with specific kinds of properties. This means it’s impossible to portably define an enumeration, which drastically decreases its usefulness and makes it harder to rely on enumeration values (and consequently, their type) in standard C code. This has led to a number of communities and tools attempting to do enumerations differently in several languages, or in the case of C++ simply enhancing enumerations with specific features to make them both portable and dependable.
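As a small illustration of the status quo described above (a sketch, not part of the proposal): without a fixed underlying type, an enumeration constant whose value fits in int has type int, while the size of the enumerated type itself is left to the implementation. The names color, color_red, constant_is_int, and color_size are made up for this example.

```c
#include <stddef.h>

/* Hypothetical enumeration without a fixed underlying type. */
enum color { color_red, color_green, color_blue };

/* Returns 1 when the enumeration constant has type int. */
int constant_is_int(void) {
    return _Generic(color_red, int: 1, default: 0);
}

/* The size of the enumerated type is the implementation's choice;
   it only has to be able to represent the values 0 through 2. */
size_t color_size(void) {
    return sizeof(enum color);
}
```

Code that stores such an enumeration in a structure or bit-field therefore cannot assume a particular width across implementations.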
This proposal provides an underlying enumeration type, specified after a colon of the _identifier_ for the enumeration name, to give the enumeration a dependable type. It makes the types for each of the enumeration constants the same as the specified underlying type, while leaving the current enumerations as unspecified as they were in their old iterations. It does not attempt to solve problems outside the scope of making sure that constants with specified underlying type are dependable, and attempts to make forward declaration of enumerations work across implementations.
3. Prior Art
C++ has this as a feature for their enumerations. Certain C compilers have this as an extension in their C compilation modes specifically, including Clang.
4. Design
The design of this feature follows C++'s syntax for both compatibility reasons and because the design is genuinely simple and useful:
enum a : unsigned long long {
    a0 = 0xFFFFFFFFFFFFFFFFULL
    // ^ not a constraint violation with a 64-bit unsigned long long
};
Furthermore, the type of a0 is specified to be unsigned long long, such that this program:
enum a : unsigned long long {
    a0 = 0xFFFFFFFFFFFFFFFFULL
};

int main () {
    return _Generic(a0, unsigned long long: 0, default: 1);
}
exits with a return value of 0. Note that because this change is entirely opt-in, no previous code is impacted, and code that was originally a syntax violation becomes well-formed with the same semantics as its C++ counterpart. The interesting component of this proposal - that is currently marked optional - addresses a separate issue found in the current enumeration specification.
4.1. Unsigned, Wrap Around, and Overflow Semantics
Consider the code sample:
enum flags : unsigned int {
    a = 0x01,
    // …
    o = 0x8000,
    p = 0x100000,
    // …
    low_16_merged_flags = 0xFFFF,
    alternative_p // implicit 0xFFFF + 1
};
This code is (intentionally) a footgun. For starters, int and unsigned int need not be 32 bits wide: their lowest requirement is 16 bits. This means that the p flag is not within the representable range of a 16-bit unsigned int. There is also the problem of the enumeration constant that comes after the low_16_merged_flags enumeration constant, alternative_p. Its value is implicitly 0xFFFF + 1, which would yield 0x10000. This, too, is outside the range of a 16-bit unsigned integer type in C.
There are 2 ways to resolve this tension.
The first is to allow this code to compile, and perform silent wraparound on p and alternative_p. This means that, regardless of the user intent, the specified value (0x100000) and the implicit value (0xFFFF + 1) would both take on a value of 0 when unsigned int is 16 bits wide. If this code was meant to be ported between platforms, it compiles silently but has the wrong behavior when run. Tests, fuzzing, and other mechanisms may catch the problem and remind the user to pick a better underlying type, or check the flag values more carefully.
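The effect of this first option on a platform with a 16-bit unsigned int can be simulated with an ordinary unsigned conversion. This is only a sketch: it uses uint16_t from <stdint.h> as a stand-in for a 16-bit unsigned int, and wrap_to_16_bits, wrapped_p, and wrapped_alternative_p are hypothetical helpers.

```c
#include <stdint.h>

/* Converts a value to a 16-bit unsigned type, which reduces it modulo 65536.
   uint16_t stands in for a 16-bit unsigned int here (an assumption). */
uint16_t wrap_to_16_bits(unsigned long value) {
    return (uint16_t)value;
}

/* The explicit value given to p in the example above. */
uint16_t wrapped_p(void) {
    return wrap_to_16_bits(0x100000UL);
}

/* The implicit value of alternative_p: 0xFFFF + 1. */
uint16_t wrapped_alternative_p(void) {
    return wrap_to_16_bits(0xFFFFUL + 1UL);
}
```

Both helpers return 0, so two flags that were meant to be distinct bits would silently stop being flags at all.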
The second way to solve this is to make the above a constraint violation. That means that for both p and alternative_p, when the code is ported to a platform where unsigned int is 16 bits wide, the implementation will loudly complain that the value is inappropriate. This would prevent compilation on such platforms, rather than require testing, fuzzing, and other techniques to handle the range of values.
This proposal goes with the second way. It is a far better user experience to prevent compilation where possible: silent wraparound is a property of the machine and done for performance and hardware reasons. For interpreted implementations, the translation step still has to take care of the expression because it is considered a constant expression. Enumeration initialization should be robust C code to remain robust and without error over the long term.
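Code that cannot yet rely on this proposal can approximate the same kind of diagnostic with a static assertion. The sketch below is an assumption about how one might do so in C11; FLAG_FITS and fits_in_unsigned_bits are hypothetical names and are not part of the proposed wording.

```c
#include <limits.h>

/* Nonzero when value is representable in an unsigned type whose maximum is max. */
#define FLAG_FITS(value, max) \
    ((unsigned long long)(value) <= (unsigned long long)(max))

/* Checked during translation: unsigned long is at least 32 bits wide,
   so the flag value 0x100000 always fits and this assertion holds. */
_Static_assert(FLAG_FITS(0x100000, ULONG_MAX), "flag p must fit in unsigned long");

/* Run-time form of the same check, for an unsigned type that is `bits` wide. */
int fits_in_unsigned_bits(unsigned long long value, unsigned bits) {
    if (bits >= sizeof(unsigned long long) * CHAR_BIT) {
        return 1; /* every unsigned long long value fits */
    }
    return value < (1ULL << bits);
}
```

With a 16-bit width, fits_in_unsigned_bits rejects both 0x100000 and 0x10000, which mirrors the constraint violation this proposal asks implementations to report.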
Users who would like to avoid such errors will be reminded to select from the wide variety of battle-tested integer types in <stdint.h>, provided for their convenience, when such cases arise in C23 and beyond:
#include <stdint.h>

enum flags : uint_least32_t { // 👍!
    a = 0x01,
    // …
    o = 0x8000,
    p = 0x100000, // works fine
    // p = 0x100000u, // would also work fine
    // …
    low_16_merged_flags = 0xFFFF,
    alternative_p // implicit 0xFFFF + 1, works fine for 32-bit
};
It is better to provide an error that prevents non-portable code from exhibiting non-portable behavior, while portable code compiles, works, and runs across all platforms as expected. Finally, users who want the wraparound behavior can perform a manual cast to get what they want:
enum flags : unsigned int {
    a = 0x01,
    // …
    o = 0x8000,
    p = (unsigned int)0x100000, // cast: wraparound explicit
    // p = 0x100000u, // literal suffix: explicit (any errors handled by literal)
    // …
    low_16_merged_flags = 0xFFFF,
    alternative_p // implicit 0xFFFF + 1, constraint violation
};
This is also consistent with existing practice around the subject (Clang x86-64 trunk).
4.2. Bit-Precise Integer Types and bool?
Integers such as _BitInt(N) are, currently, allowed as an extension for an underlying enumeration type in Clang. However, discussing this with the Clang implementers, there was sentiment that this just "happened to work" and was not a fully planned part of the _ExtInt/_BitInt integration plan. They proposed that they would implement a diagnostic for it in future versions of Clang. In the standard, we do not want to step on the toes of anyone who may want to develop extensions in this place, especially when it comes to whether or not bit-precise enumeration types undergo integer promotion or follow the same rules for enumeration constants and similar. Therefore, we exclude them as usable types at this time.
We do not exclude bool from the possible set of types. It is allowed in C++ and other C extensions, and it allows for an API to provide mnemonic or otherwise fitting names for binary choices without needing to resort to a bit-field of a particular type. This provides a tangible benefit to code. Values outside of true (1) or false (0) can be errored/warned on when creating a bool enumeration, but that is a quality of implementation decision.
4.3. Variables, Declarations, and Parsing (Oh my!)
Currently, parsers for C may not properly handle the following code:
int main () {
    enum e : long long value = 0;
    return 0;
}
A sufficiently weak parser implementation can determine that this is an enumeration of underlying type long, and leave the declaration name to be the second long. This is a constraint violation, thanks to declaring a variable named long, and there is no workaround for it. There are several options to help accommodate this problem:
-
for enumerations declaring variables, putting an underlying type is not allowed unless the enumeration is also being defined or is used purely as a forward declaration (no identifier);
-
for enumerations declaring type definitions, putting an underlying type is not allowed unless the enumeration is also being defined (as you cannot forward-declare a type definition, this does not have the same exemption as #1 on this list); and,
-
as a fallout from #1, because this can never be used to declare an object, any use of an equals sign or similar to provide an initializer to initialize the value is also illegal if there is a specifier for the underlying type.
This forms a comprehensive set of fixes for the given issues. Finally, if an identifier is present, the implementation is required to consume the longest token sequence that would compose a single type name (named according to the C grammar as: specifier-qualifier-list), before the opening brace { is provided.
4.4. Type of Enumeration Constants
Given this code sample:
enum e : unsigned short { x };

int main () {
    return _Generic(x, enum e: 0, default: 1);
}
The program returns 0. x is considered to have type enum e, which is compatible with unsigned short. Therefore, the following program would be a constraint violation regarding _Generic, since two of its associations name compatible types:
enum e : unsigned short { x };

int main () {
    return _Generic(x, enum e: 0, unsigned short: 2, default: 1);
}
Furthermore, this program would return 0:
enum e : unsigned short { x };

int main () {
    return _Generic(x, unsigned short: 0, default: 1);
}
since the enumerated type is compatible with the underlying type (but not the other way around).
4.5. Incomplete Types?
Previous revisions of this paper attempted to say that enumerations declared without underlying types could be considered incomplete types, similar to structures and unions. This may not always work: a forward-declared enumeration without an underlying type may be compatible with any integer type, and it is not guaranteed that all pointers to integer types have the same storage and alignment requirements, so the compatibility rules (and the ability to pun between pointers of said types) may break down. Does there exist an implementation where pointers to different integer types do not exhibit the same storage and alignment requirements (not of what they point to, but of the literal pointer value itself)? It is dubious to answer "yes". But the rule in §6.2.5¶25 gives pointers to structures and unions the same representation and alignment requirements, and does not do so for pointers to integer types:
A pointer to void shall have the same representation and alignment requirements as a pointer to a character type.53) Similarly, pointers to qualified or unqualified versions of compatible types shall have the same representation and alignment requirements. All pointers to structure types shall have the same representation and alignment requirements as each other. All pointers to union types shall have the same representation and alignment requirements as each other. Pointers to other types need not have the same representation or alignment requirements.
— §6.2.5¶31, ISO/IEC 9899:202x, C Standard Working Draft, April 12th, 2022
So we cannot guarantee that the requirements for compatibility (pointer values to any two types have the same storage and alignment) are met. That rule has been there for a long time, so they must have a good reason for not allowing it for the integer types. (… Right?)
Nothing needs to be said for enumerations with fixed underlying types because enumerations with fixed underlying types are always complete, and therefore need no special rules for handling their existence as an "incomplete" pointer.
5. Proposed Wording
The following wording is relative to N2731.
5.1. Intent
The intent of the wording is to provide the ability to express enumerations with the underlying type present. In particular:
-
enumerations can optionally have a type declared as the underlying type or otherwise defaults to the previous behavior (opt-in);
-
enumerations with an underlying type must use a signed or unsigned (standard or extended) integer type that is not a bit-precise integer type, or another enumeration type directly;
-
enumerations with underlying types ignore const, volatile, _Atomic, and all other qualifiers on a given type;
-
enumerations with underlying types can be forward-declared alongside enumerations without underlying types;
-
enumerations with underlying types cannot be forward-declared with different underlying types than the first forward declaration;
-
enumerations that are forward-declared are only compatible with themselves, and not any potential underlying type;
-
enumerations with an underlying type can be redeclared without an underlying type (e.g., enum a : int; matches enum a;);
-
enumerations with an underlying type can have enumerators initialized with constant expressions whose type is not strictly int or unsigned int used to specify their values;
-
enumerations of an underlying type used directly in a generic expression are treated as an integer of that underlying type; and,
-
operations performed on an enumeration with an underlying type treat the type of the enumeration as an integer of that specified underlying type.
5.2. Proposed Specification
5.2.1. Modify Section §6.2.7 Compatible type and composite type, paragraph 1
… Moreover, two structure, union, or enumerated types declared in separate translation units are compatible if their tags and members satisfy the following requirements: if one is declared with a tag, the other shall be declared with the same tag. If both are completed anywhere within their respective translation units, then the following additional requirements apply: … For two enumerations, corresponding members shall have the same values ; if one has a fixed underlying type, then the other must have a fixed underlying type and the fixed underlying types must be compatible.
5.2.2. Modify Section §6.4.4.3 Enumeration constants
6.4.4.3 Enumeration constants
Syntax
enumeration-constant:
identifier
Semantics
An identifier declared as an enumeration constant for an enumeration without fixed underlying type has type int. An identifier declared as an enumeration constant for an enumeration with fixed underlying type has that underlying type during the specification of the enumeration type (i.e., from the start of the opening brace { in the enum-specifier to the closing brace }).
An enumeration constant may be used in an expression (or constant expression) wherever a value of standard or extended integer type may be used. It has the underlying type of the enumeration.
Forward references: enumeration specifiers (6.7.2.2).
5.2.3. Modify Section §6.7.2.2 Enumeration specifiers
6.7.2.2 Enumeration specifiers
Syntax
enum-specifier:
enum attribute-specifier-sequence_opt identifier_opt enum-type-specifier_opt { enumerator-list }
enum attribute-specifier-sequence_opt identifier_opt enum-type-specifier_opt { enumerator-list , }
enum identifier enum-type-specifier_opt
enumerator-list:
enumerator
enumerator-list , enumerator
enumerator:
enumeration-constant attribute-specifier-sequence_opt
enumeration-constant attribute-specifier-sequence_opt = constant-expression
enum-type-specifier:
: specifier-qualifier-list
All enumerations have an underlying type. The underlying type can be explicitly specified using an enum-type-specifier and such an underlying type is its fixed underlying type. If it is not explicitly specified, the underlying type is the enumeration’s compatible signed or unsigned integer type.
Constraints
For an enumeration with a fixed underlying type, an enumeration constant with a constant expression that defines its value shall:
— have that value be representable as that fixed underlying type without conversion, if the fixed underlying type is not bool; or
— be implicitly converted to 1 or 0 following the usual conversion rules for bool (6.3.1.2), if the underlying type is bool.
The definition of an enumeration constant without a defining constant expression shall not overflow the maximum of the fixed underlying type by adding 1 to the previous enumeration constant.
The expression that defines the value of an enumeration constant shall be an integer constant expression that has a value representable as an int (for an enumeration without a fixed underlying type) or the fixed underlying type, respectively.
If an enum type specifier is present, then the longest possible sequence of tokens that can be interpreted as a specifier qualifier list is interpreted as part of the enum type specifier. It shall name an integer type that is not an enumeration or bit-precise integer type.
For any declarator (6.7.6) that declares an enumeration with a fixed underlying type (the enum type specifier is present), but does not provide the opening brace {, enumerator list, and closing brace }, it shall:
— not be used to also declare any objects or function parameters (6.7.6.3);
— not be used to also declare any type definitions (6.7.8); and,
— not be followed by an equals sign = with an initializer (6.7).
If two enum specifiers that include an enum type specifier declare the same type, the underlying types shall be compatible.
Semantics
The optional attribute specifier sequence in the enum specifier appertains to the enumeration; the attributes in that attribute specifier sequence are thereafter considered attributes of the enumeration whenever it is named. The optional attribute specifier sequence in the enumerator appertains to that enumerator.
The identifiers in an enumerator list of an enumeration without fixed underlying type are declared as constants that have type int. The identifiers in an enumerator list of an enumeration with fixed underlying type are declared as constants whose types are the same as the enumerated type. They may appear wherever such are permitted.133) An enumerator with = defines its enumeration constant as the value of the constant expression. If the first enumerator has no =, the value of its enumeration constant is 0. Each subsequent enumerator with no = defines its enumeration constant as the value of the constant expression obtained by adding 1 to the value of the previous enumeration constant. (The use of enumerators with = may produce enumeration constants with values that duplicate other values in the same enumeration.) The enumerators of an enumeration are also known as its members.
For all enumerations without fixed underlying type, each enumerated type shall be compatible with char, a signed integer type, or an unsigned integer type (excluding the bit-precise integer types). The choice of type is implementation-defined139), but shall be capable of representing the values of all the members of the enumeration.
[📝 NOTE TO EDITOR: The wording in the above paragraph for "excluding the bit-precise…" is identical to that of the "Improved Normal Enumerations" proposal, and should be appropriately merged if both papers are added to the standard.]
For all enumerations with a fixed underlying type, the enumerated type is compatible with the underlying type of the enumeration. After possible lvalue conversion a value of the enumerated type behaves the same as the same value with the underlying type, in particular with all aspects of promotion, conversion and arithmetic.FN0✨)
FN0✨) This means in particular that if the compatible type is bool, values of the enumerated type behave in all aspects the same as bool and the members will only have values 0 and 1. If it is a signed integer type and the constant expression of an enumeration constant overflows, a constraint for constant expressions (6.6) is violated.
An enumerated type declaration without a fixed underlying type is an incomplete type until immediately after the } that terminates the list of enumerator declarations, and complete thereafter. An enumerated type declaration of an enumeration with fixed underlying type declares a complete type immediately after its enum type specifier (i.e. after the opening { of its enumerator list).
EXAMPLE The following fragment: …
…
EXAMPLE Even if the value of an enumeration constant is generated by the implicit addition of 1, an enumeration with fixed underlying type does not exhibit typical overflow behavior:
#include <limits.h>

enum us : unsigned short {
    us_max = USHRT_MAX,
    us_violation, /* Constraint violation: USHRT_MAX + 1 would overflow. */
    us_violation_2 = us_max + 1, /* Maybe constraint violation: USHRT_MAX + 1 may be promoted to "int", and result is too wide for the underlying type. */
    us_wrap_around_to_zero = (unsigned short)(USHRT_MAX + 1) /* Okay: conversion done in constant expression before conversion to underlying type: unsigned semantics okay. */
};

enum ui : unsigned int {
    ui_max = UINT_MAX,
    ui_violation, /* Constraint violation: UINT_MAX + 1 would overflow. */
    ui_no_violation = ui_max + 1, /* Okay: Arithmetic performed as typical unsigned integer arithmetic: conversion from a value that is already 0 to 0. */
    ui_wrap_around_to_zero = (unsigned int)(UINT_MAX + 1) /* Okay: conversion done in constant expression before conversion to underlying type: unsigned semantics okay. */
};

int main () {
    // Same as return 0;
    return ui_wrap_around_to_zero + us_wrap_around_to_zero;
}
EXAMPLE The following fragment:
#include <limits.h>

enum E1 : short;
enum E2 : short;
enum E3;
enum E4 : unsigned long long;

enum E1 : short { m11, m12 };
enum E1 x = m11;

enum E2 : long { m21, m22 }; /* Constraint violation: different underlying types */

enum E3 {
    m31,
    m32,
    m33 = sizeof(enum E3) /* Constraint violation: E3 is incomplete */
};
enum E3 : int; /* Constraint violation: E3 previously had no underlying type */

enum E4 : unsigned long long {
    m40 = sizeof(enum E4),
    m41 = ULLONG_MAX,
    m42 /* Constraint violation: unrepresentable value (overflow) */
};

enum E5 y; /* Constraint violation: incomplete type */
enum E6 : long int z; /* Constraint violation: enum-type-specifier with identifier in declarator */
enum E7 : long int = 0; /* Constraint violation: enum-type-specifier with initializer */
demonstrates many of the properties of multiple declarations of enumerations with underlying types. Particularly, enum E3 is declared without an underlying type first, therefore a redeclaration with an underlying type second is a violation. Because it is not complete at that time within its enumerator list, sizeof(enum E3) is a constraint violation within the enum E3 definition. enum E4 is complete as it is being defined, therefore sizeof(enum E4) is not a constraint violation.
EXAMPLE The following fragment:
enum no_underlying { a0 };

int main () {
    int a = _Generic(a0, int: 2, unsigned char: 1, default: 0);
    int b = _Generic((enum no_underlying)a0, int: 2, unsigned char: 1, default: 0);
    return 0;
}
demonstrates the implementation-defined nature of the underlying type of enumerations using generic selection (6.5.1.1). The value of a after its initialization is 2. The value of b after its initialization is implementation-defined: the enumeration must be compatible with a type large enough to fit the values of its enumeration constants. Since the only value is 0 for a0, b may hold any of 2, 1, or 0.
Now, consider a similar fragment, but using a fixed underlying type:
enum underlying : unsigned char { b0 };

int main () {
    int a = _Generic(b0, int: 2, unsigned char: 1, default: 0);
    int b = _Generic((enum underlying)b0, int: 2, unsigned char: 1, default: 0);
    return 0;
}
Here, we are guaranteed that a and b are both initialized to 1. This makes enumerations with a fixed underlying type more portable.
EXAMPLE Enumerations with a fixed underlying type must have their braces and the enumerator list specified as part of their declaration if they are not a standalone declaration:
void f1 (enum a : long b); /* Constraint violation */
void f2 (enum c : long { x } d);
typedef enum t u;
typedef enum v : short W; /* Constraint violation */
typedef enum q : short { s } R;

enum forward;
extern enum forward fwd_val0; /* Constraint violation: incomplete type */
extern enum forward* fwd_ptr0; /* Constraint violation: enums cannot be used like other incomplete types */
extern int* fwd_ptr0; /* Constraint violation: incompatible with incomplete type */

enum forward1 : int;
extern enum forward1 fwd_val1;
extern int fwd_val1;
extern enum forward1* fwd_ptr1;
extern int* fwd_ptr1;

int main () {
    enum e : short;
    enum e : short f = 0; /* Constraint violation */
    enum g : short { y } h = y;
    return 0;
}
Forward references: generic selection (6.5.1.1), tags (6.7.2.3), declarations (6.7), declarators (6.7.6), function declarations (6.7.6.3), type names (6.7.7).
5.2.4. Modify Section §6.7.2.3 Tags
6.7.2.3 Tags
Constraints
…
A type specifier of the form
enum attribute-specifier-sequence_opt identifier
without an enumerator list shall only appear after the type it specifies is complete.
…
A type specifier of the form
struct-or-union attribute-specifier-sequence_opt identifier_opt { member-declaration-list }
or
enum attribute-specifier-sequence_opt identifier_opt enum-type-specifier_opt { enumerator-list }
or
enum attribute-specifier-sequence_opt identifier_opt enum-type-specifier_opt { enumerator-list , }
declares a structure, union, or enumerated type. …
…
A declaration of the form
struct-or-union attribute-specifier-sequence_opt identifier ;
or
enum attribute-specifier-sequence_opt identifier enum-type-specifier_opt ;
specifies a structure, union, or enumerated type and declares the identifier as a tag of that type.142) If the enumerated type contains the enum type specifier, it is complete. The optional attribute specifier sequence appertains to the structure or union type being declared; the attributes in that attribute specifier sequence are thereafter considered attributes of the structure or union type whenever it is named.
If a type specifier of the form
struct-or-union attribute-specifier-sequence_opt identifier
occurs other than as part of one of the above forms, and no other declaration of the identifier as a tag is visible, then it declares an incomplete structure or union type, and declares the identifier as the tag of that type.143)
143) A similar construction with enum that does not contain a fixed underlying type does not exist. Enumerations with a fixed underlying type are always complete after the enum type specifier.
If a type specifier of the form
struct-or-union attribute-specifier-sequence_opt identifier
or
enum attribute-specifier-sequence_opt identifier enum-type-specifier_opt
occurs other than as part of one of the above forms, and a declaration of the identifier as a tag is visible, then it specifies the same type as that other declaration, and does not redeclare the tag.
5.2.5. Add implementation-defined enumeration behavior to Annex J
6. Acknowledgements
Thanks to:
-
Aaron Ballman for help with the initial drafting;
-
Aaron Ballman, Aaron Bachmann, Jens Gustedt & Joseph Myers for questions, suggestions and offline discussion;
-
Robert Seacord for editing suggestions;
-
Joseph Myers for detailed discussion on the issues with enumerated types, completeness, and more; and,
-
Clive Pygott for the initial revisions of this paper before the next author was added in to help.
We hope this paper serves you all well.