2022-04-23
org: | ISO/IEC JTC1/SC22/WG14 | document: | N2978 | |
… WG21 C and C++ liaison | P2312 | |||
target: | IS 9899:2023 | version: | 4 | |
date: | 2022-04-23 | license: | CC BY |
For more than a decade, C++ has replaced the problematic definition of NULL, which might be either of integer type or void*. By using a new constant nullptr, C++ achieves a more constrained specification that allows much better diagnosis of user code. We propose to integrate this concept into C as far as possible while imposing only minimal ABI additions.
- nullptr_t as a complete object type that has the same representation as void* and char* but only one value, nullptr
- nullptr_t only explicitly formulate conversion to bool and test for equality
- nullptr_t to all places that do “Boolean” evaluation, they are currently formulated as comparison to 0
- nullptr_t to the set of possible argument types of ... lists, and make them compatible with void* and char* interpretation by va_arg.
v3/R1: integrating feedback from different sources
- nullptr incomplete and incompletable
- nullptr itself and insist that it has a type that is different from any other standard type or type that could be defined by user code
- nullptr does not have a scalar type, add it explicitly to contexts such as or similar that so far only had scalars
- int of value 0 and 1 for contexts where logical evaluation still has that type
- _Generic also are constant expressions or null pointer constants if the respective operands are
- nullptr to generic selection
- nullptr as the last parameter before a ...
- nullptr parameters without names
v2/R0: a complete rewrite as a proper language feature instead of a shallow macro solution
The macro NULL, which goes back quite early, was meant to provide a tool to specify a null pointer constant such that it is easily visible and such that it makes clear the intention of the programmer to specify a pointer value. Unfortunately, the definition as it is given in the standard misses that goal, because the constant that is hidden behind the macro can be of very different nature.
A null pointer constant can be any integer constant of value 0 or such a constant converted to void*. Thereby several types are possible for NULL. Commonly used are 0 with int, 0L with long and (void*)0 with void*.
This may lead to surprises when invoking a type-generic macro with a NULL argument.
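The dependence on the definition of NULL can be made visible with a small type-generic macro. The following is only a minimal sketch and not part of the proposal; the names TYPE_NAME, null_type_name and zero_type_name are hypothetical and serve the illustration only.

```c
#include <stddef.h>

/* Hypothetical helper: maps the type of an expression to a string. */
#define TYPE_NAME(X) _Generic((X), \
    int:     "int",                \
    long:    "long",               \
    void*:   "void*",              \
    default: "other")

/* The result depends on how the implementation defines NULL:
   "int" for 0, "long" for 0L, "void*" for (void*)0. */
const char* null_type_name(void) {
    return TYPE_NAME(NULL);
}

/* A null pointer constant written as plain 0 always selects int. */
const char* zero_type_name(void) {
    return TYPE_NAME(0);
}
```

Code that branches on the type of its argument in this way therefore behaves differently from one platform to the next when it is handed NULL.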
Conditional expressions such as (true ? 0 : NULL) and (true ? 1 : NULL) have different status depending on how NULL is defined. Whereas the first is always defined, the second is a constraint violation if NULL has type void*, and defined otherwise. In particular, the second happens to work in C++ but most of the time not in C.
A NULL argument that is passed as a sentinel to a ... function that expects a pointer can have severe consequences. On many architectures nowadays int and void* have different sizes, and so if NULL is just 0, a wrongly sized argument is passed to the function.
In particular, C++ can’t have NULL as (void*)0 because void* does not implicitly convert to other pointer types. Thus it is usually an integer constant of value zero. On the C side (e.g. by printf) such a passed integer constant is then interpreted as void* or char*; such a re-interpretation has undefined behavior.
nullptr constant?
Null pointer constants in C are a feature that is defined somewhat orthogonally to the type system. They are based on the concept of “integer constant expressions” and may in fact have any integer type (even bool, enumerations, character constants or expressions such as x-x are possible) as long as the value can be determined at translation time and happens to be zero. On top of that ambiguity concerning integer types, it is even permitted to apply an explicit cast to void* and still obtain a null pointer constant.
The standard macro NULL inherits from these confusing definitions and has no standardized type and no standardized behavior in contexts that are different from simple conversion to a pointer type. For example, a use of NULL as an argument to a ... function is not guaranteed to work.
If NULL has integer type but different alignment or size than void*, any access with va_arg that interprets such an argument could crash the program.
If NULL has integer type and null pointers are not represented as all-bit zero, such a transferred integer cannot be reinterpreted as a pointer value that would be a null pointer.
If NULL has integer type (and not void*) and even if the integer type, say long, has the correct size and alignment, an interpretation of that passed-in integer as a pointer by means of va_arg has undefined behavior. As an exception, va_arg allows the reinterpretation between void* and char*, for example, but not from integer type to pointer type.
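The sentinel problem described above can be sketched as follows. The function count_strings and its caller count_portable are hypothetical names chosen for this illustration; the sketch only assumes the standard <stdarg.h> interface.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical example function: counts the char* arguments that
   precede a null pointer sentinel. */
size_t count_strings(const char* first, ...) {
    size_t n = 0;
    va_list ap;
    va_start(ap, first);
    for (const char* s = first; s; s = va_arg(ap, char*)) {
        ++n;
    }
    va_end(ap);
    return n;
}

/* Portable today: the sentinel is explicitly given the type char*. */
size_t count_portable(void) {
    return count_strings("a", "b", (char*)0);
}

/* Not portable: if NULL expands to 0 or 0L, an integer is passed
   and va_arg(ap, char*) has undefined behavior.
   count_strings("a", "b", NULL);                               */
```

The explicit cast in count_portable is exactly the kind of boilerplate that a null pointer constant with a dedicated, pointer-compatible type would make unnecessary.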
Also, it is not easy to detect if an argument to a function or even macro is a null pointer constant or only an arbitrary null pointer value. In C, compile-time code distinction is usually done in the preprocessor or by _Generic. The preprocessor doesn’t work with NULL because it might not even be a preprocessor constant. _Generic is difficult to use because it is based on types and not values, although there are ways to abuse properties of conditional expressions, integer constant expressions, null pointer constants and _Generic to do so.
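One such abuse can be sketched as follows; it is shown only to illustrate how indirect the technique is. The macro name IS_NULL_POINTER_CONSTANT is hypothetical, and the sketch is only meant for arguments that are either null pointer constants or expressions of type void*.

```c
/* Hypothetical macro: evaluates to 1 if X is a null pointer constant
   and to 0 if X is some other expression of type void*.
   If X is a null pointer constant, the conditional expression has the
   type of the other operand, int*; if X is any other void* expression,
   the result type is void*. */
#define IS_NULL_POINTER_CONSTANT(X) \
    _Generic((1 ? (X) : (int*)0), int*: 1, default: 0)

int constant_zero(void)     { return IS_NULL_POINTER_CONSTANT(0); }
int constant_void_ptr(void) { return IS_NULL_POINTER_CONSTANT((void*)0); }

int runtime_null(void) {
    void* p = 0;   /* a null pointer value, but not a constant */
    return IS_NULL_POINTER_CONSTANT(p);
}
```

The distinction rests entirely on the typing rules for the conditional operator, not on anything that names the concept of a null pointer constant directly.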
Another reason to strengthen the definition of null pointer constants in C is the common confusion between a null pointer and a pointer that points to the zero address in the OS, as is suggested by using integer literals such as 0 to express null pointer constants. Also, the fact that on some architectures a null pointer is not necessarily represented with an all-zero bit-pattern always needs special attention when teaching C and is quite surprising for beginners. If these subtle distinctions were necessary for the expressiveness of the language, that could perhaps be acceptable, but here it clearly is an arbitrary burden imposed on generations of teachers and students that is only rooted in history and has no raison d’être as of today; all other programming languages that have concepts similar to pointers in C do quite well without this ambiguity between numbers and pointers.
The idea of nullptr is to end this ambiguity and to provide a keyword with a value and a portable type that can be used anywhere where a null pointer constant is needed.
The nullptr feature presented in this paper has the following properties.
- nullptr converts to bool by always evaluating to false.
- nullptr is represented with the same bit-pattern as a null pointer constant of type void*.
- nullptr is permitted in all “Boolean” contexts such as && operators or if statements.
- nullptr is permitted as argument to ..., as long as the function interprets it as pointer to void or character type.
The aim is that this feature has exactly the same behavior as the corresponding feature in C++.
nullptr_t type different from void*?
The secondary feature proposed in this paper is the type nullptr_t, with the intent to allow better diagnostics for functions that possibly receive a null pointer argument and to potentially optimize the case where a null pointer constant is received.
Consider a function func that receives a pointer parameter that can either be valid or a null pointer to indicate a default choice.
// header "func.h"
void func_general(toto*);
// define a default action
// no parameter name, parameter is never read
inline void func_default(nullptr_t) {
  ...
}
#define func(P) \
  _Generic((P), \
    nullptr_t: func_default, \
    default: func_general)(P)
// one translation unit
#include "func.h"
// emit an external definition
extern void func_default(nullptr_t);
// define the general action
void func_general(toto* p) {
  // p may still have value null
  if (!p) func_default(nullptr); // may only be called with nullptr
  else {
    ...
  }
}
Here, a function func_default is defined that receives a nullptr. The function needs no access to the parameter, since that parameter can only hold one specific value. A type-generic macro func then chooses this function or the general function func_general. The translation unit that defines func_general may then emit an external definition of func_default and also use it within the definition for the case that func_general receives a parameter value that is null without being recognized as such at translation time of the call.
#include "func.h"
...
func(0); // ok, but uses the general function and may issue a diagnostic
func((void*)0); // ok, but uses the general function, no diagnostic
func(NULL); // ok, but uses the general function, diagnostic or not
func((toto*)0); // ok, but uses the general function, no diagnostic
func(nullptr); // uses default action directly
The use of the macro with a null pointer constant of integer type then uses the general function and sets the parameter to null; implementations that choose to diagnose the use of null pointer constants of integer type may do so for this call.
In contrast to that, a call that uses nullptr as an argument directly resolves to func_default, may or may not inline the corresponding action, and will not trigger such a diagnostic.
The emission of a diagnostic can be forced by restricting the admissible types as shown in the definition of func_strict.
#define func_strict(P) \
  _Generic((P), \
    nullptr_t: func_default, \
    toto*: func_general)(P)
...
func_strict(0); // invalid, int argument is not a valid choice, constraint violation
func_strict((void*)0); // invalid, void* argument is not a valid choice, constraint violation
func_strict(NULL); // invalid, void* or integer argument is not a valid choice, constraint violation
func_strict((toto*)0); // ok, but uses the general function, no diagnostic
func_strict(nullptr); // uses default action directly
After WG14 rejected a specification for a simple macro with value (void*)0, as well as a sophisticated version with an incomplete type and with a rewriting approach for many contexts, this new version tries a middle ground.
The principal property of nullptr is that it is a null pointer constant. But it is one in its own right, not deduced from a property of any other feature. From the existing text it then basically follows that it can be used everywhere where a pointer is to be initialized or assigned to a null pointer value.
It has a type that is different from all other null pointer constants, in particular the type is neither an integer nor a pointer type. So in any context where type plays a role, it cannot be confused with an expression with a type of any of these.
The type of nullptr is a complete object type that is neither an array nor a scalar type and has exactly one value, namely nullptr. For C, this directly disallows the use of the type (and thus nullptr) in most other expressions, in particular in arithmetic.
Because we want to be able to use this type also for parameters, as members of unions (for type punning), and as arguments to ... functions, we have to prescribe a representation that makes it admissible in these sorts of contexts. The only possible choice for this is to have the same alignment, size and representation as void* and to force the representation of nullptr to the same bit-pattern as null pointers of type void*.
To enable the use for ... functions, we then just add another exception for va_arg, namely that the behavior is well-defined if an object of type nullptr_t is re-interpreted as void* or char*, for example. Because of our choice for the representation, this is easily possible.
nullptr in “Boolean” contexts
Pointers are often used in contexts that have a “Boolean” interpretation, such as if statements, ternary expressions or conversions to bool. In C++ this is also possible for nullptr, so we enable this feature explicitly for all contexts that have such a “Boolean” interpretation. Note that for C++ this is much easier, because there these contexts are all handled by implicit conversion to bool.
Here, for C, we have to do a little bit more work and have to define conversion to bool and equality comparison separately. For other contexts, the integration is then simply done by adding nullptr_t to the types that are permitted in addition to scalar types.
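The “Boolean” contexts in question can be sketched as follows. This is an illustration under assumptions, not wording from the proposal: the names NULL_CONSTANT and false_contexts are hypothetical, nullptr is only used when the compiler reports a __STDC_VERSION__ of at least 202311L, and otherwise the sketch falls back to (void*)0 so that it still compiles.

```c
#include <stdbool.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
# define NULL_CONSTANT nullptr       /* the keyword itself */
#else
# define NULL_CONSTANT ((void*)0)    /* assumed fallback for older compilers */
#endif

/* Counts how many "Boolean" contexts treat the constant as false. */
int false_contexts(void) {
    int n = 0;
    if (NULL_CONSTANT) { /* never taken */ } else { ++n; } /* if statement */
    n += !NULL_CONSTANT;                   /* logical negation */
    n += (NULL_CONSTANT == 0);             /* equality with a null pointer constant */
    n += ((bool)NULL_CONSTANT == false);   /* conversion to bool */
    n += (NULL_CONSTANT ? 0 : 1);          /* conditional operator */
    return n;
}
```

With either definition of NULL_CONSTANT all five contexts are expected to agree, which is the compatibility with existing uses of NULL that the wording changes aim for.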
Since nullptr and nullptr_t are new features, there is no impact on existing code that does not use them.
Code that starts using nullptr_t for interfaces (either as function parameters or via _Generic) will not encounter direct incompatibilities with existing code, because the type didn’t exist before.
Using nullptr itself for the assignment or initialization of variables or as arguments to pointer parameters will work seamlessly; nullptr converts implicitly to any pointer type, much as NULL or any of the current null pointer constants. Eventually, replacing NULL by nullptr might detect the misuse of NULL in a context where an integer is expected. This is intended and considered to be an improvement.
Using nullptr for calls into macros that implement type-generic interfaces may encounter incompatibilities. In particular, for interfaces that perform type inspection by means of _Generic, the new type nullptr_t of the constant may not fit any of the choices. But in general this means that the code was not robust when presented with null pointer constants of varying type (integer type, void*) before. In general these problems will result in constraint violations, and thereby give the opportunity to improve the code receiving the nullptr argument with respect to these aspects. This is consistent with the Charter, which states that if there are to be changes, they should strive to be diagnosed rather than perform silent changes in behavior.
Using nullptr for calls into functions with ... will improve situations that had been undefined before. In particular, nullptr can be used as a sentinel for a list of pointers to void or character type, which is not portable when using NULL.
C library implementors would have to add the type nullptr_t to their <stddef.h> header. This can be achieved similarly to the following, where 2023MML is the __STDC_VERSION__ number chosen for C23.
#if __STDC_VERSION__ >= 2023MML
typedef typeof(nullptr) nullptr_t; // C23 supports typeof and nullptr
# define __STDC_VERSION_STDDEF_H__ 2023MML
#endif
The concept of presenting a null pointer constant as a keyword that is tightly integrated into the language, as is proposed here, is present in most other programming languages that have the concept of pointers, for example Pascal, Lisp, Smalltalk, Ruby, Objective-C, Lua, Scala, or Go, often with other spellings such as nil, NIL, None, null or Null. The fact that C still expresses this concept with other language features is a rare exception in this picture and only a historic artefact, not a necessity.
The nullptr feature together with nullptr_t has been present in C++ since C++11 and has extensive implementation and application experience in that framework. A similar feature is also provided under a different name, nil, by the Plan 9 C compiler. It approximates some of the features provided below, but not all of them.
C users often shift between using literal 0 versus (void*)0 for a library-deployed, macro-based definition. There are various tradeoffs for doing this (discussed as part of the design decisions above) that can make this have undesirable behaviors and qualities. Recently, users have tried to move away from their own personal definitions for portability and correctness reasons.
Changes are proposed against the wording in C23 draft n2731 to which the accepted changes concerning keywords have been added. Green and underlined text is new text.
Add to the end of p1
When a nullptr_t value is converted to bool, the result is false.
Change p3
3 An integer constant expression with the value 0, such an expression cast to type void*, or the predefined constant nullptr, are a null pointer constant.68) If a null pointer constant or a value of type nullptr_t (which is necessarily the value nullptr) is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.
Add nullptr to the lists in p1.
Add nullptr to the list of predefined constants and a new paragraph to the description
The keyword nullptr represents a null pointer constant. Details of its type are described in 7.19.x.
Add two items to the list of constraints in p2
– both operands have type nullptr_t;
– one operand has type nullptr_t and the other is a null pointer constant;
Add to the end of p5
If both operands have type nullptr_t or one operand has type nullptr_t and the other is a null pointer constant, they compare equal.
Thereby a comparison of a value of type nullptr_t to 0 (which, as for pointers, is seen as a null pointer constant) is always well defined.
To be easily compatible with current uses of NULL we add nullptr_t to all contexts that traditionally allow a pointer to be interpreted as a Boolean value. The particular result for using nullptr or an lvalue of type nullptr_t (that might not be a null pointer constant but just a null pointer) can then be deduced from the equality operators much as this is done for pointer types.
1 The operand of the unary + or - operator shall have arithmetic type; of the ~ operator, integer type; of the ! operator, scalar or nullptr_t type.
2 Each of the operands shall have scalar or nullptr_t type.
2 Each of the operands shall have scalar or nullptr_t type.
2 The first operand shall have scalar or nullptr_t type.
3 One of the following shall hold for the second and third operands:FNT1)
FNT1) If a second or third operand of type nullptr_t is used that is not a null pointer constant, a constraint is violated.
The if statement (6.8.4.2)
1 The controlling expression of an if statement shall have scalar or nullptr_t type.
2 The controlling expression of an iteration statement shall have scalar or nullptr_t type.
The nullptr_t type (7.19.x)
Add to 7.19 p2
which is the type of the nullptr predefined constant, see below;
And add a new clause 7.19.x to the <stddef.h> header
7.19.x The nullptr_t type
Description
1 The nullptr_t type is the type of the nullptr predefined constant. It has only a very limited use in contexts where this type is needed to distinguish nullptr from other expression types. It is an unqualified complete object type that is neither an atomic, scalar or array type and that has one value, nullptr. Default initialization of an object of this type is equivalent to an initialization by nullptr.
2 The size and alignment of nullptr_t is the same as for a pointer to character type. An object representation of the value nullptr is the same as the object representation of a null pointer value of type void*. An lvalue conversion of an object of type nullptr_t with such an object representation has the value nullptr; if the object representation is different, the behavior is undefined.FNT0)
FNT0) Thus, during the whole program execution an object of type nullptr_t evaluates to the assumed value nullptr.
3 NOTE Because of the restrictions on the type category, the use of values of this type in expressions is implicitly constrained in many ways throughout clause 6, in particular for arithmetic. Exempted from such constraints are uses, for example,
- as the operand of an alignas, sizeof or typeof operator,
- as the operand of an implicit or explicit conversion to a pointer type,
- as the assignment expression in an assignment or initialization of an object of type nullptr_t,
- as an argument to a parameter of type nullptr_t or in a variable argument list,
- as a void expression,
- as the operand of an implicit or explicit conversion to bool,
- as an operand of a _Generic primary expression,
- as an operand of the !, &&, || or conditional operators, or
- as the controlling expression of an if or iteration statement.
The va_arg macro (7.16.1.1)
Modify the end of p2
If type is not compatible with the type of the actual next argument (as promoted according to the default argument promotions), the behavior is undefined, except for the following cases:
– both types are pointers to qualified or unqualified versions of compatible types;
– one type is a signed integer type, the other type is the corresponding unsigned integer type, and the value is representable in both types;
– one type is pointer to qualified or unqualified void and the other is a pointer to a qualified or unqualified character type;
– the type of the next argument is nullptr_t and type is a pointer type that has the same representation and alignment requirements as a pointer to a character type.FNT1)
FNT1) Such types are in particular pointers to qualified or unqualified versions of void.
Note to the editors: Please, observe the typo corrected above. The readability of 7.16.1.1 could gain by renaming the macro parameter type to something like T.
There are several other editorial changes that can be done in that context. We leave them to the discretion of the editors.
NULL
There are several usages of the macro NULL throughout the library clause of the form (char**)NULL which would probably better be replaced by nullptr without cast.
For several of the places where this document proposes changes, forward references to 7.19.x either directly in the text or as separate paragraph at the end of the respective clause could be needed.
Using a null pointer constant in the form of an integer expression as an argument to a ... function and then interpreting it as void* or char* is undefined behavior. This could be added to Annex J as an entry for va_arg (7.16.1.1).
A specific entry for nullptr_t (7.19.x) could be made that stipulates that arbitrarily changing a nullptr_t object, or copying a non-null pointer value into it, and then reading that object has undefined behavior.
This paper proposes a change to the <stddef.h> header, so this header now needs a test macro __STDC_VERSION_STDDEF_H__. During the transition to C23, this would help users to determine if the nullptr_t type is available for their current version of the C library:
#include <stddef.h>
#if __STDC_VERSION_STDDEF_H__ > 0
/* all is fine, we should also have nullptr */
#elif __STDC_VERSION__ > 202300L
typedef typeof(nullptr) nullptr_t; // C23 supports typeof and nullptr
#else
# error "nullptr_t is missing"
#endif
Does WG14 want to integrate the changes of N2978 into C23?
Many thanks to Joseph Myers for the very detailed review and feedback for earlier versions of this paper.