Document number: N2659
Date: 2021-01-05
Author: Miguel Ojeda <ojeda@ojeda.dev>
Project: ISO/IEC JTC1/SC22/WG14: Programming Language C
Potential undefined behavior along some paths of execution is common in C functions. However, some functions can be designed to be safe, that is, to avoid undefined behavior in all cases. Giving API designers a way to mark functions as safe serves a documentation purpose, allows tools to provide diagnostics on potential undefined behavior, and is useful for cross-language binding generation. This proposal adds the concept of a “safe function” as well as a safe attribute to C.
Since the Internet became a common term in our lives, and in particular during the last decade, we have seen a sharp increase in the overall awareness of the importance of building secure, robust and reliable software, due to our increasingly networked computing world and its associated risks. Several major companies and projects have reported that about 70% of their serious vulnerabilities in general-purpose software are due to memory safety problems [1][2].
To tackle this, some institutions have started applying stricter policies, rules and guidelines for their software, like the SEI CERT C Coding Standard [3]. Others use extensions or subsets of C that can guarantee certain properties, such as Checked C [4] and MISRA C [5], commonly used in safety-critical industries. There are also languages with a safety and security focus, like Ada [6], SPARK [7] and Rust [8]. Within the C committee, there have been many proposals over the years with the goal of improving the safety and security of the language and its library [9][10][11][12][13][14][15][16][17][18][19].
This paper introduces a concept of safety similar and/or related to the ones found in other languages such as C# [20], D [21], Haskell [22], Java [23], Rust [24], Swift [25], etc. It is not the same as the ones introduced by standards related to safety-critical systems such as DO-178 [26] or ISO 26262 [27].
The motivation behind introducing this notion into C is to encourage developers to design safe APIs, to allow them to mark and document their APIs as such, to allow easier consumption of C libraries from other languages, to improve C’s ability to perform as the lingua franca between languages, to improve diagnostics and developer understanding around undefined behavior, to provide toolchains with the ability to recognize intended safety properties of an API and, ultimately, to increase the safety, security, reliability and robustness of software components written in C.
Consider the following function:
int f1(int a, int b) {
    return a / b;
}
This function leads to undefined behavior when called with either of the following sets of inputs:
f1(x, 0) for any x.
f1(INT_MIN, -1).
We say that this function is unsafe. The majority of C functions in common, general-purpose software are unsafe.
There are also safe functions. For instance, we could make f1() safe as follows:
#include <limits.h> // INT_MIN
#include <stdlib.h> // abort()

int f2(int a, int b) {
    if (b == 0)
        abort();
    if (a == INT_MIN && b == -1)
        abort();
    return a / b;
}
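Aborting is only one way to keep every path defined. As a minimal sketch of an alternative, the function below reports the two problematic inputs through its return value instead. The names div_result and checked_div are hypothetical and not part of this proposal; the result is returned by value so that the function does not take a caller-supplied pointer it could not validate.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical result type: value is only meaningful when ok is true. */
struct div_result {
    bool ok;
    int value;
};

/* Defined for every pair of inputs: the two cases that would make
 * a / b undefined are rejected before the division is evaluated. */
struct div_result checked_div(int a, int b) {
    if (b == 0 || (a == INT_MIN && b == -1))
        return (struct div_result){ .ok = false, .value = 0 };
    return (struct div_result){ .ok = true, .value = a / b };
}

/* Usage: the problematic inputs are reported, not executed. */
void checked_div_example(void) {
    struct div_result r = checked_div(7, 2);
    assert(r.ok && r.value == 3);
    assert(!checked_div(1, 0).ok);
    assert(!checked_div(INT_MIN, -1).ok);
}
```

Whether to abort or to return an error is a design choice for the API; either way the function has no path that leads to undefined behavior.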
The safe attribute is a way to mark a function as designed to be safe:
[[safe]] int f2(int a, int b) {
    // as above
}
The attribute does not change the behavior of the function in any way. The purpose of the safe attribute is:
To document that a function is designed to be safe. That is, it is not a guarantee enforced by the implementation, but a promise by the developer. The annotation can be used not only for carefully written C functions, but also for external functions written in languages that guarantee no undefined behavior, for instance.
To let compilers, static analyzers, runtime sanitizers, etc. provide diagnostics if they detect paths for which there is potential undefined behavior in such a function at either compile-time or runtime.
To provide parsers, code generation tools, binding generators, etc. with information about such a promise.
For instance, a bindings generator for calling C functions from a language that supports safe functions (such as bindgen [28] for Rust) could decide to automatically mark C [[safe]] functions as safe ones in the target language. This is useful not just for cross-language documentation purposes, but also for ergonomics: some languages, like Rust, require extra syntax for calls to unsafe functions.
Similarly, a bindings generator for calling a language that supports safe functions from C (such as cbindgen [29]) could mark them automatically as [[safe]]. This not only carries over cross-language information, but also enhances C’s ability to act as a lingua franca for defining interfaces (i.e., languages with safe functions may export their interfaces through C to other languages which support the concept).
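Since toolchains that predate this proposal do not know the attribute, a header shared with them could guard it behind a feature test. The following is a minimal sketch, assuming a hypothetical project macro named SAFE (not part of this proposal). It relies on __has_c_attribute, for which the proposed wording requires a nonzero result when given safe; where the attribute is not recognized, the macro expands to nothing and the declaration still compiles.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* SAFE is a hypothetical convenience macro, not part of the proposal. */
#if defined(__has_c_attribute)
#  if __has_c_attribute(safe)
#    define SAFE [[safe]]
#  endif
#endif
#ifndef SAFE
#  define SAFE /* attribute not recognized: expands to nothing */
#endif

/* The safe division from above, annotated through the macro. */
SAFE int f2(int a, int b) {
    if (b == 0)
        abort();
    if (a == INT_MIN && b == -1)
        abort();
    return a / b;
}

/* Usage: the annotation does not change what the function computes. */
void f2_example(void) {
    assert(f2(7, 2) == 3);
    assert(f2(INT_MIN, 1) == INT_MIN);
}
```

A binding generator reading such a header would still need to see the expanded attribute, so this approach only helps with toolchains that parse the code after preprocessing.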
Consider now the following function:
[[safe]] int third(const int * vector, size_t n_elements) {
    if (vector == NULL)
        abort();
    if (n_elements < 3)
        abort();
    // unsound
    return vector[2];
}
This function is marked as [[safe]], yet it is not actually safe: the vector parameter could point anywhere and the n_elements parameter may not correspond to the size of the vector. In fact, this function can result in critical undefined behavior, even though it is clear the writer made an attempt to get rid of as much undefined behavior as possible.
Functions that are marked as [[safe]] yet are not safe are said to be unsound and are understood to be erroneous/undesirable. That is, the safe attribute should be avoided if a function is known to be unsound, and soundness issues should be fixed when discovered.
In this case, there is no way to make a function like third() sound with its current interface. However, if the designer of the API is able to change the interface, it becomes possible. For instance, consider a library that gives out handles to instances of this vector, taking care of handling everything internally:
typedef size_t IntVector;
[[safe]] void iv_init(void);
[[safe]] IntVector iv_create(size_t size);
[[safe]] void iv_destroy(IntVector iv);
[[safe]] bool iv_is_valid(IntVector iv);
[[safe]] void iv_push(IntVector iv, int value);
[[safe]] int iv_pop(IntVector iv);
[[safe]] int iv_get(IntVector iv, size_t index);
[[safe]] void iv_set(IntVector iv, size_t index, int value);
These functions can be implemented soundly if the implementation keeps track of the handles it has given out plus the related information needed to verify later calls on them (e.g., the current size for each vector). That is, even if the caller creates or manipulates IntVector objects, the functions can verify they are dealing with a valid handle.
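The bookkeeping described above could be sketched as follows for a subset of the interface. This is only an illustration under stated assumptions, not the author's implementation: the fixed-size table, the IV_MAX_VECTORS limit and the choice to abort on every invalid request are all hypothetical, and the code assumes single-threaded use (concurrent calls would need synchronization to stay free of data races). The [[safe]] markings are left out so that the sketch compiles on toolchains that do not know the attribute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define IV_MAX_VECTORS 16 /* hypothetical fixed capacity */

typedef size_t IntVector;

/* Private bookkeeping: one slot per handle the library may give out. */
struct iv_slot {
    bool in_use;
    size_t size;
    int *data;
};

static struct iv_slot iv_table[IV_MAX_VECTORS];

/* Any size_t value can be checked, including values the caller made up. */
bool iv_is_valid(IntVector iv) {
    return iv < IV_MAX_VECTORS && iv_table[iv].in_use;
}

IntVector iv_create(size_t size) {
    for (size_t i = 0; i < IV_MAX_VECTORS; i++) {
        if (!iv_table[i].in_use) {
            /* calloc checks the size computation for overflow; asking for
             * at least one element makes NULL always mean failure. */
            int *data = calloc(size ? size : 1, sizeof *data);
            if (data == NULL)
                abort();
            iv_table[i] = (struct iv_slot){ true, size, data };
            return i;
        }
    }
    abort(); /* no free slot left */
}

void iv_destroy(IntVector iv) {
    if (!iv_is_valid(iv))
        abort();
    free(iv_table[iv].data);
    iv_table[iv] = (struct iv_slot){ false, 0, NULL };
}

int iv_get(IntVector iv, size_t index) {
    if (!iv_is_valid(iv) || index >= iv_table[iv].size)
        abort();
    return iv_table[iv].data[index];
}

void iv_set(IntVector iv, size_t index, int value) {
    if (!iv_is_valid(iv) || index >= iv_table[iv].size)
        abort();
    iv_table[iv].data[index] = value;
}

/* Usage: forged or stale handles are rejected instead of dereferenced. */
void iv_example(void) {
    IntVector v = iv_create(3);
    iv_set(v, 2, 42);
    assert(iv_get(v, 2) == 42);
    assert(!iv_is_valid((IntVector)12345));
    iv_destroy(v);
    assert(!iv_is_valid(v));
}
```

The key property is that every function validates the handle and the index against the library's own records before touching memory, so no caller-provided value is ever trusted as an address.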
Note that APIs that employ similar approaches, such as an opaque type behind a pointer, might not be possible to implement soundly, because the library cannot verify the pointers it receives. For instance, consider:
struct Opaque;
typedef struct Opaque * IntVector;
[[safe]] IntVector iv_create(size_t size);
[[safe]] void iv_destroy(IntVector iv); // sound?
Assuming iv_create() is allocating the required storage for the private data behind the scenes and simply returning that pointer, iv_destroy() won’t be able to verify whether a pointer is valid or not (unless it keeps track of all returned pointers as if they were handles; or perhaps the allocator provides such functionality). That is, a non-NULL pointer is not guaranteed to be valid or dereferenceable.
Therefore, while new APIs can be designed to be safe and marked as such, legacy APIs may not easily be tagged as [[safe]], even by changing their implementation. However, some of them might be close to “safe” and a developer might be tempted to mark them as [[safe]] nevertheless, if only for the binding generation ergonomics. For instance, the opaque-pointer approach may be “safe” except for non-NULL invalid pointers. That is, as long as the user of such a library never passes user-crafted or manipulated pointers, the functions will be “safe”. A particular project may want to use such a weaker definition of “safety”. Similarly, a developer could consider third() “good enough” since the pointer and the size are “checked”.
While these two examples are not aligned with our definition of safety, they may be considered reasonable in some C codebases where the checking already amounts to that or where there are other constraints that make full safety hard to achieve. Thus a possible extension to this proposal could provide a safe_unsound attribute for that use case, to dissuade misuse of the safe attribute. It would annotate functions designed to be “as safe as possible” (but requiring extra assumptions, thus ultimately unsound by our definition). While this attribute may be controversial, there are already FFI libraries and binding generators that automate the maintenance of unsound bindings to C and C++ functions (such as cxx and autocxx [30][31] for Rust), and there are key projects using or experimenting with them (such as Chromium [32]). For instance, such binding generators could provide a strict mode that would only allow generation of bindings for symbols explicitly marked as [[safe_unsound]] or [[safe]]. This would enable a project to incrementally expand its bindings surface, requiring a manual review to provide new ones. Developers can use those reviews to write down the assumptions that a [[safe_unsound]] function may be making or, if possible, to make the function [[safe]].
Another possible extension is to attempt to tackle certain issues to make it easier to tag existing APIs as [[safe]]. For instance, a safe_deref attribute could annotate pointer parameters assumed to be valid/dereferenceable for the purposes of soundness of the safe attribute (i.e., with similar semantics as existing IR attributes such as LLVM’s dereferenceable [33]). Consider:
struct Opaque;
typedef struct Opaque * IntVector;
[[safe]] IntVector iv_create(size_t size);
[[safe]] void iv_destroy(IntVector [[safe_deref]] iv);
Here, iv_destroy() would only claim to be sound if iv is not only non-NULL, but also valid to dereference. Note that users may still craft a valid, dereferenceable pointer that has not been created through iv_create(), so the library would still need to ensure such usage remains safe. Therefore, an attribute such as safe_deref may prove more useful for pointer parameters that the library has never seen before, e.g.:
[[safe]] Result apply_n(
    EntityHandle handle,
    const ComplexOperation * [[safe_deref]] operation,
    size_t n_times
);
Nevertheless, this proposal focuses on the safe attribute only and the concept of a “safe function”, since those are the core pieces to get into the standard first: they are relatively easy to understand, with similar concepts already existing in other languages. Further proposals can be made on top of this one.
Finally, a note on soundness: if a program triggers undefined behavior at any point in time, then safe functions are not expected to remain sound. Consider the following complete translation unit:
static int i = 0;
static int * p = &i;
[[safe]] int f(void) {
    return *p;
}
Since p is not modified anywhere else, we can conclude f() is a safe function, and therefore its [[safe]] marking is sound. However, in a real machine implementing a common computer architecture, and assuming *p hasn’t been optimized away (i.e., the address is still loaded from memory), it is possible that the value of the pointer gets overwritten by other code triggering undefined behavior. In such a case, while f() may lead to undefined behavior itself, we still consider it sound.
Similarly, in computers with multitasking operating systems, memory may have been modified by other processes, even if it is in a separate address space. Furthermore, soft errors, single-event effects, hardware design bugs, hardware failures, etc. are all possibilities when dealing with real systems. None of these factors changes the soundness of a function marked as [[safe]].
bindgen
To showcase the safe attribute in cross-language scenarios, a trivial example implementation has been patched on top of both LLVM/Clang [34] and bindgen [35] (the popular C to Rust bindings generator).
Given a C function marked as [[safe]]:
[[safe]] int f(void) {
    return 42;
}
The bindgen tool uses libclang from LLVM to parse the C code. It will generate the following Rust binding:
pub fn f() -> c_int;
That is, it appears to Rust clients as a safe function. Its implementation takes care of calling the extern "C" function, which is unsafe.
Without the [[safe]] marking in the C function, the following interface would have been generated instead, as is usually the case:
extern "C" {
    pub fn f() -> c_int;
}
This one forces Rust clients to write the safe wrapper themselves (or write an unsafe block every time they call it), making the C libraries harder to use in that language.
Implementing support for the safe attribute without diagnostics should be a trivial effort for most implementations, as shown in the example implementation above. Generating useful diagnostics, in particular without too many false positives, is the hard part of the proposal. Nevertheless, it is an optional part: implementations may choose to not generate any diagnostics, even for trivial functions.
Some modern C toolchains already have the ability to detect undefined behavior in some cases (e.g., at runtime by their sanitizers and at compile-time for those compilers that implement C++’s constexpr), which they might be able to reuse. Implementations may also decide to leave diagnostics up to specialized static analyzers.
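As one illustration of the existing runtime detection mentioned above, GCC and Clang accept the -fsanitize=undefined option, which instruments operations such as integer division. The sketch below reuses the shape of the unsafe f1() from earlier; the names unchecked_div and unchecked_div_example, the file name and the command line in the comment are only examples, and how an implementation might connect such checks to the safe attribute is left open by this proposal.

```c
#include <assert.h>

/* Same shape as the unsafe f1() above. Built, for instance, with
 *     cc -fsanitize=undefined example.c
 * the compiler inserts a runtime check around the division. */
int unchecked_div(int a, int b) {
    return a / b;
}

/* Well-defined calls behave the same with or without instrumentation.
 * In an instrumented build, a call such as unchecked_div(1, 0) would
 * produce a runtime diagnostic from the sanitizer. */
void unchecked_div_example(void) {
    assert(unchecked_div(6, 3) == 2);
    assert(unchecked_div(-7, 2) == -3);
}
```

Such checks only cover the executions that actually occur, so they complement rather than replace compile-time diagnostics.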
The proposed wording is with respect to the N2596 C23 Working Draft.
After “3.18 runtime-constraint”, add:
3.19 safe function
A function that does not lead to undefined behavior along any path of execution.
In “6.7.11.1 General”, modify paragraph 2:
The identifier in a standard attribute shall be one of:
deprecated
fallthrough
maybe_unused
nodiscard
safe
After “6.7.11.5 The fallthrough attribute”, add:
6.7.11.6 The safe attribute
Constraint
The safe attribute shall be applied to the identifier in a function declaration. No attribute argument clause shall be present.
Semantics
The safe attribute can be used to mark functions designed to be safe.
The __has_c_attribute conditional inclusion expression (6.10.1) shall return the value 2023XXL when given safe as the pp-tokens operand.
Recommended Practice
Implementations might use the safe attribute to produce a diagnostic message if the program is detected to contain undefined behavior in some execution path.
EXAMPLE 1
[[safe]] int f(int a, int b) {
    return a / b;
}
An unsafe function marked as [[safe]]: implementations are encouraged to diagnose that the / operation may trigger undefined behavior (in particular, when the function is called with b == 0 or with a == INT_MIN && b == -1).
EXAMPLE 2
[[safe]] int f(int a, int b) {
    if (b == 0)
        abort();
    if (a == INT_MIN && b == -1)
        abort();
    return a / b;
}
A safe function: implementations are discouraged from producing a diagnostic message.
EXAMPLE 3
static int * p;
[[safe]] int f(void) {
    return *p;
}
// other functions that may modify the pointer
// in the same translation unit
A possibly unsafe function: implementations are discouraged from diagnosing it unless they can prove there is a path in the program containing undefined behavior (i.e., false positives are discouraged).
“secure_clear (update to N2599)” - http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2631.htm
“UnsafePointer” - https://developer.apple.com/documentation/swift/unsafepointer
bindgen Contributors. “bindgen - automatically generates Rust FFI bindings to C (and some C++) libraries” - https://github.com/rust-lang/rust-bindgen
cbindgen Contributors. “cbindgen - creates C/C++11 headers for Rust libraries which expose a public C API” - https://github.com/eqrion/cbindgen
cxx Contributors. “CXX - safe FFI between Rust and C++” - https://github.com/dtolnay/cxx
autocxx Contributors. “Autocxx - A tool for calling C++ from Rust in a heavily automated, but safe, fashion” - https://github.com/google/autocxx
“dereferenceable Metadata” - https://llvm.org/docs/LangRef.html#dereferenceable-metadata
“[[safe]] attribute support” - https://github.com/ojeda/llvm-project/releases/tag/N2659
“[[safe]] attribute support” - https://github.com/ojeda/rust-bindgen/releases/tag/N2659