Reply-to: David Goldblatt (davidtgoldblatt@gmail.com, davidgoldblatt@fb.com)
Document no: WG14 / N2293
Date: 2018-09-16
The alignment requirements of malloc
, calloc
, and realloc
are somewhat confusingly phrased, in a way that affects small allocations (sizes less than _Alignof(max_align_t)
). Some readers (and implementations) interpret them to demand _Alignof(max_align_t)
-alignment even for allocation sizes that could not hold an object with that alignment. We call this the "strong-alignment" reading. Other readers (and implementations) interpret them as requiring the returned memory to be aligned only enough to accommodate those types that could inhabit the returned memory. In particular, since sizeof(T) >= _Alignof(T)
for all portably defined types T
, allocations with sizes smaller than _Alignof(max_align_t)
need only be aligned to the largest power of two less than or equal to the requested size. We call this the "weak-alignment" reading.
Many implementations only provide weak-alignment guarantees, and some cases these guarantees are important in conserving both CPU and memory. Therefore, we propose clarifying the wording such that the weak-alignment implementation is unambiguously allowed. Strong-alignment implementations remain correct, and may document the guarantee as an extension.
C17, 7.22.3 states: "The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated".
The two clauses of interest here are "it may be assigned to a pointer to any type of object with a fundamental alignment requirement" and "and then used to access such an object or an array of such objects in the space allocated". Strong-alignment readers tend to see both clauses as independent, whereas weak-alignment readers tend to see the second as an adverbial clause modifying the first. Neither reading is completely satisfactory. The strong-alignment view implies an independent clause with a tense mismatch or a missing verb; the weak-alignment view uses "and then" when "then" would suffice, which could imply a sequencing at odds with the interpretation.
In the author's view, attempting to declare one of these two parsings correct is a waste of time. Many different implementers, fluent in English and acting in good faith, have parsed the sentence both ways. The question is how to reconcile the divergence.
The author has experimented with and inspected the source code for as many different implementations as possible. These are his best guesses based on consultations with maintainers and the observed behavior of implementations (if possible), or source-code inspection alone (if not).
Strong-alignment environments:
Weak-alignment environments:
I also reached out to those maintainers with whom I could find a mutual professional connection:
(There's no filtering of responses here; every group of maintainers I was able to contact is included, except for one that was unable to obtain legal permission to comment publicly).
This is not the first time this vagueness has been addressed; the response to DR75 endorses the strong-alignment interpretation. However, that response never become normative (via inclusion in a technical corrigendum or international standard). Moreover, it was written before the introduction of the notion of fundamental alignments, and thereby implies that all returned pointers must be suitably aligned for all types; including vector types, types with implementation-specific alignment specifiers, etc. Most compilers with which the author is familiar never provided such a strong guarantee in between DR75 and C11.
We propose rewording so that the weak-alignment interpretation is specified, and replacing:
"The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated"
with
"The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and size less than or equal to the size requested. It may then be used to access such an object or an array of such objects in the space allocated"
This makes all the implementations listed in the previous section conforming.
In subsequent sections, we'll argue that this change does not induce code breakage, and that changing the weak-alignment implementations to be strong-alignment ones would cause significant performance regressions.
A strong-alignment reader might argue that the proposed rewording breaks existing code; alignment assumptions that they read as holding in all conforming implementations now only hold given an implementation-level extension. However, even granting their parsing of the wording, this guarantee was only ever true in theory, not in practice. The number of weak-alignment implementations meant that the code in question was never meaningfully portable except to those implementations that opted-in in the first place.
Nevertheless, in an attempt to quantify the amount of such non-portable code, I examined the FreeBSD ports
project. Many (over 30,000, though not all of them written in C) third-party programs are available for installation on FreeBSD systems. Because many of these programs are developed primarily on other operating systems, the ports project includes the patches required to allow projects to work on FreeBSD. If a program uses the strong-alignment assumption, we would expect to see a patch involving malloc
, realloc
, calloc
, or max_align_t
in the ports project, tweaking the size argument (since FreeBSD's libc malloc is weak-alignment). Manual inspection of all instances of these strings in these patches indicated that this was not the case.
However, it may still be useful to consider a few specific scenarios:
max_align_t
trying to find instances that might rely on alignment in this way. In case this assumption arrived through other means (such as using _Alignof(some_other_primitive_type)
), I also searched for related phrases, like "tag bits", "pointer tag", etc. In all cases (all language runtimes or mutex implementations), the required alignment was otherwise implied by the contents of the pointed to object (either because of an explicit _Alignas
, the presence of a sufficiently-aligned data member, or merely the size of the pointed-to object).S*
is cast to a T*
(e.g. in a pointer-to-first-element cast) and then back, is still correct in a weak-alignment implementation. Additionally, Facebook has a large codebase containing both C and C++ (and both proprietary and open-source code), which it tests in build modes containing dynamic alignment checks. No breakages due to weak malloc alignment have been reported to the malloc team.typedef __attribute__((aligned(16))) char aligned_char_t
, _Alignof(aligned_char_t)
is 16 while sizeof(aligned_char_t)
is 1 on GCC). These types already violate language rules (for example, in array indexing), and, being non-standard, don't need to have standards-defined semantics. I also searched for variants of this pattern in Debian packages. Unfortunately, the number of results (over 9,000) was too large to examine every occurrence manually; however, in every case I did examine (a few hundred), no breakage would occur (most instances are static data, intended to loosen rather than strengthen alignment, or applied only to large objects, a la typedef float vec4[4] __attribute__ ((__aligned__(16)));
).ptr = malloc(num_objects * object_size)
, with num_objects
set to 0 (i.e. equivalent to malloc(0)
). It might subsequently rely on the alignment guarantee it receives. Instances of this pattern are difficult to quantify, and it's enough of an edge case that the author was not motivated to try. If such code is deemed valuable, then specifying that the pointer returned from malloc(0)
, if non-null, should be _Alignof(max_align_t)
-aligned is straightforward to require.In summary: code that requires strong-alignment assumptions for correctness is rare enough that the above search was unable to find it. Any such code that does exist relies on platform-dependent behavior in practice if not in theory, and therefore is not any more or less portable with the proposed rewording than in the status quo.
Another way of dealing with the split would be to clarify in the strong-alignment direction, and demand that the weak-alignment implementations change their behavior. However, doing so would impose nontrivial costs on those implementations. The weak-alignment interpretation has a number of performance advantages over the strong-alignment one.
2 * _Alignof(max_align_t)
._Alignof(max_align_t)
to use the same mutex increases contention on it.Strong-alignment implementations are also valid weak-alignment implementations, and therefore an implementation that obtains performance improvements from a strong-alignment guarantee can of course continue to provide them.
Note that most compiler-level optimizations don't benefit from assuming strong-alignment. The optimizer cannot in general assume that a pointer it sees came from malloc
unless it sees it returned from malloc
or passed to free
(and, in the former case, it's free to expand the size of the allocation if desired). We are concerned only with small allocations here, so having a small alignment cost multiplied by iteration through a loop is not a concern. Moreover, situations in which an operation on a N/2-byte object can be sped up when it is known that the object is N-byte aligned are relatively uncommon.
To attempt to quantify the possible regression, I measured the performance changes on an allocator that can be configured to provide either the strong or the weak alignment guarantees (jemalloc), on two large (heaps in the 10s of GBs) proprietary server processes. Each showed a 1.1% regression in resident set size, and 0.7% and 0.8% regressions in CPU consumption.
To get numbers based only on publicly available data, I measured GCC 4.8.5's runtime and memory consumption when compiling GCC 4.0.0's sources concatenated into a single file (the former version being chosen because it is the default compiler provided by my operating system, and the latter because of its easy availability at https://people.csail.mit.edu/smcc/projects/single-file-programs/), and measured runtime and memory consumption. These numbers are for three runs, after an initial “burn-in” run for both versions:
Comparing best-to-best run in each category gives a 2.5% runtime increase and 1.7% RSS increase from enforcing strong alignment. Median-to-Median gives a 4.3% runtime increase and 2.0% RSS increase. Worst-to-worst gives a 4.2% runtime increase and 0.04% RSS increase.
Regardless of whether or not strong-alignment is required in the Platonic reading of C as standardized, it is not provided in C as implemented. Clarifying the wording in the strong-alignment direction imposes performance regressions on existing implementations. Leaving the wording unchanged confuses users and implementers. Clarifying the wording in the weak-alignment direction avoids these costs, and does not change the portability of programs in practice.