Submitter: Kayvan Memarian and Peter Sewell
Submission Date: 2016-09-22
Document: WG14 N2090
Related: Section 4 of N2012, Questions 3/15, 4/15, and 5/15 of our survey, Section 2.1-2.9 (Q1-20) of our N2013, and DR260
This document is based on N2012 (Section 4), adding a concrete Technical Corrigendum proposal for discussion, revising the text, and adding concrete examples (from N2013).
C pointer values could traditionally be considered to be concrete numeric values (our survey, Questions 3, 4, and 5, indicates many still do). The DR260 Committee Response suggests otherwise, hinting at a notion of provenance associated to values that keeps track of their "origins":
"Implementations are permitted to track the origins of a bit-pattern and treat those representing an indeterminate value as distinct from those representing a determined value. They may also treat pointers based on different origins as distinct even though they are bitwise identical."
Current compilers appear to exploit this, using it to justify alias analysis based on provenance distinctions. However, the DR260 CR has not been incorporated in the standard text, and it also leaves many specific questions unclear: it's ambiguous whether some idioms are allowed or not, and unclear what alias analysis and optimisation is allowed to do.
In this note we go through a sequence of specific questions, supported by concrete examples, and make a candidate proposal for discussion.
The basic idea is to associate a provenance with every pointer value, essentially identifying the original allocation the pointer is derived from. This is for the "C abstract machine" as defined in the standard: compilers might rely on provenance in their alias analysis and optimisation, but one would not expect normal implementations to record or manipulate provenance at runtime (though dynamic or static analysis tools might). Accordingly, provenances do not have any representation.
Pointer values and integer values both carry a provenance, either the "empty" provenance, a single provenance ID, or the "wildcard" provenance.
On every allocation (of objects with static, thread, automatic, and allocated storage duration), the abstract machine nondeterministically chooses a fresh provenance ID (unique in the entire execution), and the resulting pointer value carries that single provenance ID.
At any access via a pointer value, its numeric address must be consistent with its provenance, with undefined behaviour otherwise. In particular:
access via a pointer value with empty provenance is undefined behaviour (except where the numeric value is within an implementation-defined set of "device" memory addresses);
access via a pointer value with a single provenance ID must be within the memory footprint of the corresponding original allocation;
access via a pointer value with wildcard provenance must be within some currently live object.
This undefined behaviour is what justifies optimisation based on provenance-based alias analysis.
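As a rough illustration, the three access rules can be modelled as a check over a table of allocations. This is only a sketch of the abstract-machine rule, not something a normal implementation would execute; the type and function names are hypothetical, and the device address range is an assumed stand-in for the implementation-defined set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model types; none of these names come from the proposal. */
enum prov_kind { PROV_EMPTY, PROV_SINGLE, PROV_WILDCARD };

struct prov {
    enum prov_kind kind;
    unsigned id;            /* meaningful only when kind == PROV_SINGLE */
};

struct allocation {
    unsigned id;            /* fresh provenance ID chosen at allocation */
    uintptr_t base;         /* numeric address of the first byte */
    size_t size;            /* memory footprint in bytes */
    bool live;              /* false once the lifetime has ended */
};

/* True if [addr, addr+len) lies inside the footprint of a live allocation. */
static bool within(const struct allocation *a, uintptr_t addr, size_t len)
{
    return a->live && addr >= a->base && len <= a->size &&
           addr - a->base <= a->size - len;
}

/* Assumed device range, standing in for the implementation-defined set. */
static bool is_device_address(uintptr_t addr)
{
    return addr >= 0x40000000u && addr < 0x40001000u;
}

/* Returns false where the access would be undefined behaviour. */
bool access_is_defined(struct prov p, uintptr_t addr, size_t len,
                       const struct allocation *allocs, size_t n)
{
    assert(allocs != NULL || n == 0);
    switch (p.kind) {
    case PROV_EMPTY:        /* only device memory is accessible */
        return is_device_address(addr);
    case PROV_SINGLE:       /* must stay inside the original allocation */
        for (size_t i = 0; i < n; i++)
            if (allocs[i].id == p.id)
                return within(&allocs[i], addr, len);
        return false;
    case PROV_WILDCARD:     /* any currently live object will do */
        for (size_t i = 0; i < n; i++)
            if (within(&allocs[i], addr, len))
                return true;
        return false;
    }
    return false;
}
```

With two adjacent int-sized allocations, a one-past pointer derived from the first has the numeric address of the second but still carries the first's ID, so the check rejects an access through it.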
Then there are many specific choices of how provenance is affected by arithmetic operations and suchlike. We first discuss the questions and then summarise our proposal.
Note that provenance-based alias analysis should not be confused with type-based alias analysis, and it is (contrary to the expectations of some expert C users and OS developers) still observable in current mainstream implementations even with -fno-strict-aliasing, opening the door to subtle bugs. Compilers should probably provide an option to turn it off, and any proposal for "safe C" should mandate and describe that option.
Here DR260CR clearly says yes. Our experimental data shows cases where recent versions of GCC and ICC do assume non-aliasing of pointers with identical representation values but distinct provenance. This is incompatible with a concrete semantics of pointers (where they are fully characterised by their representation values). Tracking of provenance in the "abstract machine" is therefore clearly necessary to make these compilers sound with respect to the standard.
For example, consider the following pathological code (adapted from DR260).
Example: provenance_basic_global_yx.c
#include <stdio.h>
#include <string.h>
int y=2, x=1;
int main() {
int *p = &x + 1;
int *q = &y;
printf("Addresses: p=%p q=%p\n",(void*)p,(void*)q);
if (memcmp(&p, &q, sizeof(p)) == 0) {
*p = 11; // does this have undefined behaviour?
printf("x=%d y=%d *p=%d *q=%d\n",x,y,*p,*q);
}
return 0;
}
Depending on the implementation, x and y might happen to be allocated in adjacent memory, in which case &x+1 and &y will have bitwise-identical representation values, the memcmp will succeed, and p (derived from a pointer to x) will have the same representation value as a pointer to a different object, y, at the point of the update *p=11. This can occur in practice, e.g. with GCC -O2. Its output of
x=1 y=2 *p=11 *q=2
suggests that the compiler is reasoning that *p does not alias with y or *q, and hence that the initial value of y=2 can be propagated to the final printf.
This outcome would not be correct with respect to a naive concrete semantics, and so to make the compiler sound it is necessary for this program to be deemed to have undefined behaviour. Note that this example does not involve type-based alias analysis, and the outcome is not affected by GCC's -fno-strict-aliasing flag. One might ask whether the mere formation of the pointer &x+1 is legal. This case is explicitly permitted by the ISO standard.
Example: provenance_equality_global_yx.c
#include <stdio.h>
#include <string.h>
int y=2, x=1;
int main() {
int *p = &x + 1;
int *q = &y;
printf("Addresses: p=%p q=%p\n",(void*)p,(void*)q);
_Bool b = (p==q);
// can this be false even with identical addresses?
printf("(p==q) = %s\n", b?"true":"false");
return 0;
}
This is also allowed according to DR260CR. We have observed GCC regarding two pointers with different provenance as nonequal (with ==) even though they have the same representation value. This happens in some circumstances but not others, so we suggest that whether pointer equality takes provenance into account or not should be made indeterminate in the standard (again to make the observed compiler behaviour sound with respect to the standard). Note that requiring equality to always take provenance into account would require implementations to track provenance at runtime.
The ISO C11 standard text is too strong here: 6.5.9p6 says "Two pointers compare equal if and only if both are [...] or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space", which requires such pointers to compare equal (reasonable pre-DR260CR, but not after it). We don't expect programmers to rely on that behaviour and GCC does not satisfy it, so, to be consistent with DR260CR and with the indeterminate behaviour we suggest, it should permit them to compare either equal or non-equal.
### Q3 Can one make a usable pointer via casts to intptr_t and back?
ISO C11 optionally allows implementations to provide the type intptr_t (along with an unsigned variant) with guaranteed round-trip properties for pointer/integer casts. The following should be allowed, and that means that the C abstract machine should track provenance via such casts to and from integer values.
Example: provenance_roundtrip_via_intptr_t.c
#include <stdio.h>
#include <inttypes.h>
int x=1;
int main() {
int *p = &x;
intptr_t i = (intptr_t)p;
int *q = (int *)i;
*q = 11; // is this free of undefined behaviour?
printf("*p=%d *q=%d\n",*p,*q);
}
It also seems to be common practice (e.g. in Linux) to extend these properties to unsigned long, as in the example below, when its implementation is large enough. We suggest that this be permitted iff that is the case, or that it be implementation-defined which integer types support this.
Example: provenance_roundtrip_via_unsigned_long.c
#include <stdio.h>
int x=1;
int main() {
int *p = &x;
unsigned long i = (unsigned long)p;
int *q = (int *)i;
*q = 11; // is this free of undefined behaviour?
printf("*p=%d *q=%d\n",*p,*q);
}
Example: provenance_basic_using_intptr_t_global_yx.c
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>
int y = 2, x = 1;
int main() {
intptr_t ux = (intptr_t)&x;
intptr_t uy = (intptr_t)&y;
intptr_t offset = 4;
int *p = (int *)(ux + offset);
int *q = &y;
printf("Addresses: &x=%"PRIiPTR" p=%p &y=%"PRIiPTR\
"\n",ux,(void*)p,uy);
if (memcmp(&p, &q, sizeof(p)) == 0) {
*p = 11; // does this have undefined behaviour?
printf("x=%d y=%d *p=%d *q=%d\n",x,y,*p,*q);
}
}
Given the type intptr_t, this question asks whether one can return to a concrete view of pointers as simple numbers, by casting to intptr_t followed by integer arithmetic and casting back to a pointer type. Here again, we observe GCC behaving the same as with Q1, reasoning that pointers obtained in this way cannot alias even if they have the same numerical values. This observation is reinforced by the GCC documentation, which mentions an "original pointer" associated to integer values cast to pointer type, so the answer seems to be "yes". This leads to many more questions regarding the specifics of how provenance information affects the semantics of each integer operator. Some of these are discussed in the next subsection and the remainder are given a complete treatment in the summary of our proposal at the end.
### Q6 Can one use bit manipulation and integer casts to store information in unused bits of pointers?
Now we extend the example of Q3, which casts a pointer to intptr_t and back, to use logical operations on the integer value to store some tag bits. The following code exhibits a strong form of this, storing the address and tag bit combination as a pointer (which thereby creates a misaligned pointer value, though one not used for accesses); a weaker form would store the combined value only as an integer.
Example: provenance_tag_bits_via_uintptr_t_1.c
#include <assert.h>
#include <stdio.h>
#include <stdint.h>
int x=1;
int main() {
int *p = &x;
// cast &x to an integer
uintptr_t i = (uintptr_t) p;
// check the bottom two bits of an int* are not used
assert(_Alignof(int) >= 4);
assert((i & 3u) == 0u);
// construct an integer like &x with low-order bit set
i = i | 1u;
// cast back to a pointer
int *q = (int *) i; // defined behaviour?
// cast to integer and mask out the low-order two bits
uintptr_t j = ((uintptr_t)q) & ~((uintptr_t)3u);
// cast back to a pointer
int *r = (int *) j;
// are r and p now equivalent?
*r = 11; // defined behaviour?
_Bool b = (r==p);
printf("x=%i *r=%i (r==p)=%s\n",x,*r,b?"true":"false");
}
The standard leaves conversions between integer and pointer types implementation-defined (6.3.2.3p{5,6}), but it is common practice to use unused pointer bits (either low-order bits from alignment requirements or high-order bits beyond the maximum address range). We suggest that the set of unused bits for pointer types of each alignment should be made implementation-defined, to make this practice legal.
Moreover, where the standard does give a guarantee, e.g. for round-trips through intptr_t (7.20.1.4p1), it says only that the result "will compare equal". In a provenance-aware semantics, that may not be enough to make the result usable to reference memory; the standard text should be strengthened here to guarantee that by giving the result a usable provenance.
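The weaker form mentioned above, which keeps the combined value only as an integer, might be sketched as follows. The helper names and the choice of two tag bits are illustrative assumptions. The sketch relies on the implementation-defined pointer/integer conversions behaving as on mainstream platforms, and on int being at least 4-byte aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: the two low-order bits of an int* are unused. */
#define TAG_MASK ((uintptr_t)3u)

/* Pack a pointer to int and a 2-bit tag into a single integer value. */
uintptr_t pack_tagged(int *p, unsigned tag)
{
    uintptr_t i = (uintptr_t)p;
    assert((i & TAG_MASK) == 0u);   /* check the low bits really are free */
    assert(tag <= TAG_MASK);
    return i | (uintptr_t)tag;
}

/* Recover the tag bits. */
unsigned tag_of(uintptr_t packed)
{
    return (unsigned)(packed & TAG_MASK);
}

/* Mask the tag out and cast back to a pointer. */
int *pointer_of(uintptr_t packed)
{
    return (int *)(packed & ~TAG_MASK);
}
```

Under the integer-operator rules summarised at the end of this note, the OR and AND each combine a value carrying the provenance of the original object with a constant of empty provenance, so the recovered pointer would keep the original provenance and remain usable for access.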
Example: provenance_equality_uintptr_t_global_yx.c
#include <stdio.h>
#include <inttypes.h>
int y=2, x=1;
int main() {
uintptr_t p = (uintptr_t)(&x + 1);
uintptr_t q = (uintptr_t)&y;
printf("Addresses: p=%" PRIxPTR " q=%" PRIxPTR "\n",
p,q);
_Bool b = (p==q);
// can this be false even with identical numeric addresses?
printf("(p==q) = %s\n", b?"true":"false");
return 0;
}
GCC did at one point print false for this, but it was regarded as a bug and fixed. We have observed the same behaviour in Clang for a similar example, but believe it is also a bug there. DR260CR does not address the question. We believe that integer equality testing should not be affected by provenance, i.e. "no".
This example is inspired by one from Krebbers' PhD thesis.
DR260CR does not address this, but it is uncontroversially "yes": an intra-object pointer subtraction, say between the addresses of two elements of an array, should give a provenance-free integer offset that can then be used for indexing into this or other arrays.
Example: provenance_multiple_2_global.c #include
We say a provenance-free integer offset because the fact that an offset is calculated from pointers to y should not, when added to a pointer to a distinct (maybe adjacent) object x, license its use to access y. The example below should not be allowed to access y[0], and we observe GCC optimising based on that assumption.
Example: provenance_multiple_4_global_yx.c
#include <stdio.h>
#include <string.h>
int y[2], x[2];
int main() {
int *p = &x[1] + (&y[1]-&y[0]) + 0;
int *q = &y[0];
printf("Addresses: p=%p q=%p\n",(void*)p,(void*)q);
if (memcmp(&p, &q, sizeof(p)) == 0) {
*p = 11; // does this have undefined behaviour?
printf("y[0]=%d *p=%d *q=%d\n",y[0],*p,*q);
}
return 0;
}
This is asking about pointers that have multiple provenances, which is not addressed in DR260CR or current GCC or Clang compiler documentation - they refer to "the origin" of a pointer as if there were necessarily only one.
The example below is a variant of the Q5 provenance_basic_using_intptr_t_global_yx.c in which the constant offset is replaced by a subtraction (here after casting from pointer to integer type).
Example: pointer_offset_from_subtraction_1_global.c
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>
int y = 2, x=1;
int main() {
intptr_t ux = (intptr_t)&x;
intptr_t uy = (intptr_t)&y;
intptr_t offset = uy - ux;
printf("Addresses: &x=%"PRIiPTR" &y=%"PRIiPTR\
" offset=%"PRIiPTR" \n",ux,uy,offset);
int *p = (int *)(ux + offset);
int *q = &y;
if (memcmp(&p, &q, sizeof(p)) == 0) {
*p = 11; // is this free of undefined behaviour?
printf("x=%d y=%d *p=%d *q=%d\n",x,y,*p,*q);
}
}
Our experiments and our survey responses both suggest that compilers do not in general support this, and we imagine it is uncommon in practice. However, there do seem to be specific important use cases, including the Linux and FreeBSD per-CPU variable implementations - though it is unclear whether these are between multiple allocations in the C sense. They might be dealt with by an attribute such as the GCC may_alias - though the documentation for that refers only to type-based alias analysis, not to "provenance-based" alias analysis. This needs further discussion, but we tentatively suggest "no".
The classic XOR linked list algorithm (implementing a doubly linked list with only one pointer per node, by storing the XOR of two pointers) also makes essential use of multiple-provenance pointers. In this example we XOR the integer values from two pointers and XOR the result again with one of them.
Example: pointer_offset_xor_global.c
#include <stdio.h>
#include <inttypes.h>
int x=1;
int y=2;
int main() {
int *p = &x;
int *q = &y;
uintptr_t i = (uintptr_t) p;
uintptr_t j = (uintptr_t) q;
uintptr_t k = i ^ j;
uintptr_t l = k ^ i;
int *r = (int *)l;
// are r and q now equivalent?
*r = 11; // does this have defined behaviour?
_Bool b = (r==q);
printf("x=%i y=%i *r=%i (r==q)=%s\n",x,y,*r,
b?"true":"false");
}
It is unclear whether this algorithm is important in modern practice. One respondent remarks that the XOR list implementation interacts badly with modern pipelines and the space saving is not a big win; another doesn't know of modern usages, though suspects that it is probably still used in places. We don't know whether current compiler alias analysis permits it or not. Our suggested semantics would not allow it.
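For readers unfamiliar with the idiom, a minimal sketch of an XOR linked list over separately allocated nodes is given below. The names are hypothetical. It relies on pointer/integer casts behaving as on mainstream implementations, and under the semantics suggested in this note the traversal would have undefined behaviour, because each link combines two distinct single provenances.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct xnode {
    int value;
    uintptr_t link;     /* (uintptr_t)prev ^ (uintptr_t)next */
};

static uintptr_t xor_ptrs(struct xnode *a, struct xnode *b)
{
    return (uintptr_t)a ^ (uintptr_t)b;
}

/* Link n separately allocated nodes, given in list order. */
void xlist_link(struct xnode **nodes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct xnode *prev = (i == 0) ? NULL : nodes[i - 1];
        struct xnode *next = (i + 1 == n) ? NULL : nodes[i + 1];
        nodes[i]->link = xor_ptrs(prev, next);
    }
}

/* Walk the list from one end, summing the values. */
int xlist_sum(struct xnode *end)
{
    struct xnode *prev = NULL;
    struct xnode *cur = end;
    int sum = 0;
    while (cur != NULL) {
        sum += cur->value;
        /* XOR with the node we came from recovers the node ahead */
        struct xnode *next = (struct xnode *)(cur->link ^ (uintptr_t)prev);
        prev = cur;
        cur = next;
    }
    return sum;
}
```

Because each link is symmetric in its two neighbours, the same walk works from either end of the list.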
Example: pointer_copy_memcpy.c
#include <stdio.h>
#include <string.h>
int x=1;
int main() {
int *p = &x;
int *q;
memcpy (&q, &p, sizeof p);
*q = 11; // is this free of undefined behaviour?
printf("*p=%d *q=%d\n",*p,*q);
}
The ISO C11 text does not explicitly address this. In a pre-provenance semantics, before DR260, it did not need to, but now (as it surely should be allowed) the standard needs to guarantee that the result has the appropriate provenance to make it usable. One could allow it by special-casing memcpy() to preserve provenance, but the following questions suggest a less ad hoc approach.
Example: pointer_copy_user_dataflow_direct_bytewise.c
#include <stdio.h>
#include <string.h>
int x=1;
void user_memcpy(unsigned char* dest,
unsigned char *src, size_t n) {
while (n > 0) {
*dest = *src;
src += 1;
dest += 1;
n -= 1;
}
}
int main() {
int *p = &x;
int *q;
user_memcpy((unsigned char*)&q, (unsigned char*)&p,
sizeof(p));
*q = 11; // is this free of undefined behaviour?
printf("*p=%d *q=%d\n",*p,*q);
}
ISO C11 and DR260CR again do not mention this explicitly (though the 6.5p6 effective type text weakly implies it is allowed). We believe it is widely relied on.
(The only exceptions we are aware of are capability machines such as the IBM System/38 and its descendants, or CHERI. In CHERI you have to copy pointers at pointer types for it to work properly, but capability loads and stores can operate generically, because the capability registers have tag bits. There is also some new tagged memory support for Oracle SPARC, to find invalid pointers.)
Our proposed semantics makes it legal by regarding each representation byte (as an integer value) as having the provenance of the original pointer, and the result pointer, being composed of representation bytes with that provenance, as having the same. We could either insist:
that all writes that a pointer read reads from have either the same provenance or an empty provenance, or
even more restrictively, one could insist that one has all the original bytes of some legitimate pointer.
The former is our preferred option. There may not be much reasonable code that would be sensitive to the distinctions between these - perhaps manipulations of pointers where one knows the high-order bytes are common, as in the survey response mentioning encoding 64-bit pointers in 48 bits. Our semantics will permit that.
For example, suppose one reads the bytes of a pointer representation pointing to some object, encrypts them, decrypts them, stores them as the representation of another pointer value, and tries to access the object. The following code is a simplified version of this, just using an XOR twice; one should imagine a more complex transform, with the transform and its inverse separated in the code and in time so that the compiler cannot analyse them.
Example: pointer_copy_user_dataflow_indirect_bytewise.c
#include <stdio.h>
#include <string.h>
int x=1;
void user_memcpy2(unsigned char* dest,
unsigned char *src, size_t n) {
while (n > 0) {
*dest = ((*src) ^ 1) ^ 1;
src += 1;
dest += 1;
n -= 1;
}
}
int main() {
int *p = &x;
int *q;
user_memcpy2((unsigned char*)&q, (unsigned char*)&p,
sizeof(p));
*q = 11; // is this free of undefined behaviour?
printf("*p=%d *q=%d\n",*p,*q);
}
Whether this should be supported is unclear, and DR260 CR does not speak to it (it calls out the library memcpy and memmove as special cases: "Note that using assignment or bitwise copying via memcpy or memmove of a determinate value makes the destination acquire the same determinate value."). Our proposal would allow it only in the case where bytes of different pointer values are not mixed during the computation.
Our provenance examples so far have all only involved dataflow; we also have to ask if a usable pointer can be constructed via non-dataflow control-flow paths, e.g. if testing equality of an unprovenanced integer value against a valid pointer permits the integer to be used as if it had the same provenance as the pointer. We don't expect that this is relied on in practice, and our proposed semantics does not permit it - we track provenance only through dataflow. This needs to be discussed with respect to current compiler analysis behaviour.
For example, consider a version of the previous indirect memcpy example with a control-flow choice on the value of the bytes:
Example: pointer_copy_user_ctrlflow_bytewise_abbrev.c
#include <stdio.h>
#include <string.h>
#include <assert.h>
#include <limits.h>
int x=1;
unsigned char control_flow_copy(unsigned char c) {
assert(UCHAR_MAX==255);
switch (c) {
case 0: return(0);
case 1: return(1);
case 2: return(2);
...
case 255: return(255);
}
}
void user_memcpy2(unsigned char* dest,
unsigned char *src, size_t n) {
while (n > 0) {
*dest = control_flow_copy(*src);
src += 1;
dest += 1;
n -= 1;
}
}
int main() {
int *p = &x;
int *q;
user_memcpy2((unsigned char*)&q, (unsigned char*)&p,
sizeof(p));
*q = 11; // is this free of undefined behaviour?
printf("*p=%d *q=%d\n",*p,*q);
}
Similarly, one can imagine copying a pointer via uintptr_t bit-by-bit via a control-flow choice for each bit.
Finally, contrasting with the first two examples above, which recover all the concrete value information of the original pointer, we can consider a variant of the Q5 provenance_basic_using_intptr_t_global_yx.c example in which there is a control-flow choice based on partial information of the intended target pointer (here just whether q is null) and the concrete value information is obtained otherwise:
Example: provenance_basic_mixed_global_offset+4.c
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>
int y = 2, x=1;
int main() {
intptr_t ux = (intptr_t)&x;
intptr_t uy = (intptr_t)&y;
intptr_t offset = 4;
printf("Addresses: &x=%"PRIiPTR" &y=%"PRIiPTR\
"\n",ux,uy);
int *q = &y;
if (q != NULL) {
int *p = (int *)(ux + offset);
if (memcmp(&p, &q, sizeof(p)) == 0) {
*p = 11; // is this free of undefined behaviour?
printf("x=%d y=%d *p=%d *q=%d\n",x,y,*p,*q);
}
}
}
A semantics that tracks provenance only through dataflow dependency seems to be the simplest option and probably compatible with programming practice; we imagine that none of these idioms occur in normal practice. It would forbid the above examples, while permitting the dataflow bitwise copy example.
This is our preferred option.
Allowing provenance to be propagated via any control-flow dependency would allow all these examples, but it seems clear that the last example should be forbidden, in ISO or de facto semantics, and indeed GCC is again doing an optimisation that would not be sound if it were. In real code we imagine that many pointer accesses are in some way control-flow dependent on others, given the many null-pointer checks required in C, so tracking that would neither be feasible nor useful.
The following example (analogous to the provenance_roundtrip_via_intptr_t.c of Q3) constructs a pointer by casting a pointer to uintptr_t, storing that in a member of a union of that type, and then reading from a member of the union of pointer type.
Example: provenance_union_punning_1_global.c
#include <stdio.h>
#include <string.h>
#include <inttypes.h>
int x=1;
typedef union { uintptr_t ui; int *p; } un;
int main() {
un u;
int *px = &x;
uintptr_t i = (uintptr_t)px;
u.ui = i;
int *p = u.p;
printf("Addresses: p=%p &x=%p\n",(void*)p,(void*)&x);
*p = 11; // is this free of undefined behaviour?
printf("x=%d *p=%d\n",x,*p);
return 0;
}
The ISO standard says "the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type", but says little about that reinterpretation, and DR260 CR does not speak to the provenance of the result.
For any particular implementation, pointers to normal object types might or might not have the same representation as uintptr_t values. That might well not hold for some implementations, but it does for our usual ones.
Our semantics has to have an implementation-defined map for the conversions between pointer representations and uintptr_t in any case, so we can say that this example is allowed, preserving the original provenance, iff that is the identity.
That choice relies on an assumption that compiler alias analysis and optimisation are not assuming that this example is undefined behaviour. At present we have no data either way about that.
We now consider the extreme example of pointer provenance flowing via IO, if one writes the address of an object to a file and reads it back in. We have three versions: one using fprintf/fscanf and the %p format, one using fwrite/fread on the pointer representation bytes, and one converting the pointer to and from uintptr_t and using fprintf/fscanf on that value with the PRIuPTR/SCNuPTR formats.
The first gives a syntactic indication of a potentially escaping pointer value, while the others (after preprocessing) do not. Giving just the first in full:
Example: provenance_via_io_percentp_global.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>
int x=1;
int main() {
int *p = &x;
FILE *f = fopen(
"provenance_via_io_percentp_global.tmp","w+b");
printf("Addresses: p=%p\n",(void*)p);
// print pointer address to a file
fprintf(f,"%p\n",(void*)p);
rewind(f);
void *rv;
int n = fscanf(f,"%p\n",&rv);
int *r = (int *)rv;
if (n != 1) exit(EXIT_FAILURE);
printf("Addresses: r=%p\n",(void*)r);
// are r and p now equivalent?
*r=12; // is this free of undefined behaviour?
_Bool b1 = (r==p); // do they compare equal?
_Bool b2 = (0==memcmp(&r,&p,sizeof(r)));//same reps?
printf("x=%i *r=%i b1=%s b2=%s\n",x,*r,
b1?"true":"false",b2?"true":"false");
}
This is used in practice: in graphics code for marshalling/unmarshalling, using %p, in xlib, using SCNuPTR, and in debuggers.
In the ISO standard, the text for fprintf and fscanf for %p says that this should work: "If the input item is a value converted earlier during the same program execution, the pointer that results shall compare equal to that value; otherwise the behavior of the %p conversion is undefined" (modulo the usual remarks about "compare equal"), and the text for uintptr_t and the presence of SCNuPTR in inttypes.h imply the same there.
To make the standard allow it in a provenance-aware C abstract machine, we suggest that either
the pointers output during an execution should be recorded (in the abstract machine, not in normal implementations) along with their provenance, in order to be reinjected when these representation values are input later during the execution, or
such reads give pointers with a "wildcard" provenance, allowing them to be used to access any current allocation with the same concrete address and type (this would prohibit provenance-based alias analysis of such pointers).
After a successful usage of a pointer with wildcard provenance, one could conceivably side-effect the pointer value to collapse its provenance down to the one used. We have no idea whether compilers could or do depend on that; absent a compelling reason otherwise, we do not plan to build that in to our semantics.
C programs should normally not form pointers from particular concrete addresses. For example, the following should normally be considered to have undefined behaviour, as address 0xABC might not be mapped or, if it is, might alias with other data used by the runtime. By the ISO standard it does have undefined behaviour, consistent with an abstract view of pointers.
Example: pointer_from_concrete_address_1.c
int main() {
// on systems where 0xABC is not a legal non-stack/heap
// address, does this have undefined behaviour?
*((int *)0xABC) = 123;
}
But in some circumstances, especially for embedded devices, it is idiomatic to use concrete addresses in C to access memory-mapped devices, e.g.
Example: pointer_from_concrete_address_2.c
#define PORTBASE 0x40000000
unsigned int volatile * const port =
(unsigned int *) PORTBASE;
int main() {
unsigned int value = 0;
// on systems where PORTBASE is a legal non-stack/heap
// address, does this have defined behaviour?
*port = value; /* write to port */
value = *port; /* read from port */
}
Our suggestion is to introduce an implementation-defined set of "device memory" addresses (which may depend on linking), guaranteed to be disjoint from normal C-accessible stack, heap, and program memory, for which the creation of such pointers is allowed.
Pointer values and integer values both carry a provenance, either the "empty" provenance, a single provenance ID, or the "wildcard" provenance.
On every allocation (of objects with static, thread, automatic, and allocated storage duration), we choose a fresh provenance ID (unique in the entire execution), and the resulting pointer value carries that single provenance ID.
At any access via a pointer value, its numeric address must be consistent with its provenance, with undefined behaviour otherwise. In particular:
access via a pointer value with empty provenance is undefined behaviour (except where the numeric value is within an implementation-defined set of "device" memory addresses);
access via a pointer value with a single provenance ID must be within the corresponding allocation;
access via a pointer value with wildcard provenance must be within some currently live object.
NULL pointers constructed from integer constant expressions have the empty provenance.
Whether pointer equality comparison (with == or !=) takes the associated provenances into account or not is indeterminate.
Pointer relational comparison (with <, <=, >, and >=) is unaffected by the associated provenances.
All casts among pointer and integer types preserve provenance.
The result of subtraction of two pointer values is an integer value with empty provenance, irrespective of the operand provenances (in particular, irrespective of whether they point within the same object or not - but if not, the resulting offset is not usable for moving between the two objects).
The result of an addition or subtraction of a pointer value and an integer value has the provenance of the pointer value.
The result of operations on integer values is as follows:
the result of the address-of has the corresponding provenance of the object associated with the lvalue, for non-function-pointers, or empty for function pointers.
the provenance of the result of the unary * operator is whatever was stored
integer unary +, unary -, and ~ operators preserve the original provenance; logical negation ! gives a value with empty provenance.
sizeof and _Alignof operators give values with empty provenance
multiplicative and additive operators, bitwise AND, bitwise exclusive OR, and bitwise inclusive OR operators have provenance as follows:
if both have empty provenance, the result has that
if exactly one argument has non-empty provenance, the result has that
if both have the same single provenance, the result has that
if they have different single provenances, the result has empty provenance
if one has a single provenance and the other the wildcard provenance, the result has the wildcard provenance
if both have wildcard provenance, the result has that
Summarising:
| empty ID ID' wildcard
---------+-------------------------------------
empty | empty ID ID' wildcard
ID | ID ID empty wildcard
ID' | ID' empty ID' wildcard
wildcard | wildcard wildcard wildcard wildcard
Note that this use of empty for the ID/ID' cases is liberal as far as alias analysis is concerned, but requires programmers to be conservative. One could instead make combinations of values with different single provenances to have the wildcard provenance, making them legal to use for accessing memory.
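The table can be transcribed directly into a small function. This is a sketch of the abstract-machine rule only; the type and function names are hypothetical and the encoding of provenances is our own choice for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of the three kinds of provenance. */
enum prov_kind { PROV_EMPTY, PROV_SINGLE, PROV_WILDCARD };

struct prov {
    enum prov_kind kind;
    unsigned id;            /* meaningful only when kind == PROV_SINGLE */
};

/* Equality on provenances, ignoring id unless both are single. */
bool prov_equal(struct prov a, struct prov b)
{
    if (a.kind != b.kind)
        return false;
    return a.kind != PROV_SINGLE || a.id == b.id;
}

/* Provenance of the result of a binary integer operator (the table above). */
struct prov prov_combine(struct prov a, struct prov b)
{
    const struct prov empty = { PROV_EMPTY, 0u };
    const struct prov wildcard = { PROV_WILDCARD, 0u };

    /* every row and column involving wildcard yields wildcard */
    if (a.kind == PROV_WILDCARD || b.kind == PROV_WILDCARD)
        return wildcard;
    /* empty is the identity for the remaining cases */
    if (a.kind == PROV_EMPTY)
        return b;
    if (b.kind == PROV_EMPTY)
        return a;
    /* both single: same ID is kept, different IDs collapse to empty */
    return (a.id == b.id) ? a : empty;
}
```

The alternative mentioned in the note above would correspond to returning wildcard instead of empty on the last line.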
bitwise shift has the provenance of its first operand
relational, equality, logical AND, logical OR, and constant expressions give values with empty provenance
prefix increment and decrement operators follow the pointer or integer arithmetic rules above
the conditional operator gives the provenance of the second or third operand as appropriate; simple assignment gives the provenance of the expression; compound assignment follows the pointer or integer arithmetic rules above; the comma operator gives the provenance of the second operand.
The representation bytes of a pointer have the provenance of the pointer. When a value is read back from representation bytes, its provenance is determined as follows:
if all have the empty provenance, the result has the empty provenance
if one has a single provenance and the others all have either the same single provenance or empty provenance, the result has the single provenance
if two have distinct single provenance, the result has empty provenance
if any have a wildcard provenance, the result has the wildcard provenance
As above, the distinct-single-provenance case could instead be changed to give the wildcard provenance.
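The representation-byte rules above are what make a user-written bytewise copy of a pointer usable. The following is a minimal sketch of that idiom; the function names copy_pointer_bytewise and bytewise_copy_demo are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Copies the representation of the pointer src into a fresh pointer
   object, one unsigned char at a time. */
static int *copy_pointer_bytewise(int *src)
{
    int *dst;
    unsigned char *from = (unsigned char *)&src;
    unsigned char *to = (unsigned char *)&dst;

    for (size_t i = 0; i < sizeof src; i++)
        to[i] = from[i];    /* each byte carries the provenance of src */

    /* Reading dst combines bytes that all share one single provenance,
       so the result has that provenance. */
    return dst;
}

static int bytewise_copy_demo(void)
{
    int x = 1;
    int *q = copy_pointer_bytewise(&x);

    assert(q == &x);
    *q = 42;                /* permitted: q has the provenance of x */
    return x;
}
```

Under the rules above, the store through q is an access via a pointer with the single provenance of x, so it is allowed.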
Provenance is not propagated via control flow (e.g. by conditionals that check equality of a pointer value).
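As a hedged illustration of this point, the sketch below compares a one-past pointer to x with a pointer to y. Whether the two compare equal depends on the implementation's layout (and, under this proposal, is unspecified), so the function reports which branch ran. The function name store_if_adjacent is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

static int x = 1, y = 2;

/* Returns true if the branch guarded by the equality test was taken. */
static bool store_if_adjacent(void)
{
    int *p = &x + 1;        /* one past x: provenance of x */
    int *q = &y;            /* provenance of y */

    if (p == q) {
        /* Writing *p = 11 here would be undefined under the proposal:
           the successful comparison does not give p the provenance
           of y. Only q may be used to access y. */
        *q = 11;
        return true;
    }
    assert(x == 1);         /* x itself is never written */
    return false;
}
```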
To permit pointers to be constructed via IO (e.g. via %p or by marshalling and unmarshalling their representation bytes, in the same runtime), there are several choices, and it would be useful to know what compiler alias analysis actually does here.
Pointers with the may_alias attribute have wildcard provenance.
provenance
abstract information associated to each concrete value of pointer or integer type. It can either be the empty provenance, a single provenance for a given object or region, or the wildcard provenance.
When the lvalue conversion is performed on an lvalue with pointer type, if it has the empty provenance and the associated memory location is not within the implementation-defined device memory, the behavior is undefined. If it has a single provenance and the object designated by the lvalue is not the same as or within the object from the single provenance, the behavior is undefined.
NOTE: for the wildcard provenance there is no additional check.
The provenance of a value resulting from an lvalue conversion is as follows: if all the bytes of the designated object have the empty provenance, then so does the resulting value; if some bytes have a single provenance and all the other bytes have the empty provenance, then the resulting value has the same single provenance; if any two bytes have different single provenances, then the resulting value has the empty provenance; if any byte has the wildcard provenance, then so does the resulting value.
There is an implementation-defined set of integer values for which the resulting pointer is within a device region of storage.
A null pointer has the empty provenance.
Conversions between pointer and integer type
append to the end of (§6.3.2.3#5):
The resulting pointer has the provenance of the converted integer.
append to the end of (§6.3.2.3#6):
The resulting integer has the provenance of the converted pointer.
add the following clause to (§6.3.2.3):
All conversions between two pointer types leave the provenance unchanged.
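The effect of these three additions on the pointer to integer and back idiom can be sketched as follows. The example assumes the implementation provides the optional type uintptr_t; the function name roundtrip_store is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static int roundtrip_store(void)
{
    int x = 1;

    /* Pointer to integer: the integer has the provenance of x. */
    uintptr_t i = (uintptr_t)(void *)&x;

    /* Integer to pointer: the pointer has the provenance of i, which is
       that of x. The conversions between int * and void * leave the
       provenance unchanged. */
    int *p = (int *)(void *)i;

    assert(p == &x);
    *p = 7;                 /* access to x via a pointer with x's provenance */
    return x;
}
```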
Expression operators add the following clause to (§6.5) (for later reference we call this clause X, to be replaced by its section number)
Some expression operators evaluate to an integer value when both operands have integer type. For these the provenance of the resulting integer is as follows: if the values of both operands have the empty provenance, so does the result; if the value of one operand has a single provenance and the other has the empty provenance, or both values have the same single provenance, the result has that single provenance; if the values of the two operands have different single provenances, the result has the empty provenance; if the value of either operand has the wildcard provenance, the result has the wildcard provenance.
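One idiom that clause X is meant to support is storing a tag in the unused low-order bit of a pointer. The sketch below assumes that uintptr_t is available, that int has an alignment of at least 2, and that the implementation-defined pointer/integer conversion exposes the address in the usual way; the function name tagged_pointer_demo is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static int tagged_pointer_demo(void)
{
    int x = 3;
    uintptr_t i = (uintptr_t)(void *)&x;    /* provenance of x */

    /* Assumption: the low bit of a converted int pointer is zero. */
    assert((i & 1u) == 0);

    /* The constants have the empty provenance, so by clause X both
       results keep the single provenance of x. */
    uintptr_t tagged = i | 1u;
    uintptr_t clean = tagged & ~(uintptr_t)1;

    int *p = (int *)(void *)clean;          /* provenance of x */
    *p += 1;
    return x;
}
```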
When the operand designates an object, the result has the single provenance of the outermost object containing that object.
For all these operators, when the operand has integer type, the result has the empty provenance.
The result has the empty provenance.
When both operands have integer type, the resulting integer has provenance as described in X.
Additive operators
When two integers are added or subtracted, the resulting integer has provenance as described in X.
The result has the provenance of the pointer operand.
The result has the empty provenance.
Bitwise shift operators add the following clause to (§6.5.7):
The result has the provenance of the first operand.
, and has the empty provenance.
Equality operator
, and has the empty provenance.
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.109)
to read:
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space, and they have the same single provenance (for the other cases involving their provenances, it is unspecified whether the pointers compare equal).109)
Bitwise AND operator append to the end of (§6.5.10#4):
The result has provenance as described in X.
The result has provenance as described in X.
The result has provenance as described in X.
, and has the empty provenance.
, and has the empty provenance.
The value may be copied into an object of type unsigned char [n] (e.g., by memcpy)
to read:
The value may be copied into an object of type unsigned char [n] (e.g., by memcpy); if the value is an integer or a pointer (and therefore has a provenance), the elements of the resulting array all have that provenance.
If the input item is a value converted earlier during the same program execution, the pointer that results shall compare equal to that value
to read:
If the input item is a value converted earlier during the same program execution, the pointer that results shall compare equal to the most recent such value and have the same provenance
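A minimal sketch of the %p case covered by this wording, using an in-memory buffer in place of a file. The function name percent_p_roundtrip and the buffer size are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>

static int percent_p_roundtrip(void)
{
    int x = 1;
    char buf[64];           /* assumed large enough for a %p rendering */
    void *vp = NULL;

    /* Convert the pointer to text, then read it back in the same
       program execution. */
    int n = snprintf(buf, sizeof buf, "%p", (void *)&x);
    assert(n > 0 && (size_t)n < sizeof buf);

    int matched = sscanf(buf, "%p", &vp);
    assert(matched == 1);
    assert(vp == (void *)&x);

    /* With the amended wording, vp also has the provenance of x, so
       this access is permitted. */
    *(int *)vp = 5;
    return x;
}
```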