Among the lvalue expressions that §6.5, paragraph 7, lists as allowed to access the stored value of an object is an lvalue of character type, so code like the following is well-defined:
extern int n; unsigned char *p = (unsigned char*)&n; int i = *p;
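A minimal sketch of such character-type access. The helper name int_repr is hypothetical. It copies the bytes of an int through an unsigned char lvalue, which is the kind of access §6.5, paragraph 7, permits for an object of any type:

```c
#include <stddef.h>

/* Hypothetical helper: copies the object representation of *obj into out,
   reading each byte through an lvalue of type unsigned char (permitted by
   6.5p7 for an object of any type). Returns the number of bytes copied. */
size_t int_repr (const int *obj, unsigned char out[sizeof (int)])
{
  const unsigned char *p = (const unsigned char *)obj;
  for (size_t i = 0; i != sizeof *obj; ++i)
    out[i] = p[i];              /* character-type access: well-defined */
  return sizeof *obj;
}
```

For n equal to 1 and an int without padding bits, exactly one of the copied bytes is 1 and the rest are 0. Which one it is depends on the byte order.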
In §7.24.1 String function conventions the standard further states that:
In particular, the text doesn't mention any constraints on the other objects treated as arrays of character type.
Furthermore, string functions specified in §7.24 such as strlen and strcpy are described as operating on strings. For instance, strlen is described in §7.24.6.3 as follows:
#include <string.h>
size_t strlen(const char *s);
Description
And the strcpy function is described like so:
#include <string.h>
char *strcpy(char * restrict s1, const char * restrict s2);
Description
With the exception of the memcpy, memmove, memcmp, memchr, and memset functions, which take void* arguments, all other string functions that take char* arguments are also described as operating on strings.
Finally, the term string is defined in §7.1.1 Definitions of terms as follows:
The specification and definitions above give rise to a number of questions illustrated by the following examples.
Is the following variation on the example above also intended to be well-defined?
int n = 1;
unsigned char *p = (unsigned char*)&n;
int i = strlen ((char*)p);

In particular, since it contains at least one null character (byte), is the array of characters (bytes) that make up the representation of the value of n in this example a string?
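For concreteness, here is the usage in question wrapped in a hypothetical helper, strlen_of_int_repr. It assumes sizeof (int) > 1 and no padding bits, so the representation of 1 contains a zero byte and strlen stops inside n. On a little-endian implementation the result for n equal to 1 is 1. On a big-endian one it is 0.

```c
#include <string.h>

/* Hypothetical helper: applies strlen to the bytes of an int, which is the
   usage whose validity is in question. Assumes sizeof (int) > 1 and no
   padding bits, so a small value such as 1 has a zero byte in its
   representation and strlen does not read past the object. */
size_t strlen_of_int_repr (const int *obj)
{
  return strlen ((const char *)obj);
}
```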
Note that if the answer is yes, then in the following function the result of the first call to strlen cannot be substituted for the second call, because s could point to n, whose value the intervening assignment changes.
#include <string.h>

int n = 1;

size_t f (const char *s)
{
  n = strlen (s);
  // code that doesn't modify *s
  return strlen (s);
}
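The sketch below makes the aliasing concrete. It repeats f from the text and assumes a 32-bit int without padding bits. On a little-endian implementation, with n set to 0x0101, the bytes of n are 01 01 00 00, so the first strlen (s) with s pointing to n yields 2. Storing 2 in n changes the bytes to 02 00 00 00, and the second call then yields 1, so reusing the first result would be wrong.

```c
#include <string.h>

int n;

/* Same shape as f in the text. If s may point into n, the store to n can
   change the bytes the second strlen call reads, so the second call cannot
   simply be replaced by the value stored in n. */
size_t f (const char *s)
{
  n = strlen (s);
  /* code that doesn't modify *s */
  return strlen (s);
}
```

On such an implementation, calling f ((const char *)&n) after n = 0x0101 leaves n equal to 2 and returns 1.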
Furthermore, if the strlen examples above are well-defined, is the following call to the strcpy function also intended to be?
struct Data {
  int i;
  void (*pf)(void);
};

void f (struct Data *p)
{
  strcpy ((char*)p, "some text");
}

Note that the strcpy call partially overwrites the function pointer member of struct Data. Such instances of data corruption have been linked to security exploits.
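How much of pf that strcpy call overwrites depends on the layout of struct Data. The hypothetical helper below computes it from offsetof without performing the copy, assuming the call is written with an explicit (char *) cast so that it compiles. On a typical LP64 implementation, where pf is at offset 8, the 10 bytes written for "some text" cover the first 2 bytes of pf.

```c
#include <stddef.h>
#include <string.h>

struct Data { int i; void (*pf)(void); };

/* Hypothetical helper: returns how many bytes of the pf member the call
   strcpy ((char *)p, src) would overwrite, without performing the copy.
   strcpy writes strlen (src) + 1 bytes starting at offset 0 of *p. */
size_t pf_bytes_clobbered (const char *src)
{
  size_t written = strlen (src) + 1;            /* includes the null */
  size_t start = offsetof (struct Data, pf);
  size_t end = start + sizeof (void (*)(void));
  if (written <= start)
    return 0;                                   /* copy ends before pf */
  return (written < end ? written : end) - start;
}
```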
In contrast to the functions declared in the <string.h> header, the %s directive of the snprintf function, specified in §7.21 Input/output <stdio.h>, takes an argument constrained as follows:
This is a much stronger requirement than the one placed on the string functions. It makes clear that a valid %s argument cannot point to an object of some other (non-character) type. As a result, unlike in the analogous strlen example above, in the code below the second snprintf call can be replaced by the result of the first.
#include <stdio.h>

extern int n;

int g (const char *s)
{
  n = snprintf (0, 0, "%s", s);
  // ...
  return snprintf (0, 0, "%s", s);
}
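A minimal sketch of computing a string's length with snprintf in this way. With a null buffer and a size of 0, snprintf writes nothing and returns the number of characters that would have been written, or a negative value on an encoding error. The helper name len_via_snprintf is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: returns the length of the string s as computed by
   snprintf with a null buffer and size 0, or (size_t)-1 if snprintf
   reports an error. */
size_t len_via_snprintf (const char *s)
{
  int r = snprintf (NULL, 0, "%s", s);
  return r < 0 ? (size_t)-1 : (size_t)r;
}
```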
The implication of the above is that, against intent and intuition, calling snprintf can (at least in some cases) be a faster way to compute the length of a string than calling strlen.
Furthermore, while conforming compilers can detect, diagnose, and even prevent some past-the-end accesses to subobjects by the printf family of functions caused by %s arguments that aren't properly nul-terminated strings, the <string.h> functions could not employ the same strategy if such past-the-end subobject accesses were meant to be valid.
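To illustrate the kind of check meant here, the sketch below is a hypothetical checked_strlen that takes the size of the array its argument designates and never reads beyond it. It assumes the caller (or an implementation, for instance from the declared size of a member array) can supply that bound. It is not a standard function.

```c
#include <stddef.h>

/* Hypothetical checked variant of strlen: reads at most bound characters
   and returns bound if no null character occurs among them. */
size_t checked_strlen (const char *s, size_t bound)
{
  size_t i = 0;
  while (i < bound && s[i] != '\0')
    ++i;
  return i;
}
```

With struct S { char a[4]; char b[4]; }, a call checked_strlen (x.a, sizeof x.a) on an unterminated a returns 4 instead of continuing into b.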
It is important for string manipulation to be efficient. Allowing string functions like strlen or strcpy to operate on the representations of objects of any type makes them less than optimal in the common case, when their arguments are arrays of type char. Calling a string function like strlen on an array of a type other than a character type is also a highly unlikely use case.
By the same token, string functions such as strcpy have been linked to successfully exploited vulnerabilities due to their susceptibility to buffer overflow. It is, therefore, also important to judiciously constrain their accesses to reduce such incidents. To make that possible, we propose to explicitly require that the arguments to string handling functions be arrays of char, analogously to %s arguments, and not objects of other types.
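Under such a requirement, code that needs to inspect the bytes of a non-character object, as in the strlen example earlier, could use memchr instead, since the memory functions take void * arguments and operate on objects of any type. A minimal sketch, with the hypothetical helper name repr_prefix_len:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the leading run of nonzero bytes in the
   size-byte object at obj, or size if it contains no zero byte. memchr is
   bounded by size, so the search never reads past the object. */
size_t repr_prefix_len (const void *obj, size_t size)
{
  const unsigned char *p = obj;
  const unsigned char *z = memchr (p, 0, size);
  return z ? (size_t)(z - p) : size;
}
```

A call such as repr_prefix_len (&n, sizeof n) is bounded by the size of n, so unlike strlen it cannot read past the object even when the representation contains no zero byte.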
To that effect, we propose to make the following changes. In §7.1.1 Definitions of terms make changes as indicated below:
In §7.24.1 String function conventions make changes as indicated below:
In addition, as an independent improvement that makes the distinction crisp between strings and arrays of character type manipulated by string functions like strcpy, on the one hand, and objects of any type manipulated by "raw memory" functions like memcpy, on the other, we suggest revisiting DR 446 and considering the changes proposed there. The changes are duplicated below for reference (the memset changes are missing from DR 446).
Change §7.24.2.1, paragraph 2 as indicated below:
Change §7.24.2.2, paragraph 2 as indicated below:
Change §7.24.5.1, paragraphs 2 and 3 as indicated below:
–2– The memchr function locates the first occurrence of c (converted to an unsigned char) in the initial n bytes (was: characters) (each interpreted as unsigned char) of the object pointed to by s. The implementation shall behave as if it reads the bytes (was: characters) sequentially and stops as soon as a matching byte (was: character) is found.

–3– The memchr function returns a pointer to the located byte (was: character), or a null pointer if the byte (was: character) does not occur in the first n bytes of the object.
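As a reading aid for this wording, here is a straightforward sketch of memchr's specified behavior under a hypothetical name, memchr_sketch. It is not a library implementation, which may read more than one byte at a time as long as it behaves as if it read them sequentially.

```c
#include <stddef.h>

/* Hypothetical sketch of the memchr semantics: c is converted to unsigned
   char, each byte is read as unsigned char in order, and the search stops
   at the first match or after n bytes. */
void *memchr_sketch (const void *s, int c, size_t n)
{
  const unsigned char *p = s;
  unsigned char uc = (unsigned char)c;
  for (size_t i = 0; i != n; ++i)
    if (p[i] == uc)
      return (void *)(p + i);   /* cast away const, as memchr's return type does */
  return NULL;
}
```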
And finally, change §7.24.6.1, paragraph 2 as indicated below: