We continue a journey of discovery into the interaction between software security, programming languages, and compilers. New and interesting observations are still turning up. WG14 has discussed this topic at the Delft and Milpitas meetings. Further discussion has taken place on the WG14 reflector, including contributions by Doug Gwyn and Joseph Myers. (See citations in-line and in References below.)
The separate disciplines of software security and safety-critical software fully overlap in one area: preventing bugs in critical components.
However, software security has an additional area of concern: preventing a malicious hacker from injecting and executing hostile code.
BUG: a software error that causes an incorrect result to be produced.
CAUSE-EFFECT ANALYSIS: the process of analyzing the dependence of downstream results on prior results.
A bug in one component can lead to incorrect results in all components that are downstream as per cause-effect analysis.
CRITICAL COMPONENT: a component in which a bug can cause a vulnerability or a hazard.
A security example might be password authentication. In safety-critical code, most or all of the software may be in critical components.
The point here is that any bug in a critical component might cause a vulnerability or a hazard.
TAINTED DATA: data which originates from, or is causally downstream from, untrusted sources. (Cf. Gwyn "[...] bugs of the kind that we seem to be concerned with (programs running amok due to unanticipated input going unchecked)".) [Gwyn-11528]
SUBSET-PLUS-STATIC: the use of carefully-chosen subsets of the implementation languages combined with enforcement using static analysis and code review.
In the previous section, we discussed cause-effect analysis. Of course, no matter how the execution image is corrupted, the hardware still exhibits causality at the hardware instruction level. Henceforth, when we discuss "causality" we refer to analyzability of the source-code behavior.
On certain platforms, every "veteran" knows that address arithmetic silently wraps around. So when security experts write an error check like

    if (buf + len < buf) /* wrap check */
        /* ...overflow occurred... */

they are relying on behavior the standard does not guarantee: if buf + len goes more than one past the end of the array, the addition itself is undefined behavior, so an optimizing compiler may conclude that the test can never be true and silently delete it. The error check disappears, and with it the source-code causality the programmer was counting on.
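A sketch of the problem and of a conforming rewrite; the function and parameter names here are hypothetical:

    #include <stddef.h>

    /* Hypothetical validation of an untrusted length against the space
       remaining in a buffer; buf and buf_end point into the same array. */
    int range_ok(const char *buf, const char *buf_end, size_t len)
    {
        /* Nonconforming: if buf + len goes more than one past the end of
           the array, the addition itself is UB, so a compiler may assume
           this test is always false and delete it. */
        if (buf + len < buf)            /* wrap check */
            return 0;                   /* overflow occurred */

        /* Conforming: compare lengths rather than wrapped pointers;
           buf_end - buf is well defined within one array. */
        return len <= (size_t)(buf_end - buf);
    }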
The current (C90 and C99) rules about UB, combined with modern compiling technology, can create other situations in which source-code causality can be lost.
/* a compiler may combine these two stores into the single store
   A[cond] = X, assuming cond is 0 or 1; if cond holds an
   indeterminate value, that combined store can be out of bounds */
if (cond) { A[1] = X; }
else      { A[0] = X; }
/* more generally, a compiler may merge or reorder code shared by
   the two branches (e.g. cross-jumping) */
if (cond) { A; B; } else { C; D; }
Loop optimizations - various optimizations are only valuable if the compiler can assume that the loop index variable does not wrap around. (I'm told that some implementations even do these by default with unsigned index variables, although that does not conform to the standard; so valuable are these optimizations in practice.)
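For illustration, a minimal sketch of the no-wraparound assumption at work; the function is hypothetical:

    /* Because overflow of the signed index i would be UB, the compiler
       may assume i never wraps, conclude that the loop runs exactly
       n + 1 times, and vectorize or unroll accordingly.  If i were
       required to wrap on overflow, n == INT_MAX would make this loop
       infinite and the trip-count reasoning would be invalid. */
    void scale(float *a, int n)
    {
        for (int i = 0; i <= n; i++)
            a[i] *= 2.0f;
    }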
On the other hand, just because UB might result, the compiler is not obliged to destroy source-code causality; for example:
Optimizing expressions such as (x*10)/10 to x (for signed x); such expressions typically arise from macros.
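A hedged sketch of how such expressions arise; the macro names are hypothetical:

    /* Hypothetical fixed-point helpers: scale a value to tenths and back. */
    #define TO_TENTHS(v)   ((v) * 10)
    #define FROM_TENTHS(v) ((v) / 10)

    int round_trip(int x)
    {
        /* Expands to ((x * 10) / 10).  A compiler may fold this to x:
           the fold can change the result only when x * 10 overflows,
           and that case is UB anyway.  The optimization cannot create
           an out-of-bounds store, so source-code causality survives. */
        return FROM_TENTHS(TO_TENTHS(x));
    }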
However, my main concern is with the non-critical components. There is a small set of undefined behaviors which, if they happen even in a non-critical component, can directly cause a vulnerability or hazard.

The cause-effect analysis described above assumes that each component modifies only certain objects - colloquially, it modifies its outputs, and if it's buggy, it may produce buggy outputs. The outputs are the values and objects which are produced or modified by the statements in the C program. But there are other values and objects that aren't meant to be directly accessed by the program statements; they're meant to be manipulated only by the system itself - such things as bookkeeping data in the heap, a function's return pointer, or the stack-frame layout in general.

The distinction I'm after is that the critical UB category includes those UBs that might modify this system data. Once a critical UB takes place, source-code cause-effect analysis is almost useless; the set of possible "downstream" effects is unbounded. Colloquially, "all bets are off".
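For concreteness, a minimal sketch (the function and sizes are hypothetical) of a bug in a non-critical component that modifies system data:

    #include <string.h>

    /* A logging helper - not itself a critical component.  But if the
       untrusted string is 64 bytes or longer, strcpy performs an
       out-of-bounds store that can overwrite stack-frame bookkeeping
       such as the return pointer; from that point on, source-code
       cause-effect analysis says nothing about what executes next. */
    void log_message(const char *untrusted)
    {
        char line[64];
        strcpy(line, untrusted);
        /* ... write line to the log ... */
    }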
Note also that various other UBs that produce incorrect pointer values are only one step away from an out-of-bounds store; still, we maintain the distinction that the out-of-bounds store itself is the critical UB. In other words, an incorrect pointer value that is never used never actually produces a critical UB.
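A minimal sketch of the distinction:

    void distinction(void)
    {
        int a[4];
        int *p = a + 6;   /* computing this pointer is itself UB (6.5.6),
                             but no out-of-bounds store has occurred */
        (void)p;          /* p is never used to access memory */
        /* *p = 0; */     /* only this attempted store would be the
                             critical UB: an out-of-bounds store */
    }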
Software security issues are especially relevant to two categories of applications written in C or C++:
Do we think these categories comprise a "boutique" niche, or a major portion of C (and C++) usage?
Replace
3.4.3 undefined behavior
behavior, upon use of a nonportable or erroneous program construct
or of erroneous data, for which this International Standard imposes
no requirements
as follows (re-numbering as needed):
3.4.3a to perform a trap
to interrupt execution of the program such that
no further operations are performed,
or to invoke a runtime-constraint handler
3.4.3b out-of-bounds store
an (attempted) access (3.4.1) which, at run-time,
for a given computational state, would modify one or more
bytes (or for an object declared volatile, would fetch one or more bytes)
that lie outside the bounds permitted by this International Standard
3.4.3c undefined behavior
behavior, upon use of a nonportable or erroneous program construct
or of erroneous data, for which this International Standard imposes
no requirements, except that the behavior shall not perform an
out-of-bounds store
and that all values produced or stored are unspecified values
NOTE the behavior might perform a trap
3.4.3d critical undefined behavior
behavior, upon use of a nonportable or erroneous program construct
or of erroneous data, for which this International Standard imposes
no requirements
NOTE the behavior might perform an out-of-bounds store, or might perform a trap
3.4.3e Unrestricted Profile
In the Unrestricted Profile, every instance of
an undefined behavior is permitted to be a
critical undefined behavior.
(Note to the reader: a previous email [Plum-11533] identified the restricted configuration as the "Security Profile" and the unrestricted configuration as the ordinary standards-conforming behavior.)
Identify the following undefined behavior situations as critical undefined behavior:
Subclause | Undefined behavior
6.2.4 | An object is referred to outside of its lifetime
6.3.2.1 | An lvalue does not designate an object when evaluated
6.5.3.2 | The operand of the unary * operator has an invalid value
6.5.6 | Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that points just beyond the array object and is used as the operand of a unary * operator that is evaluated
7.20.3 | The value of a pointer that refers to space deallocated by a call to the free or realloc function is used
7.21.1, 7.24.4 | A string or wide string utility function is instructed to access an array beyond the end of an object
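For concreteness, a minimal sketch of one of these situations (7.20.3):

    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        char *p = malloc(16);
        if (p == NULL)
            return 1;
        free(p);
        strcpy(p, "stale");   /* 7.20.3: a store through a pointer to
                                 freed space; the bytes written may now
                                 be heap bookkeeping data - a critical UB */
        return 0;
    }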
Email reflector contributions:
[Gwyn-11528] Doug Gwyn, WG14 reflector message 11528.
[Plum-11533] Thomas Plum, WG14 reflector message 11533.