JTC1/SC22/WG14
N825
Document Number: WG14 N825/X3J11 98-024
WG14/N825 C9X Public Comment WG14/N825
==================
Sponsoring National Body: J11 Date: 98/05/15
Author: Tom MacDonald
Author Affiliation: Silicon Graphics Inc.
Postal Address: 655F Lone Oak Drive, Eagan, MN 55409 USA
E-mail Address: tam@cray.com
Telephone Number: +1 612 6835818
Fax Number: +1 612 6835307
Number of individual comments: 2
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% %%
%% Problems With Undefined Behavior %%
%% %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
As I understand it, the intents of "undefined behavior" in the current
Draft are:
- let a programmer know something is not portable
- often an outright error
- no diagnostic required
- if an implementation elects to issue a diagnostic, it has to be
a warning and not a fatal error (i.e., the program is translated
into something)
Seems like there are some conflicting statements in the C9X Draft:
3.18 Undefined behavior
[#1] Behavior, upon use of a nonportable or erroneous
program construct, of erroneous data, or of indeterminately
valued objects, for which this International Standard
imposes no requirements. Permissible undefined behavior
ranges from ignoring the situation completely with
unpredictable results, to behaving during translation or
program execution in a documented manner characteristic of
the environment (with or without the issuance of a
diagnostic message), to terminating a translation or
execution (with the issuance of a diagnostic message).
The paragraph above indicates the implementation can terminate the
translation process if undefined behavior is detected.
Paragraph 3 in 3.18 contradicts this:
[#3] The implementation must successfully translate a given
program unless a syntax error is detected, a constraint is
violated, or it can determine that every possible execution
of that program would result in undefined behavior.
Another problem with paragraph 3 above is that there are 8 phases
of translation. Translation Phase 7 says:
... The resulting tokens are syntactically and semantically
analyzed and translated as a translation unit.
Paragraph 3 above indicates the implementation must successfully translate
the entire program. Typically the translator only translates through
phase 7, and phase 8 creates the program image using the output of
the translator:
8. All external object and function references are
resolved. Library components are linked to satisfy
external references to functions and objects not
defined in the current translation. All such
translator output is collected into a program image
which contains information needed for execution in its
execution environment.
So, here's a scenario:
The following include file cannot be found by the translator:
#include <x\y>
and "6.1.7 Header names" says this is undefined behavior.
At this point the implementation is allowed to behave in
an unpredictable way producing unpredictable results.
Seems like one of those unpredictable results is producing
the following output:
command not found
What does it mean to say "that every possible execution results in
undefined behavior" for such a case? It's not obvious.
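For contrast, the phrase is easier to interpret for an execution-time case.
A minimal sketch in C follows; the function names shift_left and
shift_left_checked are hypothetical and do not come from the Draft. The
unguarded shift is undefined only for some values of n (6.3.7), so not every
possible execution is undefined, and paragraph 3 would appear to require a
successful translation. The guarded variant assumes unsigned int has no
padding bits when it computes the width.

```c
#include <limits.h>

/* Hypothetical helper, for illustration only.
   The shift is undefined when n is negative or when n is greater than
   or equal to the width of unsigned int (6.3.7); for every other n it
   is well defined. */
unsigned int shift_left(unsigned int value, int n)
{
    return value << n;
}

/* A guarded variant: returns 0 instead of shifting when n is out of
   range, so no call can reach the undefined case.
   Assumption: unsigned int has no padding bits, so its width is
   sizeof(unsigned int) * CHAR_BIT. */
unsigned int shift_left_checked(unsigned int value, int n)
{
    if (n < 0 || n >= (int)(sizeof(unsigned int) * CHAR_BIT))
        return 0u;
    return value << n;
}
```

Whether a call to shift_left is undefined depends on the value of n at
execution time. It is not clear what plays the role of n in the #include
scenario.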
What should we do? *Warning* radical suggestion ahead!!!
Let's delete paragraph 3 above. I'm not sure it accomplishes whatever
we as a committee wanted it to accomplish. It also changes one of the
original motivations for undefined behavior.
Originally, one of the intents of undefined behavior was to allow an
implementation to extend C in a particular way, but not force other
vendors to extend in the same way. We always said that any other vendor can
just reject that program if it's undefined behavior. Now the vendor must
successfully translate the program (assuming we fix existing wording
problems). The problem now is that a vendor cannot issue a fatal error at
translation time if undefined behavior is found. Granted they can issue a
warning, but it's easy to miss a warning when a recompilation of a large
application occurs.
The current wording places a burden on the implementors. When customer X
complains that Vendor A successfully compiled a program containing an
obvious error, the vendor is forced to explain this decision. Customer
support is expensive and vendors try to minimize its cost. Paragraph 3
appears, from the vendor point of view, to be an attempt to significantly
increase the customer support costs.
Remember, you cannot fail to translate just because any of the following occurs:
- An unmatched ' or " character is encountered on a logical
source line during tokenization (6.1).
- A reserved keyword token is used in translation phase 7 or 8
for some purpose other than as a keyword (6.1.1).
- The reserved token complex or imaginary is used before
<complex.h> is included (6.1.1).
- The first character of an identifier is a digit (6.1.2).
- The same identifier has both internal and external linkage
in the same translation unit (6.1.2.2).
- A block containing a variably modified object having
automatic storage duration is entered by a jump to a labeled
statement (6.1.2.4).
- The whole-number and fraction parts of a floating constant
are both omitted (6.1.3.1).
- For a function call without a function prototype, the
function is defined without a function prototype, and the
types of the arguments after promotion are not compatible
with those of the parameters after promotion (6.3.2.2).
- A pointer is converted to other than an integer or pointer
type (6.3.4).
- An expression is shifted by a negative number or by an
amount greater than or equal to the width of the promoted
expression (6.3.7).
- An expression that is required to be an integer constant
expression does not have an integer type, contains casts
(outside operands to sizeof operators) other than
conversions of arithmetic types to integer types, or has
operands that are not integer constants, enumeration
constants, character constants, fixed-length sizeof
expressions, or immediately-cast floating constants (6.4).
- A constant expression in an initializer does not evaluate to
one of the following: an arithmetic constant expression, a
null pointer constant, an address constant, or an address
constant for an object type plus or minus an integer
constant expression (6.4).
- An arithmetic constant expression does not have arithmetic
type, contains casts (outside operands to sizeof operators)
other than conversions of arithmetic types to arithmetic
types, or has operands that are not integer constants,
floating constants, enumeration constants, character
constants, or sizeof expressions (6.4).
- An address constant is created neither explicitly using the
unary & operator or an integer constant cast to pointer
type, nor implicitly by the use of an expression of array or
function type (6.4).
- An identifier for an object is declared with no linkage and
the type of the object is incomplete after its declarator,
or after its init-declarator if it has an initializer (6.5).
- A function is declared at block scope with an explicit
storage-class specifier other than extern (6.5.1).
- A structure or union is defined as containing no named
members (6.5.2.1).
- A bit-field is declared with a type other than a qualified
or unqualified version of signed int or unsigned int
(6.5.2.1).
- A tag is declared with the bracketed list twice within the
same scope (6.5.2.3).
- etcetera ...
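To make one entry in the list concrete, here is a minimal sketch of the
negative-shift case (6.3.7). The function name pick is hypothetical. The
shift is an obvious mistake, yet it is evaluated only when flag is nonzero,
so not every possible execution is undefined and, as I read paragraph 3, the
implementation must still translate the program. A compiler may well warn
about the shift, but under the current wording it apparently could not treat
it as a fatal error.

```c
/* Hypothetical example, for illustration only. */
int pick(int flag)
{
    if (flag)
        return 1 << -1;   /* undefined if evaluated: negative shift count */
    return 0;             /* well defined: the shift is never reached */
}
```

A call such as pick(0) never reaches the shift, which is why the program
cannot be rejected on the grounds that every execution is undefined.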
Many customers are not going to understand why the vendors successfully
translated their application when some obvious error occurred. Vendors
will be forced to provide non-standard ways of getting fatal errors for
obvious mistakes.
Paragraph 3 seems to be doing a disservice to both vendors and users.