JTC1/SC22/WG14
N685
C9X Revision Proposal
=====================
WG14/N685 (J11/97-048)
Title: Compatibility Issues with Union Members
Author: Tom MacDonald and Bill Homer
Author Affiliation: Cray Research, an SGI company
Postal Address: Cray Research Park
655F Lone Oak Drive
Eagan, MN 55121
USA
E-mail Address: tam@cray.com homer@cray.com
Document Number: WG14/N685 (J11/97-048)
Telephone Number: +1-612-683-5818
Fax Number: +1-612-683-5307
Sponsor: J11
Date: 08-May-1997
Proposal Category:
__ Editorial change/non-normative contribution
__ Correction
XX New feature
__ Addition to obsolescent feature list
__ Addition to Future Directions
__ Other (please specify) Change of current behavior
Area of Standard Affected:
__ Environment
XX Language
__ Preprocessor
__ Library
__ Macro/typedef/tag name
__ Function
__ Header
__ Other (please specify) ______________________________
Prior Art: Tag Compatibility
Target Audience: all C programmers
Related Documents (if any): NONE
Proposal Attached: XX Yes __ No, but what's your interest?
Abstract: A discussion is presented on how common initial
sequences in union members present the same
problem solved by the "tag compatibility" changes.
Proposed edits to C9X to ameliorate this problem
are also provided.
======================= Cover sheet ends here ==============
Introduction:
A new issue related to the tag compatibility issue has surfaced.
As mentioned in recent email discussion of the representations of
pointers, there is an "aliasing loophole" for union objects having
structure members with a common initial sequence of members.
The issue is that the loophole as written has the (apparently
unintended) consequence of making alias analysis more difficult for
_any_ pair of pointers to different structures that happen to share a
common initial sequence. See below for details and a proposal to limit
the consequences of the loophole while still allowing programmers to
take advantage of it when convenient.
Historical perspective:
If you remember, the following example is no longer considered to be
strictly conforming:
x.c | y.c
___________________________________|________________________________
|
#include <stdio.h> | struct tag2 {
int func(); | int m1, m2;
| };
struct tag1 { | struct tag3 {
int m1, m2; | int m1, m2;
} st1; | };
|
main() { | int func(struct tag2 *pst2,
if (func(&st1, &st1)) { | struct tag3 *pst3) {
printf("optimized\n"); | pst2->m1 = 2;
} else { | pst3->m1 = 0; /* alias? */
printf("unoptimized\n"); | return pst2->m1;
} | }
} |
in that a highly optimizing compiler might produce "optimized" as
the output of the program, while the same compiler with optimization
turned off might produce "unoptimized" as the output. This is because
Translation Unit (TU) y.c defines "func" with two parameters, each a
pointer to a different structure type, while TU x.c calls "func" and
passes the address of the same structure for both arguments.
We made this change to help optimizers, debuggers, lint-like tools, etc.
New issue:
Consider the following:
w.c
___________________________________
#include <stdio.h>
int similar_func();
union utag {
struct tag1 {
int m1;
double d2;
} st1;
struct tag2 {
int m1;
char c2;
} st2;
} un1;
main() {
if (similar_func(&un1.st1, &un1.st2)) {
printf("optimized\n");
} else {
printf("unoptimized\n");
}
}
Since unions are allowed to have structure members with a common initial
sequence, and you can modify one structure and then inspect the common
initial part through a different structure, most of the perceived benefit
of the new tag compatibility rules is lost.
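The guarantee in question can be sketched as follows. This is a minimal
illustration only, reusing the declarations from w.c above; the helper
name "store_then_inspect" is hypothetical and not part of the proposal.
Every access here names the union object directly, which is the
uncontroversial form of the guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Same declarations as in w.c above. */
union utag {
    struct tag1 { int m1; double d2; } st1;
    struct tag2 { int m1; char c2; } st2;
};

/* Hypothetical helper: store through one structure member, then
 * inspect the common initial part through the other. */
int store_then_inspect(int value)
{
    union utag un1;

    /* The shared leading member sits at the same offset in both
     * structures; only m1 belongs to the common initial sequence. */
    assert(offsetof(struct tag1, m1) == offsetof(struct tag2, m1));

    un1.st1.m1 = value;   /* the union now contains a struct tag1 */
    un1.st1.d2 = 0.0;
    return un1.st2.m1;    /* inspect the common initial part via st2 */
}
```

The difficulty described above arises only when the same two members are
reached through separate pointers, as in the call to "similar_func",
where the union is no longer mentioned in the accessing code.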
Proposed solution:
The proposed solution is to require that a declaration of the union type
be visible wherever aliasing through a common initial sequence (as in the
example above) is to be permitted. The following TU therefore allows this
kind of aliasing if desired:
union utag {
struct tag1 { int m1; double d2; } st1;
struct tag2 { int m1; char c2; } st2;
};
int similar_func(struct tag1 *pst2, struct tag2 *pst3) {
pst2->m1 = 2;
pst3->m1 = 0; /* might be an alias for pst2->m1 */
return pst2->m1;
}
Here are some proposed words for C9X (with change bars in the left
margin):
----------------------------------------------------------------------
C9X Draft 9-pre3, 1997-02-15, WG14/Nxxx J11/97-xxx
6.3.2.3 Structure and union members
Constraints
[#5] With one exception, if a member of a union object is
accessed after a value has been stored in a different member
of the object, the behavior is implementation-defined.54
One special guarantee is made in order to simplify the use
of unions: If a union contains several structures that
share a common initial sequence (see below), and if the
union object currently contains one of these structures, it
is permitted to inspect the common initial part of any of
| them anywhere that a declaration of the completed type of the
| union is visible. Two structures share a common initial
sequence if corresponding members have compatible types (and,
for bit-fields, the same widths) for a sequence of one or more
initial members.
__________
54. The ``byte orders'' for scalar types are invisible to
isolated programs that do not indulge in type punning
(for example, by assigning to one member of a union and
inspecting the storage by accessing another member that
is an appropriately sized array of character type), but
must be accounted for when conforming to externally
imposed storage layouts.
Examples |
[#6]
1. If f is a function returning a structure or union, and
x is a member of that structure or union, f().x is a
valid postfix expression but is not an lvalue.
2. The following is a valid fragment: |
union {
struct {
int alltypes;
} n;
struct {
int type;
int intnode;
} ni;
struct {
int type;
double doublenode;
} nf;
} u;
u.nf.type = 1;
u.nf.doublenode = 3.14;
/*...*/
if (u.n.alltypes == 1)
/*...*/ sin(u.nf.doublenode) /*...*/
| 3. The following is not a valid fragment (because
| the union type is not visible within function f):
|
| struct t1 { int m; };
| struct t2 { int m; };
| int f(struct t1 * p1, struct t2 * p2) {
| if ( p1->m < 0 ) {
| p2->m = -p2->m;
| }
| return p1->m;
| }
| int g() {
| union { struct t1 s1; struct t2 s2; } u;
| /* ... */
| return f(&u.s1, &u.s2);
| }
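----------------------------------------------------------------------
For contrast with example 3, here is one way the same computation might
be written so that the union type is visible at every access. This is a
sketch under stated assumptions, not proposed wording: the tag "u12" and
the names "f_via_union" and "g_via_union" are hypothetical, and the
union is given file scope so that its completed type is visible inside
both functions.

```c
#include <assert.h>

struct t1 { int m; };
struct t2 { int m; };

/* Hypothetical tag name; declared at file scope so the completed
 * union type is visible in both functions below. */
union u12 { struct t1 s1; struct t2 s2; };

/* Each access is spelled through the union, so a translator can see
 * that s1.m and s2.m may occupy the same storage. */
int f_via_union(union u12 *pu)
{
    assert(pu != 0);          /* sketch only: no real error handling */
    if (pu->s1.m < 0) {
        pu->s2.m = -pu->s2.m; /* update the common initial part via s2 */
    }
    return pu->s1.m;          /* inspect it again via s1 */
}

int g_via_union(int start)
{
    union u12 u;
    u.s1.m = start;
    return f_via_union(&u);
}
```

With this form, a call such as g_via_union(-5) stores -5 through s1,
negates it through s2, and reads the result back through s1.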