 
 JTC1/SC22/WG14 
N791
Solving the struct hack problem
Clive D.W. Feather
clive@demon.net
1997-10-22
Abstract
========
Several DRs have attempted to address the issue of the "struct hack". This
paper proposes an approach to making the technique available while avoiding
most of the problems of current practice.
Discussion
==========
The "struct hack" is a technique for using a dynamically sized structure.
A structure type is declared like this:
    struct hack
    {
        size_t n_elements;
        int data [1];
    };
space is then malloced:
    size_t n;
    /* ... */
    struct hack *p;
    p = malloc (sizeof (struct hack) + sizeof (int) * (n - 1));
    p->n_elements = n;
and the entire space is used:
    for (i = 0; i < p->n_elements; i++)
        p->data [i] = 0;
The problem is that accesses to p->data [i] for i > 0 are undefined behavior,
because a pointer (p->data + i) to beyond the end of the array is being
used. To quote the DR response (slightly modified):
    Subclause 6.3.2.1 describes limitations on pointer arithmetic, in
    connection with array subscripting (see also subclause 6.3.6).
    Basically, it permits an implementation to tailor how it represents
    pointers to the size of the objects they point at. Thus, the
    expression p->data[5] may fail to designate the expected [object],
    even though the malloc call ensures that the [object] is present.
    The idiom, while common, is not strictly conforming.
This paper proposes a technique, apparently already supported by at least
one compiler, of allowing the structure to be declared as:
    struct hack
    {
        size_t n_elements;
        int data [];
    };
and then explicitly permitting the access to any element of the array that
is within the bounds of the malloced space.
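As an illustration only, the following sketch shows how the proposed
declaration might be used in practice. The helper names hack_create and
hack_at are hypothetical and are not part of this proposal; the sketch also
assumes an implementation that accepts the incomplete array member and
provides SIZE_MAX in <stdint.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* The structure from the text, declared with an incomplete array member. */
struct hack
{
    size_t n_elements;
    int data [];
};

/* Hypothetical helper: allocate a struct hack with room for n elements
   and set every element to zero. Returns NULL on failure. */
struct hack *hack_create (size_t n)
{
    struct hack *p;
    size_t i;

    /* Refuse requests whose byte count would not fit in a size_t. */
    if (n > (SIZE_MAX - sizeof (struct hack)) / sizeof (int))
        return NULL;

    p = malloc (sizeof (struct hack) + sizeof (int) * n);
    if (p == NULL)
        return NULL;

    p->n_elements = n;
    for (i = 0; i < p->n_elements; i++)
        p->data [i] = 0;
    return p;
}

/* Hypothetical helper: return a pointer to element i, checking that it
   lies within the bounds recorded at allocation time. */
int *hack_at (struct hack *p, size_t i)
{
    assert (p != NULL && i < p->n_elements);
    return &p->data [i];
}
```

A caller would write p = hack_create (n), check the result against NULL,
use *hack_at (p, i) for any i less than p->n_elements, and release the
space with free (p). Note that the allocation adds n elements to
sizeof (struct hack), not n - 1, because the incomplete array contributes
no elements of its own to the size of the structure.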
Proposal
========
[References are to draft 11 pre 3.]
In subclause 6.5.2.1 (Structure and union specifiers), paragraph 2, change:
    A structure or union shall not contain a member with incomplete or
    function type.
to:
    A structure or union shall not contain a member with incomplete or
    function type, except that the last element of a structure may have
    incomplete array type.
add a new paragraph at the end of the semantics:
    As a special case, the last element of a structure may be an incomplete
    array type. This is called a /flexible array member/, and the size of
    the structure shall be equal to the offset of the last element of an
    otherwise identical structure that replaces the flexible array member
    with an array of one element. When an lvalue whose type is a structure
    with a flexible array member is used to access an object, it behaves as
    if that member were replaced by the longest array that would not make
    the structure larger than the object being accessed. If this array
    would have no elements, then it behaves as if it had one element,
    but the behavior is undefined if any attempt is made to access that
    element.
and add an example:
    Example:
    After the declarations:
        struct s { int n; double d []; };
        struct ss { int n; double d [1]; };
    the three expressions:
        sizeof (struct s)
        offsetof (struct s, d)
        offsetof (struct ss, d)
    have the same value. The structure /struct s/ has a flexible array
    member /d/.
    If /sizeof (double)/ is 8, then after the following code is executed:
        struct s *s1;
        struct s *s2;
        s1 = malloc (sizeof (struct s) + 64);
        s2 = malloc (sizeof (struct s) + 46);
    and assuming that the calls to /malloc/ succeed, /s1/ and /s2/ behave
    as if they had been declared as:
        struct { int n; double d [8]; } *s1;
        struct { int n; double d [5]; } *s2;
    Following the further successful assignments:
        s1 = malloc (sizeof (struct s) + 10);
        s2 = malloc (sizeof (struct s) +  6);
    they then behave as if they had been declared as:
        struct { int n; double d [1]; } *s1, *s2;
    and:
        double *dp;
        dp = &(s1->d[0]);    // Permitted
        *dp = 42;            // Permitted
        dp = &(s2->d[0]);    // Permitted
        *dp = 42;            // Undefined behavior
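The element counts in the example above follow from the "longest array that
would not make the structure larger than the object" rule. The sketch below
is not part of the proposed text; it reproduces that arithmetic with a
hypothetical helper, fam_elements, which takes the size of the allocated
object, the offset of the flexible array member, and the size of one
element, all in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the number of elements of the flexible array
   member that fit entirely within an object of object_size bytes.
   A result of 0 corresponds to the case in the proposed wording where
   the member behaves as if it had one element that must not be
   accessed. */
size_t fam_elements (size_t object_size, size_t member_offset,
                     size_t elem_size)
{
    assert (elem_size > 0);
    if (object_size < member_offset)
        return 0;
    /* Integer division discards any partial trailing element. */
    return (object_size - member_offset) / elem_size;
}
```

Taking sizeof (double) as 8 and, purely for illustration, assuming that
offsetof (struct s, d) is also 8, the four allocations in the example give
8, 5, 1 and 0 elements respectively. The last case is the one in which
storing through dp is undefined.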