JTC1/SC22/WG14 N698

                           N698 J11/97-061
                Implementation Defined Integral Types
                      Randy Meyers and Doug Gwyn
                             23 June 1997

1  Introduction

Doug Gwyn distributed via the reflector a proposal (N713) to allow
implementation defined integral types to be used in the standard
headers.  Doug and I discussed the proposed wording changes in N713
and produced this updated version.

Early versions of this paper were also distributed to Clive Feather,
Frank Farance, and Douglas Walls.  Clive provided particularly
valuable feedback about issues with the representation of unsigned
integers, and issues raised in his paper N691.

This paper raises no issues that have not appeared in previous
proposals before the committee.  This version of the proposal
incorporates some ideas from N606 by Frank Farance and N669 by Clive
Feather.

2  Overview of Proposal

Implementation defined integral types are incorporated into the
Standard by allowing implementations to add types to the set of
"signed integer types."  By existing wording in the Standard,
the implementation must supply corresponding unsigned integer types.
By definition, the implementation defined signed and unsigned integer
types are integral types, basic types, scalar types, and arithmetic
types.  All of the statements made in the Standard about those type
classes automatically apply to the implementation defined integer
types.  The same wording in the Standard that defines the properties
of the standard integer types defines the properties of the
implementation defined integer types as well.

For convenience, the terms "extended signed integer types", "extended
unsigned integer types", and "extended integer types" are defined.

The term "precision" is defined to solve an existing problem with the
Standard confusing "size" with an integer type's ability to represent
values.  Two integer types of the same size might have different
padding, and thus not be able to represent the same values.

The integral promotions and usual arithmetic conversions have been
made less implementation defined than in Doug's original proposal.
The new usual arithmetic conversions have the following properties:

     1.  The results for the standard types do not change.

     2.  When a Standard type and implementation defined type meet, if
         the signed or unsigned version of the standard type can
         represent the values of the implementation defined type, then
         the result is the (signed or unsigned) standard type.

     3.  The new rules are a generalization of the old rules, and
         retain their spirit.

     4.  The new rules behave like the old rules even for unusual
         implementations that use the same representation for all the
         standard types, or have unsigned types that are just the
         signed types with the sign bit ignored, or have unsigned
         representations that are much "bigger" than their signed
         counterparts.
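
As an illustration of property 1, the following sketch (not part of
the proposal) records the result types that the existing rules give
for a few pairs of standard types.  It assumes a compiler that
supports the C11 {_Generic} operator; the enumeration, macro, and
function names are invented for this example.

```c
#include <assert.h>

/* Hypothetical codes naming the standard types after promotion. */
enum type_code {
    T_OTHER, T_INT, T_UINT, T_LONG, T_ULONG, T_LLONG, T_ULLONG
};

/* Map the type of an expression to a code (requires C11 _Generic). */
#define TYPE_CODE(e) _Generic((e),            \
    int: T_INT, unsigned int: T_UINT,         \
    long: T_LONG, unsigned long: T_ULONG,     \
    long long: T_LLONG,                       \
    unsigned long long: T_ULLONG,             \
    default: T_OTHER)

/* int meets unsigned int: the result is unsigned int. */
int int_with_uint(void) { return TYPE_CODE(0 + 0u); }

/* int meets long: the result is long. */
int int_with_long(void) { return TYPE_CODE(0 + 0L); }

/* long meets unsigned long long: the result is unsigned long long. */
int long_with_ullong(void) { return TYPE_CODE(0L + 0uLL); }
```

Both alternative wordings are intended to leave these three results
as they are.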

This paper actually contains two equivalent alternative wordings for
the integral promotions and usual arithmetic conversions for the
committee to choose between.  Sections 4.1, 5.1, and 6.1 of this paper
make up the first alternative wording.  The second alternative
consists of either Section 4.1 or 4.2, plus Sections 5.2 and 6.2.

The paper contains some optional sections on constants (Section 7),
uniqueness of types (Section 8), preprocessor arithmetic (Section 9),
and the grammar (Section 10).  These sections may be voted in or out
without hurting the integrity of the proposal.

Note:  Text surrounded by *asterisks* should be italicized, while text
surrounded by {braces} should be set in Courier font.

3  Allow Implementation Defined Integral Types

Replace Section 6.1.2.5 (Types), paragraph 3:
     There are five *signed integer types*, designated as {signed
     char}, {short int}, {int}, {long int}, and {long long int}.  (The
     signed integer and other types may be designated in several
     additional ways, as described in 6.5.2.)

with:
     There are five *standard signed integer types*, designated as
     {signed char}, {short int}, {int}, {long int}, and {long long
     int}.  (These and other types may be designated in several
     additional ways, as described in 6.5.2.) There may also be
     implementation-defined *extended signed integer types*.
     [reference first new footnote] The standard and extended signed
     integer types are collectively called just *signed integer
     types*.  [reference second new footnote]

Add first new footnote:
     Implementation defined keywords must have the form of an
     identifier reserved for any use as described in 7.1.3.

Add second new footnote:
     Therefore, any statement in this Standard about the signed
     integer types also applies to the extended signed integer types.

After the following in Section 6.1.2.5 (Types), paragraph 5:
     For each of the signed integer types, there is a corresponding
     (but different) *unsigned integer type* (designated with the
     keyword {unsigned}) that uses the same amount of storage
     (including sign information) and has the same alignment
     requirements.

add:
     The unsigned integer types that correspond to the standard signed
     integer types are the *standard unsigned integer types*.  The
     unsigned integer types that correspond to the extended signed
     integer types are the *extended unsigned integer types*.  The
     extended signed integer types and extended unsigned integer
     types are collectively called the *extended integer types*.


4  Define Precision For Integer Types

Existing wording in the Standard refers to the "size" of integer types
in a problematical fashion.  From Section 6.2.1.2 (Signed and unsigned
integers), paragraph 2, defining integer conversions:

     When a signed integer is converted to an unsigned integer with
     equal or greater size, if the value of the signed integer is
     nonnegative, its value is unchanged.

If integers are allowed to have padding (bits in their representation
that do not participate in the value stored in the integer), then the
above section fails to consider the case of two integers that are the
same size, but use a different number of bits to store the value.
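
The distinction between storage size and value bits can be sketched
as follows.  This is only an illustration: the helper names are
invented here, and the value bits are found by counting the bits of
the type's maximum value, which works for unsigned types because that
maximum has the form 2^N - 1.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Number of bits that participate in the value of an unsigned type,
   found from its maximum value (hypothetical helper). */
unsigned value_bits(unsigned long long max_value)
{
    unsigned bits = 0;

    while (max_value != 0) {
        bits++;
        max_value >>= 1;
    }
    return bits;
}

/* Storage size in bits, which also counts any padding bits. */
unsigned storage_bits(size_t size_in_bytes)
{
    return (unsigned)(size_in_bytes * CHAR_BIT);
}
```

On an implementation with padding, {value_bits(UINT_MAX)} would be
smaller than {storage_bits(sizeof(unsigned int))}; on one without
padding the two are equal.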

Frank Farance suggested in N606 that a new term, precision, be defined
for integer types.  This proposal contains two alternative definitions
from which the committee can choose.

4.1  Precision Definition 1

This definition of "precision" special-cases the definition for the
unsigned types in order to make the first of the alternative
wordings below for the integral promotions and usual arithmetic
conversions work (this definition also works for the second
alternative for the promotions and conversions).

After the following in Section 6.1.2.5 (Types), paragraph 16:
     The representations of integral types shall define values by use
     of a pure binary numeration system.25

Add:
     The *precision* of a signed integer type is the number of bits it
     uses to represent values excluding the sign bit and any padding.
     The precision of an unsigned integer type is considered to be the
     same as the corresponding signed integer type, although the
     number of bits used to represent values may be greater.  The
     precision of an enumerated type is the precision of the
     compatible integral type.  Regardless of its representation, the
     precision of {char} is considered to be the precision of {signed
     char} and {unsigned char}.

4.2  Precision Definition 2

This definition of precision contains no special cases.  It only works
with the second alternative wording for the integral promotions and
usual arithmetic conversions.

After the following in Section 6.1.2.5 (Types), paragraph 16:
     The representations of integral types shall define values by use
     of a pure binary numeration system.25

Add:
     The *precision* of an integral type is the number of bits it uses
     to represent values excluding the sign bit (if any) and any
     padding.
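
The difference between the two definitions can be sketched for {int}
and {unsigned int} as follows.  The function names are invented for
this illustration, and the value bits are again derived from the
limits in <limits.h> rather than from any facility in the proposal.

```c
#include <assert.h>
#include <limits.h>

/* Count the bits needed for a maximum value of the form 2^N - 1. */
unsigned count_value_bits(unsigned long long max_value)
{
    unsigned bits = 0;

    while (max_value != 0) {
        bits++;
        max_value >>= 1;
    }
    return bits;
}

/* Both definitions agree for a signed type: value bits only,
   excluding the sign bit and any padding. */
unsigned precision_of_int(void)
{
    return count_value_bits(INT_MAX);
}

/* Definition 2: an unsigned type is measured on its own. */
unsigned precision_of_uint_def2(void)
{
    return count_value_bits(UINT_MAX);
}

/* Definition 1: an unsigned type is considered to have the
   precision of the corresponding signed type. */
unsigned precision_of_uint_def1(void)
{
    return precision_of_int();
}
```

On a typical implementation with a 32-bit {int} and no padding,
Definition 2 would give {unsigned int} one more bit of precision than
{int}, while Definition 1 gives both the same precision.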

5  Integral Promotions

This section gives two alternative wordings for the integral
promotions.  The first alternative is based exclusively on precision.
The second alternative is based on a new concept called the integral
conversion rank of types.  This ranking, once defined, allows the
promotions and conversions to be expressed more succinctly.

5.1  Integral Promotions Alternative 1

Change Section 6.2.1.1 (Characters and integers), paragraph 1:
     A {char}, a {short int}, or an {int} bit-field, or their signed
     or unsigned varieties, or an enumeration type, may be used in an
     expression wherever an {int} or {unsigned int} may be used.  If
     an {int} can represent all values of the original type, the value
     is converted to an {int}; otherwise, it is converted to an
     {unsigned int}.  These are called the *integral promotions*.37
     All other arithmetic types are unchanged by the integral
     promotions.

to:
     The following may be used in an expression wherever an {int} or
     {unsigned int} may be used:

          -- An integral type whose precision is less than or equal to
             the precision of {int} and {unsigned int}

          -- A bit-field of type {int}, {signed int}, or {unsigned
             int}

     If an {int} can represent all values of the original type, the
     value is converted to an {int}; otherwise, it is converted to an
     {unsigned int}.  These are called the *integral promotions*.37
     All other types are unchanged by the integral promotions.

Note that Section 6.1.2.5 paragraph 16 defines integral types as char,
the signed and unsigned integer types, and the enumerated types.
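
The effect of the promotions on standard types can be observed with
the following sketch.  It assumes a compiler that supports the C11
{_Generic} operator; the macro, structure, and function names are
invented for this illustration.

```c
#include <assert.h>

/* 1 if the expression has type int, 0 otherwise (requires C11). */
#define IS_INT(e) _Generic((e), int: 1, default: 0)

struct flags {
    unsigned int small : 3;   /* bit-field narrower than int */
};

/* short promotes to int: int can represent every short value. */
int short_promotes(void)
{
    short s = 1;
    return IS_INT(s + s);
}

/* signed char promotes to int under unary plus. */
int schar_promotes(void)
{
    signed char c = 1;
    return IS_INT(+c);
}

/* A 3-bit unsigned bit-field promotes to int, not unsigned int,
   because int can represent all of its values. */
int bitfield_promotes(void)
{
    struct flags f = { 5 };
    return IS_INT(f.small + 0);
}

/* long is unchanged by the integral promotions. */
int long_unchanged(void)
{
    long n = 1;
    return !IS_INT(+n);
}
```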

5.2  Integral Promotions Alternative 2

Replace Section 6.2.1.1 (Characters and integers), paragraph 1:
     A {char}, a {short int}, or an {int} bit-field, or their signed
     or unsigned varieties, or an enumeration type, may be used in an
     expression wherever an {int} or {unsigned int} may be used.  If
     an {int} can represent all values of the original type, the value
     is converted to an {int}; otherwise, it is converted to an
     {unsigned int}.  These are called the *integral promotions*.37
     All other arithmetic types are unchanged by the integral
     promotions.

with the following paragraphs:
     Every integral type has an *integral conversion rank* defined as
     follows:

       -- No two signed integer types shall have the same rank, even
          if they have the same representation.

       -- The rank of a signed integer type shall be greater than the
          rank of any signed integer type with less precision.

       -- The rank of any standard signed integer type shall be
          greater than the rank of any extended signed integer type
          with the same precision.

       -- The rank of {long long int} shall be greater than the rank
          of {long int}, which shall be greater than the rank of
          {int}, which shall be greater than the rank of {short int},
          which shall be greater than the rank of {signed char}.

       -- The rank of any unsigned integer type shall equal the rank
          of the corresponding signed integer type.

       -- The rank of {char} shall equal the rank of {signed char} and
          {unsigned char}.

       -- The rank of any enumerated type shall equal the rank of the
          compatible integer type.

       -- The rank of any extended signed integer type relative to
          another extended signed integer type with the same precision
          is implementation-defined, but still subject to the other
          rules for determining the integral conversion rank.

       -- For all integral types *T1*, *T2*, and *T3*, if *T1* has
          greater rank than *T2* and *T2* has greater rank than *T3*
          then *T1* has greater rank than *T3*.

     The following may be used in an expression wherever an {int} or
     {unsigned int} may be used:

          -- An object or expression with an integral type whose
             integral conversion rank is less than the rank of {int}
             and {unsigned int}.

          -- A bit-field of type {int}, {signed int}, or {unsigned
             int}.

     If an {int} can represent all values of the original type, the
     value is converted to an {int}; otherwise, it is converted to an
     {unsigned int}.  These are called the *integral promotions*.37
     All other types are unchanged by the integral promotions.

Note that Section 6.1.2.5 paragraph 16 defines integral types as char,
the signed and unsigned integer types, and the enumerated types.
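
The rank-based promotion rule can be modelled with a small sketch.
The structure, its fields, and the rank numbers used by callers are
all invented for this illustration; the model ignores bit-fields and
assumes that every type involved uses the same representation for
negative values.

```c
#include <assert.h>

/* Toy description of an integral type (all fields illustrative). */
struct int_type {
    int rank;        /* integral conversion rank */
    int precision;   /* value bits, excluding sign and padding */
    int is_signed;
};

enum promoted { UNCHANGED, TO_INT, TO_UNSIGNED_INT };

/* Convenience constructor so callers need no compound literals. */
struct int_type make_type(int rank, int precision, int is_signed)
{
    struct int_type t;

    t.rank = rank;
    t.precision = precision;
    t.is_signed = is_signed;
    return t;
}

/* Apply the rank-based promotion rule to a non-bit-field operand.
   int_desc describes int on the modelled implementation. */
enum promoted promote(struct int_type t, struct int_type int_desc)
{
    /* Only types ranked below int are promoted. */
    if (t.rank >= int_desc.rank)
        return UNCHANGED;

    /* Under the stated assumptions, int holds every value of t when
       t has no more value bits than int. */
    if (t.precision <= int_desc.precision)
        return TO_INT;

    return TO_UNSIGNED_INT;
}
```

For example, an {unsigned short} with 16 value bits promotes to {int}
when {int} has 31 value bits, but to {unsigned int} when {int} has
only 15.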

6  Usual Arithmetic Conversions

This section gives two alternative wordings for the usual arithmetic
conversions.  The first is based on precision.  The second is based on
integral conversion rank.

6.1  Usual Arithmetic Conversions Alternative 1

Starting with the following text in Section 6.2.1.7 (Usual arithmetic
conversions), paragraph 1:
     Otherwise, the integral promotions are performed on both
     operands.  Then the following rules are applied:

delete to the end of paragraph 1 and replace with:
     Otherwise, the integral promotions are performed on both
     operands.  Then the following rules are applied to the promoted
     operands:

          If the operands have different precisions, the operand with
          less precision is converted to the type of the other
          operand.

          Otherwise, the operands have the same precision:

               If either operand has type {long long int} or {unsigned
               long long int}, then both operands are converted to
               {unsigned long long int} if either operand has an
               unsigned integer type.  Otherwise, both operands are
               converted to {long long int}.

               Otherwise, if one operand has type {long int} or
               {unsigned long int}, then both operands are converted
               to {unsigned long int} if either operand has an
               unsigned integer type.  Otherwise, both operands are
               converted to {long int}.

               Otherwise, if one operand has type {int} or {unsigned
               int}, then both operands are converted to {unsigned
               int} if either operand has an unsigned integer type.
               Otherwise, both operands are converted to {int}.

               Otherwise, if both operands have the same type, then no
               further conversion is needed.

               Otherwise, if one operand has signed integer type and
               the other operand has the corresponding unsigned
               integer type, then the operand with the signed integer
               type is converted to the type of the operand that has
               unsigned integer type.

               Otherwise, both operands are extended integer types
               with the same precision.  There shall be an
               implementation defined ranking of all extended signed
               integer types that have the same precision.  No two
               extended signed integer types shall have the same rank,
               even if they have the same representation.  The
               unsigned integer type that corresponds to an extended
               signed integer type shall have the same rank as that
               signed integer type.

                    Then, if either operand has an unsigned integer
                    type, both operands are converted to the unsigned
                    integer type that is or corresponds to the operand
                    type with greater rank.

                    Otherwise, the operand with the type of lesser
                    rank is converted to the type of the operand whose
                    type has greater rank.

Take care in reading the above.  Remember, the case where the types
have different precisions is handled before all of the conditional
clauses.  This removes the need in the Standard's present wording for
discussing what happens when long long, long, and/or int have the same
versus different "sizes".
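
The order of the clauses in the first alternative can be modelled as
follows.  This is a sketch only: the structure, the enumeration, and
the numeric rank for extended types are invented here, precision
follows Definition 1 (shared by a signed type and its unsigned
counterpart), and both operands are assumed to be already promoted.

```c
#include <assert.h>

/* Standard tiers named in the rules; K_EXTENDED marks any extended
   integer type.  Larger enumerators are considered first. */
enum kind { K_EXTENDED, K_INT, K_LONG, K_LLONG };

struct int_type {
    enum kind kind;
    int precision;   /* per Definition 1 */
    int is_signed;
    int ext_rank;    /* implementation defined; extended types only */
};

struct int_type make_type(enum kind kind, int precision,
                          int is_signed, int ext_rank)
{
    struct int_type t;

    t.kind = kind;
    t.precision = precision;
    t.is_signed = is_signed;
    t.ext_rank = ext_rank;
    return t;
}

int same_type(struct int_type a, struct int_type b)
{
    return a.kind == b.kind && a.precision == b.precision &&
           a.is_signed == b.is_signed && a.ext_rank == b.ext_rank;
}

/* Result type of the usual arithmetic conversions, Alternative 1. */
struct int_type convert(struct int_type a, struct int_type b)
{
    struct int_type r;
    enum kind top;

    /* Different precisions are settled before any other clause. */
    if (a.precision != b.precision)
        return a.precision > b.precision ? a : b;

    /* Same precision: the long long, long, and int clauses in turn.
       The result is unsigned if either operand is unsigned. */
    top = a.kind > b.kind ? a.kind : b.kind;
    if (top != K_EXTENDED)
        return make_type(top, a.precision,
                         a.is_signed && b.is_signed, 0);

    /* Both extended: same type, corresponding pair, or ranked pair. */
    r = a.ext_rank >= b.ext_rank ? a : b;
    r.is_signed = a.is_signed && b.is_signed;
    return r;
}
```

With a 31-bit {int} and a 63-bit {long}, the model converts
{unsigned int} and {long} to {long}; if both have 31 bits of
precision, it converts them to {unsigned long}, as the present rules
do.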

6.2  Usual Arithmetic Conversions Alternative 2

Starting with the following text in Section 6.2.1.7 (Usual arithmetic
conversions), paragraph 1:
     Otherwise, the integral promotions are performed on both
     operands.  Then the following rules are applied:

delete to the end of paragraph 1 and replace with:
     Otherwise, the integral promotions are performed on both
     operands.  Then the following rules are applied to the promoted
     operands:

          If both operands have the same type, then no further
          conversion is needed.

          Otherwise, if both operands have signed integer types or
          both have unsigned integer types, the operand with the type
          of lesser integral conversion rank is converted to the type
          of the operand with greater rank.

          Otherwise, if the operand that has unsigned integer type has
          rank greater than or equal to the rank of the type of the
          other operand, then the operand with signed integer type is
          converted to the type of the operand with unsigned integer
          type.

          Otherwise, if the type of the operand with signed integer
          type can represent all of the values of the type of the
          operand with unsigned integer type, then the operand with
          unsigned integer type is converted to the type of the
          operand with signed integer type.

          Otherwise, both operands are converted to the unsigned
          integer type corresponding to the type of the operand with
          signed integer type.
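
The five clauses of the second alternative can be modelled as
follows.  This is a sketch under stated assumptions: a type is
identified by its rank and signedness, the rank numbers and the
table of value bits describe a hypothetical implementation with a
32-bit {int} and 64-bit {long} and {long long}, and both operands
are assumed to be already promoted.

```c
#include <assert.h>

/* Ranks of the standard types on the modelled implementation. */
enum { RANK_SCHAR, RANK_SHORT, RANK_INT, RANK_LONG, RANK_LLONG };

/* Value bits per rank (hypothetical figures, no padding). */
static const int signed_precision[]   = { 7, 15, 31, 63, 63 };
static const int unsigned_precision[] = { 8, 16, 32, 64, 64 };

struct int_type {
    int rank;
    int is_signed;
};

struct int_type make_type(int rank, int is_signed)
{
    struct int_type t;

    t.rank = rank;
    t.is_signed = is_signed;
    return t;
}

int same_type(struct int_type a, struct int_type b)
{
    return a.rank == b.rank && a.is_signed == b.is_signed;
}

/* Result type of the usual arithmetic conversions, Alternative 2. */
struct int_type usual_arithmetic(struct int_type a, struct int_type b)
{
    struct int_type s, u;

    /* Clause 1: same type. */
    if (same_type(a, b))
        return a;

    /* Clause 2: same signedness, greater rank wins. */
    if (a.is_signed == b.is_signed)
        return a.rank > b.rank ? a : b;

    s = a.is_signed ? a : b;   /* the signed operand */
    u = a.is_signed ? b : a;   /* the unsigned operand */

    /* Clause 3: unsigned operand has rank >= signed operand. */
    if (u.rank >= s.rank)
        return u;

    /* Clause 4: signed type represents all unsigned values. */
    if (signed_precision[s.rank] >= unsigned_precision[u.rank])
        return s;

    /* Clause 5: unsigned type corresponding to the signed type. */
    return make_type(s.rank, 0);
}
```

On the modelled implementation, {unsigned int} and {long} give
{long}, while {unsigned long} and {long long} give {unsigned long
long}, because {long long} cannot represent every {unsigned long}
value.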

7  Allow "Big" Constants To Have Extended Integral Type

The wording proposed in this section is optional.  The rest of the
proposal is consistent if this section is not voted in.

The existing wording in the Standard permits an implementation to give
a constant that is too big for {long long} or {unsigned long long} an
extended integer type.  No diagnostic is required.

Section 6.1.3.2 (Integer constants), paragraph 5, in Semantics says:
     The type of an integer constant is the first of the corresponding
     list in which its value can be represented.  Unsuffixed decimal:
     {int}, {long int}, {long long int}, {int}; unsuffixed octal or
     hexadecimal:  {int}, {unsigned int}, {long int}, {unsigned long
     int}, {long long int}, {unsigned long long int}; suffixed by the
     letter {u} or {U}:  {unsigned int}, {unsigned long int},
     {unsigned long long int}; suffixed by the letter {l} or {L}:
     {long int}, {unsigned long int}, {long long int}, {unsigned long
     long int}; suffixed by both the letters {u} or {U} and {l} or
     {L}:  {unsigned long int}, {unsigned long long int}; suffixed by
     {ll} or {LL}:  {long long int}, {unsigned long long int};
     suffixed by both {u} or {U} and {ll} or {LL}:  {unsigned long
     long int}.

Section 6.1.3 (Constants), paragraph 2, is the only constraint:
     The value of a constant shall be in the range of representable
     values for its type.

If the constant is too big for the types in its list, then the program
violates a semantics rule and is not strictly conforming.  An
implementation is allowed to extend the language to give meaning to
any program that is not strictly conforming.  In this case, the
extension is to give the constant an extended integer type.  As long
as the extended integer type can represent the value of the constant,
the constraint is not violated, and no diagnostic is required.

The Standard would benefit if it provided more direction to
implementations on which extended integer types are appropriate for
the different forms of constants.

At the end of Section 6.1.3.2 (Integer constants), paragraph 5 add:
     If an integer constant cannot be represented by a type in its
     list, it may have an extended integer type, if the extended
     integer type can represent its value.  If all of the types in the
     list for the constant are signed, the extended integer type shall
     be signed.  If all of the types in the list for the constant are
     unsigned, the extended integer type shall be unsigned.  If the
     list contains both signed and unsigned types, the extended
     integer type may be signed or unsigned.

Note that Draft 10 erroneously has an extra {int} at the end of the
list for unsuffixed decimal constants.

Also, most people who have reviewed the list object to the fact that
decimal constants suffixed by L or LL are allowed to be unsigned.
Perhaps the committee voted in undesirable wording.
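
The first entry of each list quoted from Section 6.1.3.2 can be
observed with the following sketch, which uses a constant small
enough to fit every type.  It assumes a compiler that supports the
C11 {_Generic} operator, and the names are invented for this
illustration.  It does not show an extended integer type being chosen
for an oversized constant, since that behavior is specific to each
implementation.

```c
#include <assert.h>

/* Hypothetical codes naming the types in the constant lists. */
enum type_code {
    T_OTHER, T_INT, T_UINT, T_LONG, T_ULONG, T_LLONG, T_ULLONG
};

/* Map the type of an expression to a code (requires C11 _Generic). */
#define TYPE_CODE(e) _Generic((e),            \
    int: T_INT, unsigned int: T_UINT,         \
    long: T_LONG, unsigned long: T_ULONG,     \
    long long: T_LLONG,                       \
    unsigned long long: T_ULLONG,             \
    default: T_OTHER)

/* Each function reports the type chosen for a small constant. */
int code_unsuffixed(void) { return TYPE_CODE(1); }
int code_u(void)          { return TYPE_CODE(1u); }
int code_l(void)          { return TYPE_CODE(1L); }
int code_ul(void)         { return TYPE_CODE(1uL); }
int code_ll(void)         { return TYPE_CODE(1LL); }
int code_ull(void)        { return TYPE_CODE(1uLL); }
```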

8  Uniqueness of types

The wording proposed in this section is optional.  The rest of the
proposal is consistent if this section is not voted in.

Section 6.1.2.5 (Types), paragraph 10 says:

     The type {char}, the signed and unsigned integer types, and the
     floating types are collectively called the *basic types*.  Even
     if the implementation defines two or more basic types to have the
     same representation, they are nevertheless different types.

Microsoft has keywords that are synonyms for standard types.  For
example, __int16 is a synonym for short int and unsigned __int16 is a
synonym for unsigned short int.  Such synonyms are not distinct
types; they are merely alternative names for existing types, similar
in some ways to a typedef, and so the above paragraph does not apply
to them.  A footnote would clarify this.

Add a new footnote to the end of Section 6.1.2.5 (Types), paragraph
10:
     An implementation may define new keywords that provide alternative
     ways to designate a basic (or any other) type.  An alternate way
     to designate a basic type does not violate the requirement that
     all basic types be different.  Implementation defined keywords
     must have the form of an identifier reserved for any use as
     described in 7.1.3.
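
The difference between a synonym and a distinct type can be sketched
with a typedef standing in for an implementation defined keyword.
The typedef name {my_int16} is invented for this illustration, and
the sketch assumes a compiler that supports the C11 {_Generic}
operator.

```c
#include <assert.h>

/* A synonym for short, playing the role of a keyword like __int16. */
typedef short my_int16;

/* Listing both long and long long is only valid because they are
   different types, even where they share a representation. */
#define KIND(e) _Generic((e), \
    short: 1, int: 2, long: 3, long long: 4, default: 0)

/* The synonym selects the short association: it is the same type. */
int kind_of_synonym(void)
{
    my_int16 x = 0;
    return KIND(x);
}

/* long and long long always select different associations. */
int kind_of_long(void)      { return KIND(0L); }
int kind_of_long_long(void) { return KIND(0LL); }
```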

9  Preprocessor arithmetic

The wording proposed in this section is optional.  The rest of the
proposal is consistent if this section is not voted in.

It seems wise to require preprocessing arithmetic to be performed in
the largest integral type that the implementation supports.

Replace the following sentences from Section 6.8.1 (Conditional
inclusion), paragraph 4:
     The resulting tokens comprise the controlling constant expression
     which is evaluated according to the rules of 6.4 using arithmetic
     that has at least the ranges specified in 5.2.4.2, except that
     {int} and {long}, and {unsigned int} and {unsigned long}, act as
     if they have the same representation as, respectively, {long
     long} and {unsigned long long}.

with:
     The resulting tokens comprise the controlling constant expression
     which is evaluated according to the rules of 6.4 using arithmetic
     that has at least the ranges specified in 5.2.4.2, except that
     the signed integer types and the unsigned integer types act as
     if they have the same representation as, respectively, {intmax_t}
     and {uintmax_t} defined in the <inttypes.h> header.

Add Forward reference:
        Largest integral types (7.4.3)

Note, the above forward reference may have to be adjusted to reflect
the rewrite of the section on <inttypes.h>.
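
The effect of evaluating {#if} expressions in the widest integer
types can be sketched as follows.  The macro names are invented for
this illustration, and the sketch assumes an implementation on which
{intmax_t} and {uintmax_t} have at least 64 bits and on which the
preprocessor behaves as the replacement wording describes.

```c
#include <assert.h>

/* In a #if, this sum is carried out in the widest signed type, so it
   does not overflow even where int has only 32 bits. */
#if 2147483647 + 1 > 0
#define PP_SUM_IS_POSITIVE 1
#else
#define PP_SUM_IS_POSITIVE 0
#endif

/* In a #if, 0u - 1 wraps to the maximum of the widest unsigned type
   rather than to the maximum of unsigned int. */
#if (0u - 1) >= 18446744073709551615u
#define PP_UNSIGNED_IS_WIDE 1
#else
#define PP_UNSIGNED_IS_WIDE 0
#endif

/* Expose the preprocessor's answers to ordinary code. */
int pp_sum_is_positive(void)  { return PP_SUM_IS_POSITIVE; }
int pp_unsigned_is_wide(void) { return PP_UNSIGNED_IS_WIDE; }
```

Outside of {#if}, the same two expressions would be evaluated in
{int} and {unsigned int}, with quite different results on a 32-bit
{int}.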

10  Syntax for Declarations

The wording proposed in this section is optional.  The rest of the
proposal is consistent if this section is not voted in.

The Standard requires that a violation of a syntax rule cause an
implementation to issue a diagnostic.  This section proposes
extending the grammar to permit implementation defined keywords to be
type specifiers.  This change is only needed if the committee wishes
to remove the requirement that an implementation issue a diagnostic
when user code (as opposed to headers) uses implementation defined
keywords as type specifiers.

Note that the standard headers are not files (Section 7.1.2 footnote
112), and the committee has always held that the headers may be
implemented as a binary representation of the specified contents of
the header.  Issues of syntax do not really apply to headers, and so,
implementations are free to use extended syntax in the standard
headers without issuing a diagnostic (an implementation may use a
pragma to suppress such diagnostics while in the header).  Thus,
implementations may use extended integer types in the implementations'
headers without this proposed change to the grammar.

Gwyn, Meyers, and Feather do not feel the wording change in this
section is necessary.

After the following line in Section 6.5.2 (Type specifiers), paragraph
1:
     {long}

add new line:
     *extended-signed-integer-type*

Add new Syntax rule:
     *extended-signed-integer-type*:
          *identifier*

Corresponding changes should be made in Section B.2.2, page 352.

After the following in Section 6.5.2 (Type specifiers), paragraph 2:
     -- {unsigned long long}, or {unsigned long long int}

add two new list items:
     -- an identifier reserved for any use in 7.1.3 that designates an
        implementation-defined extended signed integer type, or the
        same identifier preceded by {signed}
     -- the same identifier preceded by {unsigned}

Add the following Forward reference:
     reserved identifiers (7.1.3)

11  Index Entries

New entries in the index should be made for the following terms:

     1.  standard signed integer types

     2.  extended signed integer types

     3.  standard unsigned integer types

     4.  extended unsigned integer types

     5.  extended integer types

     6.  precision

     7.  integral conversion rank (if the corresponding change is
         made)