1. Introduction
[C11] Integer types allows three representations for signed integral types:
-
Signed magnitude
-
Ones' complement
-
Two’s complement
See §3 C Signed Integer Wording for full wording.
C++17 goes further than C and only requires that "the representations of integral types shall define values by use of a pure binary numeration system". To the author’s knowledge no modern machine uses both C++ and a signed integer representation other than two’s complement (see §4 Survey of Signed Integer Representations). None of [MSVC], [GCC], and [LLVM] support other representations. This means that the C++ that is taught is effectively two’s complement, and the C++ that is written is two’s complement. It is extremely unlikely that there exist any significant codebase developed for two’s complement machines that would actually work when run on a non-two’s complement machine.
C and C++ as specified, however, are not two’s complement. Signed integers currently allow the existence of an extraordinary value which traps, extra padding bits, integral negative zero, and introduce undefined behavior and implementation-defined behavior for the sake of this extremely abstract machine.
[P0907r1] stands to change C++20 and make two’s complement the only signed integer representation that is supported. WG21 wants to hear from WG14 before making this change, and hopes that WG14 will be interested in making the same changes. Aaron Ballman has volunteered to present this paper at the WG14 Brno meeting, as well as with the C Safety and Security study group.
Based on guidance received from this paper, The author is happy to write a follow-up proposal with wording for WG14. The author will communicate WG14’s feedback to WG21 at the Rappersvil meeting in June 2018.
2. Details
The following is proposed to C++:
-
Status-quo Signed integer arithmetic remains non-commutative in general (though some implementations may guarantee that it is).
-
Change
bool
is represented as0
forfalse
and1
fortrue
. All other representations are undefined. -
Change
bool
only has value bits, no padding bits. -
Change Signed integers are two’s complement.
-
Change If there are M value bits in the signed type and N in the unsigned type, then M = N-1 (whereas C says M ≤ N).
-
Status-quo If a signed operation would naturally produce a value that is not within the range of the result type, the behavior is undefined.
-
Change None of the integral types have extraordinary values.
-
Change C11 note 53 has wording aroung trap representations within padding bits, e.g. for parity bits. C++ has no such wording.
-
Change Conversion from signed to unsigned is always well-defined: the result is the unique value of the destination type that is congruent to the source integer modulo 2N.
-
Change Conversion from enumeration type to integral is the same as that of converting from the enumeration’s underlying type and then to the destination type, even if the original value cannot be represented by the specified type.
-
Status-quo Conversion from integral type to enumeration is unchanged: if the original value is not within the range of the enumeration values the behavior is undefined.
-
Change Left-shift on signed integer types produces the same results as left-shift on the corresponding unsigned integer type.
-
Change Right-shift is an arithmetic right shift which performs sign-extension.
-
Status-quo shift by larger-than or equal-to bit-width remains undefined.
-
Status-quo
constexpr
evaluation of signed integer arithmetic with undefined result is not a core constant expression. -
Change the range of enumerations without a fixed underlying type is simplified because of two’s complement.
-
Status-quo
is_modulo
type trait remainsfalse
for signed integer types unless an implementation chooses to defined overflow as wrap. -
has_unique_object_representations<T>
istrue
forbool
and signed integer types, in addition to unsigned integer types and others before. -
Status-quo atomic operations on signed integer types continues not to have undefined behavior, and is still specified to wrap as two’s complement (the definition is clarified to act as-if casting to unsigned and back).
-
Change address [LWG3047] to remove undefined behavior in pre-increment atomic operations.
3. C Signed Integer Wording
The following is the wording on integers from the C11 Standard.
For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2N−1, so that objects of that type shall be capable of representing values from 0 to 2N − 1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.
For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits;
signed char
shall not have any padding bits. There shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M ≤ N). If the sign bit is zero, it shall not affect the resulting value. If the sign bit is one, the value shall be modified in one of the following ways:
the corresponding value with sign bit 0 is negated (sign and magnitude);
the sign bit has the value −(2M) (two’s complement);
the sign bit has the value −(2M − 1) (ones’ complement).
Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones’ complement), is a trap representation or a normal value. In the case of sign and magnitude and ones’ complement, if this representation is a normal value it is called a negative zero.
If the implementation supports negative zeros, they shall be generated only by:
the
&
,|
,^
,~
,<<
, and>>
operators with operands that produce such a value;the
+
,-
,*
,/
, and%
operators where one operand is a negative zero and the result is zero;compound assignment operators based on the above cases.
It is unspecified whether these cases actually generate a negative zero or a normal zero, and whether a negative zero becomes a normal zero when stored in an object.
If the implementation does not support negative zeros, the behavior of the
&
,|
,^
,~
,<<
, and>>
operators with operands that would produce such a value is undefined.The values of any padding bits are unspecified. A valid (non-trap) object representation of a signed integer type where the sign bit is zero is a valid object representation of the corresponding unsigned type, and shall represent the same value. For any integer type, the object representation where all the bits are zero shall be a representation of the value zero in that type.
The precision of an integer type is the number of bits it uses to represent values, excluding any sign and padding bits. The width of an integer type is the same but including any sign bit; thus for unsigned integer types the two values are the same, while for signed integer types the width is one greater than the precision.
4. Survey of Signed Integer Representations
Here is a non-comprehensive history of signed integer representations:
-
Two’s complement
-
John von Neumann suggested use of two’s complement binary representation in his 1945 First Draft of a Report on the EDVAC proposal for an electronic stored-program digital computer.
-
The 1949 EDSAC, which was inspired by the First Draft, used two’s complement representation of binary numbers.
-
Early commercial two’s complement computers include the Digital Equipment Corporation PDP-5 and the 1963 PDP-6.
-
The System/360, introduced in 1964 by IBM, then the dominant player in the computer industry, made two’s complement the most widely used binary representation in the computer industry.
-
The first minicomputer, the PDP-8 introduced in 1965, uses two’s complement arithmetic as do the 1969 Data General Nova, the 1970 PDP-11.
-
-
Ones' complement
-
Many early computers, including the CDC 6600, the LINC, the PDP-1, and the UNIVAC 1107.
-
Successors of the CDC 6600 continued to use ones' complement until the late 1980s.
-
Descendants of the UNIVAC 1107, the UNIVAC 1100/2200 series, continue to do so, although ClearPath machines are a common platform that implement either the 1100/2200 architecture (the ClearPath IX series) or the Burroughs large systems architecture (the ClearPath NX series). Everything is common except the actual CPUs, which are implemented as ASICs. In addition to the IX (1100/2200) CPUs and the NX (Burroughs large systems) CPU, the architecture had Xeon (and briefly Itanium) CPUs. Unisys' goal was to provide an orderly transition for their 1100/2200 customers to a more modern architecture.
-
-
Signed magnitude
-
The IBM 700/7000 series scientific machines use sign/magnitude notation, except for the index registers which are two’s complement.
-
Wikipedia offers more details and has comprehensive sources for the above.
Thomas Rodgers surveyed popular DSPs and found the following:
-
SHARC family ships a C++ compiler which supports C++11, and where signed integers are two’s complement.
-
Freescale/NXP ships a C++ compiler which supports C++03, and where signed integers are two’s complement (reference).
-
Texas Instruments ships a C++ compiler which supports C++14, and where signed integers are two’s complement.
-
CML Microcircuits has fixed ASIC for radio processing, and doesn’t seem to support C++.
-
Cirrus Logic (formerly Wolfson) has a C compiler only which is probably two’s complement.
-
Synaptics (formerly Connexant) makes audio input subsystem for voice assistants. The DSP runs fixed far-field signal processing algorithms and has programmable functions which run on standard ARM controller, using Raspbian.
In short, the only machine the author could find using non-two’s complement are made by Unisys, and no counter-example was brought by any member of the C++ standards committee. Nowadays Unisys emulates their old architecture using x86 CPUs with attached FPGAs for customers who have legacy applications which they’ve been unable to migrate. These applications are unlikely to be well served by modern C++, signed integers are the least of their problem. Post-modern C++ should focus on serving its existing users well, and incoming users should be blissfully unaware of integer esoterica.