JTC1/SC22/WG14
N952
Document: WG14 N952
BDTI's Comments on N948
(Extension for the programming language C to support embedded processors)
John R. Hauser
Berkeley Design Technology, Inc.
2001 August 16
Since Copenhagen, we've adjusted our prototype fixed-point implementation at
BDTI to reflect the changes that came out of that meeting. In working with
the new system, we found a major flaw in the arithmetic rules, for which
we will be proposing a fix. Also, in response to concerns voiced by David
Keaton, we've tried to find a set of "least common denominator" requirements
for the fixed-point types that we can accept. At the same time, I've been
collecting details for most of the "multimedia extensions" to mainstream
processors (such as MMX for Intel Pentiums), so we can be sure they aren't
shortchanged by the language.
Most of the changes we suggest to the draft are detailed in text attached to
the bottom of this document. The most significant points are:
- Relax the requirements on fixed-point type formats.
The exact rules we are proposing are spelled out below, and I won't try
to repeat them here. Of note, our new rules would no longer require
that corresponding "fract" and "accum" types have the same number of
fractional bits. Although perhaps less concise than before, several
rules are needed to ensure that the arithmetic makes sense (e.g., "long
fract" cannot have fewer fractional bits than "fract"; etc.).
Included in the text below are our recommendations for specific fixed-
point formats on various systems. These recommendations are based on
a careful evaluation of the capabilities of each processor, and I'm
prepared to argue that the formats are realistic for each system. You
may note that, except in one instance, the formats we propose satisfy the
stricter type rules of the existing draft. We expect the current draft's
rules to be kept as "recommended practice", because they should be.
- Suppress the "usual arithmetic conversions" for fixed-point types.
BDTI's original proposal did not permit an arithmetic operation with, for
example, "accum" for one operand and "long fract" for the other. This
was refused on the grounds that "accum" has more integral bits than "long
fract", but "long fract" may have more fractional bits than "accum".
When the committee in Copenhagen insisted that this be allowed, some
"usual arithmetic conversion" rules were invented that promoted the "long
fract" operand to "accum" before the operation occurred, similar to the
way the "usual arithmetic conversions" work for other C types.
Unfortunately, the automatic conversions we added were a mistake.
The reason they're a mistake is similar to the reason we agreed not to
support "usual arithmetic conversions" from integers to fixed-point. As
we discussed in Copenhagen, for integer operands, such conversions would
make it impossible to perform a meaningful multiplication or division
between an integer and a "fract" fixed-point value, because automatic
conversion of the integer operand to the "fract" type would be guaranteed
to overflow unless the integer happened to be 0 or -1. To avoid this
gratuitous overflow, the current draft defines an arithmetic operation
between an integer and a fixed-point value as occuring directly on the
two types as they are, without first converting the integer operand to
match the type of the fixed-point operand.
Now consider what happens when an "accum" is multiplied by a "long
fract", both fixed-point types. Let's assume for argument's sake that
"accum" has format s16.15 (sign bit + 16 integral bits + 15 fractional
bits) and "long fract" is s.31 (sign bit + 31 fractional bits). Of
course, there's nothing strange about the desire to multiply a 32-bit
"accum" value by a 32-bit fraction between -1 and 1, generating an
"accum" result; and furthermore, any hardware capable of a "long fract"
multiplication (32 x 32 -> 32-bit fraction) can execute this "accum" x
"long fract" multiplication perfectly well. However, our "usual
arithmetic conversions" require that the "long fract" operand be
converted first to "accum" before the multiplication occurs, which
results in a drastic and gratuitous loss of 16 bits of significance in
the "long fract" operand, making the result far less accurate than it
deserves to be.
The current workaround is to cast the operands to an encompassing larger
type such as "long accum" and then cast (or assign) the "long accum"
result back to "accum". Problems with the workaround include:
* Given the new, relaxed rules proposed below for fixed-point formats,
there might not be an encompassing type. The "long accum" type is not
required to have as many fractional bits as "long fract" in our new
rules.
* The compiler is responsible for reducing this expression to a simple
32-bit fractional multiplication on the hardware. With rounding
effects, the full expression might not be identical to a single
multiplication, and the compiler might decline to perform the
optimization.
* The same solution using casts was offerred with BDTI's original
proposal which disallowed multiplication of "accum" and "long fract"
directly. This casting was rejected by the committee as being awkward
and nonintuitive. (Note, by the way, that, with the current system,
drastic loss of precision occurs quietly if the casts aren't included,
whereas questionable operations would have been flagged by the
compiler in the original proposal.)
We see two possible fixes:
* Restore the original prohibition against combining incompatible types
such as "accum" and "long fract"; or,
* Eliminate the automatic conversions and define the fixed-point
arithmetic operations as occurring directly on the two operands, as
already happens for operations between integers and fixed-point types.
In the text below, we've assumed the second solution, since the committee
has already voted against the first one.
- Relax the accuracy requirements slightly for multiplication and division.
The current Embedded C draft (following BDTI's proposal) requires in most
cases that a fixed-point operation occur with a rounding error of less
than 1 ulp. (An "ulp" is a "unit in the last place", equal to 2^-F where
F is the number of fractional bits in the destination format.) We're now
convinced this is too strict for multiplications on some implementations,
and so we're suggesting instead a maximum rounding error of less than
2 ulps for multiplication and divisions.
One important beneficiary would be mainstream 32-bit processors, on which
"long fract" might reasonably be implemented as 32 bits (format s.31).
A multiplication of two 32-bit "long fract"s to a "long fract" result
would typically be compiled as a 32 x 32 -> 64-bit integer multiplication
followed by a shift right by 31 bits, keeping only the bottom 32 bits
at the end. On many of these processors, the 64-bit product would be
obtained in two 32-bit registers---say, R0 and R1---and then the 31-bit
shift across the register pair would take three instructions:
shift R0 left 1 bit
shift R1 right 31 bits (an unsigned shift)
OR R1 into R0
which leaves the 32-bit "long fract" result in R0. But note that the
most significant 31 bits of the result are already available in R0
after the first shift; the other two instructions serve only to move the
last, least significant bit into position. If the product is permitted
to be up to 2 ulps in error, an implementation could choose instead
to leave the least significant bit zero and dispense with the last two
instructions. Although we'd prefer the tighter 1-ulp bound in principle,
savings such as this will be significant enough on many processors to
justify the greater leniency.
As a side effect of this change, the special case about multiplication
results of 1 and -1 could be dropped from the Embedded C document.
Other comments we have:
- The statement
A conforming implementation shall support at least two different
signed "fract" fixed point datatypes, and one signed accum fixed
point datatype.
in Section 2.1.1 is inappropriate. An implementation should be required
to support all six signed fixed-point types, and probably all of the
unsigned fixed-point types, too. Just as for the integers, there's no
requirement that the types all be different formats.
- The entire discourse on "containers" should be dropped. Most of the
important notions are already covered under the topic of "representations
of types" in Section 6.2.6.1 of the C Standard. Issues specific to the
new fixed-point types should follow the model of Section 6.2.6.2 for
integer types. In particular, the encoding of a fixed-point type should
be divided into "padding bits", "fractional bits", "integral bits", and
"sign bit", analogous to the integer types in 6.2.6.2.
In the same vein, Sections 2.1.4.1.2 and 2.1.4.1.4 of the draft are
(I think) redundant with the existing Standard and should be deleted.
- In the "usual arithmetic conversions" in Section 2.1.3, the following
statement is out of place and should be deleted:
If the type of either of the operands has the sat qualifier, the
resulting type shall have the sat qualifier; if the type of either of
the operands has the modwrap qualifier, the resulting type shall have
the modwrap qualifier.
The "usual arithmetic conversions" are not concerned with the result type
of an operation.
- Regarding the renaming of "abs" to "fpabs": the abbreviatin "f.p." is
often taken to mean "floating-point", so that may not be a good choice.
In any event, one or both of "roundfx" and "fpabs" should be renamed to
be consistent.
- Since the committee at Copenhagen rejected BDTI's proposal to have the
"bits" construct be assignable, we are now proposing a "setbits" operator
to serve this purpose. The rejected form
bits(x) = expr
would instead become
setbits(x, expr)
While the Embedded C draft calls "bits" a type-generic function,
"setbits" will either have to be a keyword or a macro in order to work
properly.
- The "additional information and rationale" for fixed-point in Section A.1
needs work, especially as some of it contradicts the more normative part
of the document.
We also agree that there are other issues (such as "printf" format
specifiers, or the possibility of complex fixed-point types) that need to
be considered but that haven't been addressed here. For now, our interest
is concentrated on getting concensus for the core fixed-point types and
operations before we widen our scope to include other topics.
============================================================================
The following constitutes more specific normative changes we propose for the
draft document.
----------------------------------------------------------------------------
2.1.1 The datatypes
To fix a number of problems with the draft and to provide a more flexible
set of type rules, replace Section 2.1.1 with the following (or something
similar):
/-------------------------------------------------------------------------\
Twelve new fixed-point types are defined:
unsigned short fract unsigned short accum
unsigned fract unsigned accum
unsigned long fract unsigned long accum
signed short fract signed short accum
signed fract signed accum
signed long fract signed long accum
The names
short fract short accum
fract accum
long fract long accum
without either "unsigned" or "signed" are aliases for the corresponding
signed fixed-point types.
The fixed-point types are assigned a fixed-point rank. The following
types are listed in order of increasing rank:
short fract, fract, long fract, short accum, accum, long accum
Each unsigned fixed-point type has the same size (in bytes) and the same
rank as it's corresponding signed fixed-point type.
The bits of an unsigned fixed-point type are divided into padding bits,
fractional bits, and integral bits. The bits of a signed fixed-point type
are divided into padding bits, fractional bits, integral bits, and a sign
bit.
The "fract" types have no integral bits; consequently, unsigned "fract"
types encode values in the range of 0 to 1, and signed "fract" types
encode values in the range of -1 to 1. The minimal formats for each type
are:
signed short fract s.7 signed short accum s4.7
signed fract s.15 signed accum s4.15
signed long fract s.23 signed long accum s4.23
unsigned short fract .7 unsigned short accum 4.7
unsigned fract .15 unsigned accum 4.15
unsigned long fract .23 unsigned long accum 4.23
(For the unsigned formats, the notation "x.y" means x integral bits and
y fractional bits, for a total of x + y bits. The added "s" in the signed
formats denotes the sign bit.)
An implementation may give any of the fixed-point types more fractional
bits, and may also give any of the "accum" types more integral bits,
subject to the following restrictions:
- Each unsigned "fract" type has either the same number of fractional
bits or one more fractional bit than its corresponding signed "fract"
type.
- The number of fractional bits is nondecreasing for each of the
following sets of fixed-point types when arranged in order of
increasing rank:
* signed "fract" types
* unsigned "fract" types
* signed "accum" types
* unsigned "accum" types
- The number of integral bits is nondecreasing for each of the following
sets of fixed-point types when arranged in order of increasing rank:
* signed "accum" types
* unsigned "accum" types
- Each signed "accum" type has at least as many integral bits as its
corresponding unsigned "accum" type.
Furthermore, the following are recommended practice where practical:
- The "signed long fract" type has at least 31 fractional bits.
- Each "accum" type has at least 8 integral bits.
- Each unsigned "accum" type has the same number of fractional bits as
its corresponding unsigned "fract" type.
- Each signed "accum" type has the same number of fractional bits as
either its corresponding signed "fract" type or its corresponding
unsigned "fract" type.
\-------------------------------------------------------------------------/
By way of example, these tables show the fixed-point formats we would
suggest for various classes of processors:
signed fract--------- signed accum---------
short middle long short middle long
typical desktop processor s.7 s.15 s.31 s8.7 s16.15 s32.31
typical 16-bit DSP s.15 s.15 s.31 s8.15 s8.15 s8.31
typical 24-bit DSP s.23 s.23 s.47 s8.23 s8.23 s8.47
Intel MMX s.7 s.15 s.31 s8.7 s16.15 s32.31
PowerPC AltiVec s.7 s.15 s.31 s8.7 s16.15 s32.31
Sun VIS s.7 s.15 s.31 s8.7 s16.15 s32.31
MIPS MDMX s.7 s.15 s.31 s8.7 s8.15 s17.30
Lexra Radiax s.7 s.15 s.31 s8.7 s8.15 s8.31
ARM Piccolo s.7 s.15 s.31 s8.7 s16.15 s16.31
unsigned fract------- unsigned accum-------
short middle long short middle long
typical desktop processor .8 .16 .32 8.8 16.16 32.32
typical 16-bit DSP .16 .16 .32 8.16 8.16 8.32
typical 24-bit DSP .24 .24 .48 8.24 8.24 8.48
Intel MMX .8 .16 .32 8.8 16.16 32.32
PowerPC AltiVec .8 .16 .32 8.8 16.16 32.32
Sun VIS .8 .16 .32 8.8 16.16 32.32
MIPS MDMX .8 .16 .32 8.8 8.16 16.32
Lexra Radiax .8 .16 .32 8.8 8.16 8.32
ARM Piccolo .8 .16 .32 8.8 16.16 16.32
(The "typical" DSPs referred to in the table cannot address units in memory
smaller than 16 or 24 bits, which is why these processors aren't expected to
support a "short fract" smaller than "fract".)
----------------------------------------------------------------------------
2.1.3 Type conversions, usual arithmetic conversions
To suppress most of the "usual arithmetic conversions" for fixed-point
types, replace the four rules in the text with the following:
Otherwise, if one operand has fixed-point type and the other operand has
integer type, then no conversions are needed.
Otherwise, if both operands have signed fixed-point types, or if both
operands have unsigned fixed-point types, then no conversions are
needed.
Otherwise, if one operand has signed fixed-point type and the other
operand has unsigned fixed-point type, the operand with unsigned type
is converted to the signed fixed-point type corresponding to its own
unsigned fixed-point type.
In any event, delete the rule
If the type of either of the operands has the sat qualifier, the
resulting type shall have the sat qualifier; if the type of either of
the operands has the modwrap qualifier, the resulting type shall have
the modwrap qualifier.
----------------------------------------------------------------------------
2.1.4.1.2 Address and indirection operators
Delete this section.
----------------------------------------------------------------------------
2.1.4.1.4 The "sizeof" operator
Delete this section.
----------------------------------------------------------------------------
2.1.4.2.1 Binary arithmetic operators
To suppress most of the "usual arithmetic conversions" for fixed-point
types, replace the second bullet point with:
- Otherwise if both operands are fixed-point, the result type is the
operand type with greater rank (after the usual arithmetic conversions
have been applied), with the adoption of any "sat" or "modwrap"
qualifier from either operand. (For example, if the operands of an
addition have types "unsigned long accum" and "sat fract", the result
type is "sat long accum".) It is a constraint error for one operand
to have a "sat" qualifier and the other a "modwrap" qualifier.
To relax the accuracy requirement for multiplication and division, replace
the text starting with "However, if the mathematical result of ..." with the
following:
For arithmetic operators other than "*" and "/", this rounded result is
returned as the result of the operation. The "*" and "/" operators may
return either this rounded result or, alternatively, the closest larger
or closest smaller value representable by the result fixed-point type.
The circumstances in which the rounded result might be replaced by a
neighboring value in this manner are implementation-defined. (Between
rounding and this optional adjustment, the multiplication and division
operations permit a mathematical error of almost 2 units in the last
place of the result type.)
It should be stated that "division by zero is undefined".
----------------------------------------------------------------------------
2.1.5.1 The "roundfx" function
2.1.5.4 The "fpabs" function
Rename one or both of these functions to be consistent with one another.
----------------------------------------------------------------------------
2.1.5.3 The "bits" function
Append the following text for "setbits":
The opposite operation of setting the bits of a fixed-point variable is
provided by the "setbits" macro, which has the syntax
setbits ( <fixed-lvalue>, <n> )
For example, starting with the declaration of "a" above, the assignment
setbits(a, 0x2000);
gives "a" the fixed-point value of 0.25. The value of the second
operand is converted to an integer before the assignment is made. If
this integer value is too large for the type of the first operand, only
the bottom N bits of the value are used, where N is the total number of
(nonpadding) bits of the fixed-point type.
----------------------------------------------------------------------------