JTC1/SC22/WG14 N808

    WG14/N808 J11/98-007
    
    
    DR 63 (and 56): Floating-Point accuracy
    
    Proposed wording for the Response for DR 63 (and 56). 
    
    The following wording should be incorporated into a future
    version of the standard in section 5.2.4.2.2 <float.h>, page
    25, between paragraphs 3 and 4:
    
    The accuracy of the floating-point operations (+, -, *, /)
    and of the math library (<math.h> and <complex.h>) functions
    that return a floating-point result is implementation
    defined.  The implementation shall document the accuracy as
    the worst case error in terms of units in the last place
    (ULPs), or it shall say that the accuracy is "unknown." 
    
    
    Rationale (section 5.2.4.2.2)
    
    In documenting the accuracy of the math library functions,
    an implementation is allowed to break the domain of a
    function into disjoint regions and document the worst case
    error in each region.  An implementation should document the
    accuracy of the complex operations and functions by listing
    the worst case error in both the real and the imaginary
    parts. 
    
    The error between two floating-point numbers (such as
    computed and infinitly precise) in ULPs (Unit in the Last
    Places) depends on the radix and the precision used in
    representing the number, but not the exponent.  Thus, the
    ULP of a value x, with exponent e, precision p, and radix r,
    is r**(1-p).  With a decimal radix and 3 digits of
    precision, the computed value 3.14e-2 differs from the value
    3.1416e-2 by 0.16 ULPs. 
    
    The C9X committee discussed the idea of allowing the
    programmer to find out the accuracy of floating-point
    operations and math functions during compilation (say via
    macro symbols) or during execution (via a function call),
    but neither got enough support (even though this is what
    some users want) to warrant the change to the standard. 
    Either would require over one hundred macro symbols to name
    every math function (such as ULP_SINF, ULP_SIN, and ULP_SIND
    just for the sin function) and that was a large change for a
    small utility. 
    
    The values in <float.h> should be in terms of the hardware
    representation used to store floating-point values (not in
    terms of the effective accuracy of operations).  That is, a
    poor quality implementation cannot say (in <float.h>) that
    it has less precision than the hardware representation, so
    that it could claim 1-ULP accuracy for its operations. 
    
    The committee could not agree on upper limits on accuracy
    that all conforming imlementations must meet, eg, addition
    is no worst than 2 ULPs for all implementations.  It is a
    quality of implementation issue. 
    
    Implementations that conform to IEC-559 have 0.5 ULP
    accuracy in round to nearest mode, and 1.0 ULP accuracy in
    the other three rounding modes, for the basic arithmetic
    operations and sqrt.  For other floating-point arithmetics,
    it is a rare implementation that has worse than 1-ULP
    accuracy for the basic arithmetic operations. 
    
    The accuracy of binary-decimal converions and format
    conversions are discussed elsewhere in the standard. 
    
    For the math library functions, currently, fast correctly
    rounded 0.5 ULP accuracy is still a research problem.  Some
    imlementation provide two math libraries: one being fast and
    sloppy, the other being slow and accurate. 
    
    Some math functions, such as those that do argument
    reduction modulo an approximation of pi, have good accuracy
    for small arguments, and poor accuracy for large arguments. 
    It is not unusual for an implementation of the trig
    functions to have zero bits correct in the computed result
    for large arguments.  That is why the committee allows an
    implementation to break the domain of the function into
    disjoint sections and specify the worst case accuracy in
    each section. 
    
    The value of an ULP of a positive number x that is a power
    of the radix is the difference between the next
    representable number (in the model) larger than x and x,
    rather than the difference between x and the closest
    representable number smaller than x.  These two potential
    definitions of an ULP's value differ by a factor of the
    radix.