2 General
2.1 Scope
2.2 References
4 Characteristics of decimal floating types <decfloat.h>
5 Conversions
5.1 Conversions between decimal floating and integer
5.2 Conversions among decimal floating types, and between decimal floating types and non-decimal floating types
5.3 Conversions between decimal floating and complex
5.4 Usual arithmetic conversions
5.5 Default argument promotion
7 Floating-point environment <fenv.h>
8 Arithmetic operations
8.1 Operators
8.2 Functions
8.3 Conversions
9 Library
9.1 Decimal mathematics <math.h>
9.2 New functions
9.2.1 divide_integer functions
9.2.2 remainder_near functions
9.2.3 quantize functions
9.2.4 round_to_integer functions
9.2.5 normalize functions
9.3 Formatted input/output specifiers
9.4 strtod32, strtod64, and strtod128 functions <stdlib.h>
9.5 wcstod32, wcstod64, and wcstod128 functions <wchar.h>
9.6 Type-generic macros <tgmath.h>
However, human computation and communication of numeric values almost always uses decimal arithmetic and decimal notations. Laboratory notes, scientific papers, legal documents, business reports and financial statements all record numeric values in decimal form. When numeric data are given to a program or are displayed to a user, binary to-and-from decimal conversion is required. There are inherent rounding errors involved in such conversions; decimal fractions cannot, in general, be represented exactly by binary floating-point values. These errors often cause usability and efficiency problems, depending on the application.
These problems are minor when the application domain accepts, or requires results to have, associated error estimates (as is the case with scientific applications). However, in business and financial applications, computations are either required to be exact (with no rounding errors) unless explicitly rounded, or be supported by detailed analyses that are auditable to be correct. Such applications therefore have to take special care in handling any rounding errors introduced by the computations.
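The binary conversion error described above can be observed in a few lines of C. This sketch assumes a compiler that already accepts the _Decimal64 type and the DD constant suffix proposed later in this Technical Report (GCC, for example, offers them as an extension on some targets); the function names are hypothetical.

```c
/* Binary: 0.1, 0.2 and 0.3 are each rounded when converted to double,
   so the computed sum does not compare equal to the constant 0.3. */
int binary_sum_is_exact(void)
{
    double a = 0.1, b = 0.2;
    return a + b == 0.3;          /* 0 with IEEE-754 binary64 */
}

/* Decimal: each constant is held exactly, so the sum is exactly 0.3.
   Assumes compiler support for _Decimal64 and the DD suffix. */
int decimal_sum_is_exact(void)
{
    _Decimal64 a = 0.1DD, b = 0.2DD;
    return a + b == 0.3DD;        /* 1 */
}
```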
The most efficient way to avoid conversion error is to use decimal arithmetic. Currently, the IBM z/Architecture (and its predecessors since System/360) is a widely used system that supports built-in decimal arithmetic. This, however, provides integer arithmetic only, meaning that every number and computation has to have separate scale information preserved and computed in order to maintain the required precision and value range. Such scaling is difficult to code and is error-prone; it affects execution time significantly, and the resulting program is often difficult to maintain and enhance.
Even though the hardware may not provide decimal arithmetic operations, the support can still be emulated by software. Programming languages used for business applications either have native decimal types (such as PL/I, COBOL, C#, or Visual Basic) or provide decimal arithmetic libraries (such as the BigDecimal class in Java). The arithmetic used, nowadays, is almost invariably decimal floating-point; the COBOL 2002 ISO standard, for example, requires that all standard decimal arithmetic calculations use 32-digit decimal floating-point.
At present, all languages use software for decimal arithmetic. Even the best packages are slow, and can be 100 times slower than a corresponding hardware implementation, and in some cases much slower. At least one processor manufacturer, therefore, is adding decimal floating-point in hardware.
Arguably, the C language hits a sweet spot within the wide range of programming languages available today: it strikes an optimal balance between usability and performance. Its simple and expressive syntax makes it easy to program, and its close-to-the-hardware semantics make it efficient. Despite the advent of newer programming languages, C is still often used together with other languages to code the computationally intensive part of an application. In many cases, entire business applications are written in C/C++. To maintain the vitality of C, the need for decimal arithmetic by the business and financial community cannot be ignored.
The importance of this has been recognized by the IEEE. The IEEE 754 standard is currently being revised, and the major change in that revision is the addition of decimal floating-point formats and arithmetic. These decimal data types are almost as efficient as the binary types, and are especially suitable for hardware implementation; it is possible that they will become the most widely used primitive data types once hardware implementations are available.
Historically there has been a close tie between IEEE-754 and C with
respect to floating-point specification. With the revised IEEE-754 nearing
the final approval stage, it is now the appropriate time for C to consider
adding decimal types and arithmetic to its specification.
There are three components to the model:
- numbers - which represent the values which can be manipulated by, or be the results of, the core operations defined in the model
- operations - the core operations (such as addition, multiplication, etc.) which can be carried out on numbers
- context - which represents the user-selectable parameters and rules which govern the results of arithmetic operations (for example, the rounding mode to be used)
The model defines these components in the abstract. It neither defines the way in which operations are expressed (which might vary depending on the computer language or other interface being used), nor does it define the concrete representation (specific layout in storage, or in a processor's register, for example) of numbers or context.
From the perspective of the C language, numbers are represented by data types, operations are defined within expressions, and context is the floating-point environment specified in <fenv.h>. This Technical Report specifies how the C language implements these components.
Note: A description of the arithmetic model can be found in
http://www2.hursley.ibm.com/decimal/decarith.html.
Note: A description of the encodings can be found in http://www2.hursley.ibm.com/decimal/decbits.html.
C99 specifies floating-point arithmetic using a two-layer organization. The first layer provides a specification using an abstract model. The representation of a floating-point number is specified in an abstract form where the constituent components of the representation are defined (sign, exponent, significand) but not the internals of these components. In particular, the exponent range, significand size and base (or radix) are implementation-defined. This allows flexibility for an implementation to take advantage of its underlying hardware architecture. Furthermore, certain behaviors of operations are also implementation-defined, for example in the handling of special numbers and exceptions.
The reason for this approach is historical. At the time when C was first standardized, there were already various hardware implementations of floating-point arithmetic in common use. Specifying the exact details of a representation would have made most of the implementations existing at the time nonconforming.
C99 provides a binding to IEEE-754 by specifying Annex F and adopting that standard by reference. An implementation that does not conform to IEEE-754 indicates this by not defining the macro __STDC_IEC_559__. This means that not all implementations need to support IEEE-754, and the floating-point arithmetic need not be binary.
This Technical Report specifies decimal floating-point arithmetic according to IEEE-754R, with the constituent components of the representation defined. This is more stringent than the existing C99 approach for the floating types. Since it is expected that all decimal floating-point hardware implementations will conform to the revised IEEE 754, binding to this standard directly benefits both implementors and programmers.
This Technical Report does not specify binary
floating-point arithmetic.
2.2.1 ISO/IEC 9899:1999, Information technology - Programming languages, their environments and system software interfaces - Programming Language C.
2.2.1.1 ISO/IEC 9899:1999, Technical Corrigendum 1 to Programming Language C.
2.2.2 ANSI/IEEE 754-1985 - IEEE Standard for Binary Floating-Point Arithmetic. The Institute of Electrical and Electronic Engineers, Inc., New York, 1985.
2.2.2.1 The IEEE 754 revision working group is currently revising the specification for floating-point arithmetic:
ANSI/IEEE 754R - IEEE Standard for Floating-Point Arithmetic. The Institute of Electrical and Electronic Engineers, Inc. Draft.
2.2.3 ANSI/IEEE 854-1987 - IEEE Standard for Radix-Independent Floating-Point Arithmetic. The Institute of Electrical and Electronic Engineers, Inc., New York, 1987.
2.2.4 A Decimal Floating-Point Specification, Schwarz, Cowlishaw,
Smith, and Webb, in the Proceedings of the 15th IEEE Symposium on Computer
Arithmetic (Arith 15), IEEE, June 2001.
Note: Reference materials relating to IEEE-754R
can be found in http://grouper.ieee.org/groups/754/ and http://www.validlab.com/754R/.
A single token is used as a type name to make it easy for C++ to implement the types as classes.
Within the type hierarchy, decimal floating types are base types, real types and arithmetic types.
The types float, double and long double are also called generic floating types for the purpose of this Technical Report.
Note: C does not specify a radix for float, double and long double. An implementation can choose the representation of float, double and long double to be the same as the decimal floating types. In any case, the decimal floating types are distinct from float, double and long double regardless of the representation.
Note: This Technical Report does not define decimal complex types. The three complex types remain float _Complex, double _Complex and long double _Complex.
The following are suggested changes to C99:
Change the first sentence of 6.2.5#10.
[10] There are three generic floating types, designated as float, double and long double.
Add the following paragraphs after 6.2.5#10.
[10a] There are three decimal floating types, designated as _Decimal32, _Decimal64 and _Decimal128. The set of values of the type _Decimal32 is a subset of the set of values of the type _Decimal64; the set of values of the type _Decimal64 is a subset of the set of values of the type _Decimal128. Support for _Decimal128 is optional. Decimal floating types are real floating types.
[10b] The generic floating types and decimal floating types are real floating types.
Add the following to 6.7.2 Type specifiers:
type-specifier:
_Decimal32
_Decimal64
_Decimal128
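A declaration using the new type specifiers looks like any other arithmetic declaration. This sketch assumes a compiler that already implements these keywords and the constant suffixes (GCC offers them as an extension on some targets), plus C11 for _Static_assert; the variable names and line_total are hypothetical.

```c
/* Objects of each decimal floating type (assumes compiler support). */
_Decimal32  unit_price = 19.99DF;
_Decimal128 ledger     = 0.00DL;

/* The IEEE-754R decimal encodings occupy 4, 8 and 16 bytes. */
_Static_assert(sizeof(_Decimal32)  == 4,  "decimal32 is 32 bits");
_Static_assert(sizeof(_Decimal64)  == 8,  "decimal64 is 64 bits");
_Static_assert(sizeof(_Decimal128) == 16, "decimal128 is 128 bits");

/* Hypothetical helper: price times quantity. The int operand is
   converted to _Decimal64, and the product 19.99 * 3 is exact. */
_Decimal64 line_total(_Decimal64 price, int qty)
{
    return price * qty;
}
```

Passing unit_price to line_total converts the _Decimal32 value to _Decimal64 without change, since every _Decimal32 value is also a _Decimal64 value.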
The characteristics of decimal floating types are defined in terms of a model specifying general decimal arithmetic (refer to 1.2). The encodings are specified in IEEE-754R (refer to 1.3).
The three decimal encoding formats defined in IEEE-754R correspond to the three decimal floating types as follows:
- _Decimal32 is a decimal32 number, which is encoded in four consecutive bytes (32 bits)
- _Decimal64 is a decimal64 number, which is encoded in eight consecutive bytes (64 bits)
- _Decimal128 is a decimal128 number, which is encoded in 16 consecutive bytes (128 bits)
The finite numbers are defined by a sign, an exponent (which is a power of ten), and a decimal integer coefficient. The value of a finite number is given by $(-1)^{\text{sign}} \times \text{coefficient} \times 10^{\text{exponent}}$. Refer to IEEE-754R for details of the format.
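The sign, coefficient and exponent triple can be turned into an exact decimal string without any floating-point arithmetic. The helper below is a hypothetical sketch, not part of the proposal; it limits the exponent to -60..60 to keep its buffers small.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes (-1)^sign * coeff * 10^exp into out as a
   plain decimal string (no exponent), keeping every coefficient digit so
   trailing zeros survive. Supports -60 <= exp <= 60.
   Returns 0 on success, -1 if exp is outside that range. */
int format_finite(int sign, unsigned long long coeff, int exp,
                  char *out, size_t outsz)
{
    char digits[24];
    char tmp[128];
    int pos = 0;

    if (exp < -60 || exp > 60)
        return -1;
    int len = snprintf(digits, sizeof digits, "%llu", coeff);

    if (sign)
        tmp[pos++] = '-';
    if (exp >= 0) {
        /* Integer value: the coefficient followed by exp zeros. */
        memcpy(tmp + pos, digits, (size_t)len);
        pos += len;
        for (int i = 0; i < exp; i++)
            tmp[pos++] = '0';
    } else {
        int frac = -exp;            /* digits after the decimal point */
        int intlen = len - frac;    /* digits before it; may be <= 0 */
        if (intlen <= 0) {
            tmp[pos++] = '0';
            tmp[pos++] = '.';
            for (int i = 0; i < -intlen; i++)
                tmp[pos++] = '0';   /* leading zeros of the fraction */
            memcpy(tmp + pos, digits, (size_t)len);
            pos += len;
        } else {
            memcpy(tmp + pos, digits, (size_t)intlen);
            pos += intlen;
            tmp[pos++] = '.';
            memcpy(tmp + pos, digits + intlen, (size_t)frac);
            pos += frac;
        }
    }
    tmp[pos] = '\0';
    snprintf(out, outsz, "%s", tmp);
    return 0;
}
```

Because the coefficient is kept as written, 1.00 (coefficient 100, exponent -2) and 1 (coefficient 1, exponent 0) print differently even though they denote the same value.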
These formats are characterized by the length of the coefficient and the maximum and minimum exponent. The table below shows these characteristics by format:
| Format                      | decimal32 | decimal64 | decimal128 |
| Coefficient length (digits) | 7         | 16        | 34         |
| Maximum exponent (Emax)     | 96        | 384       | 6144       |
| Minimum exponent (Emin)     | -95       | -383      | -6143      |
The new header <decfloat.h> defines several macros that expand to various limits and parameters of the decimal floating types. These macros have names and meanings similar to the corresponding ones in <float.h>.
Suggested change to C99.
Add the following after 5.2.4.2.2:
5.2.4.2.2a Characteristics of decimal floating types <decfloat.h>
[1] The characteristics of decimal floating types are defined in terms of the format described in IEEE-754R. The finite numbers are defined by a sign, an exponent (which is a power of ten), and a decimal integer coefficient. The value of a finite number is given by $(-1)^{\text{sign}} \times \text{coefficient} \times 10^{\text{exponent}}$. The macros defined in <decfloat.h> provide the characteristics of these representations, which are defined in the Decimal Arithmetic Encoding. The prefixes DEC32_, DEC64_, and DEC128_ are used to denote the types _Decimal32, _Decimal64, and _Decimal128 respectively.
[2] Except for assignment and casts, the values of operations with decimal floating operands and values subject to the usual arithmetic conversions and of decimal floating constants are evaluated to a format whose range and precision may be greater than required by the type. The use of evaluation formats is characterized by the implementation-defined value of DEC_EVAL_METHOD:
-1 indeterminable;
0 evaluate all operations and constants just to the range and precision of the type;
1 evaluate operations and constants of type _Decimal32 and _Decimal64 to the range and precision of the _Decimal64 type, evaluate _Decimal128 operations and constants to the range and precision of the _Decimal128 type;
2 evaluate all operations and constants to the range and precision of the _Decimal128 type.
All other negative values for DEC_EVAL_METHOD characterize implementation-defined behavior.
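The effect of each DEC_EVAL_METHOD value can be summarized as a mapping from an operand type to its evaluation format. The helper below is a hypothetical sketch: it takes the method value and the operand width (32, 64 or 128) and returns the width of the evaluation format, or -1 when the method does not determine it.

```c
/* Hypothetical helper: width in bits of the format used to evaluate an
   operation on _DecimalN operands (N = 32, 64 or 128) under a given
   DEC_EVAL_METHOD value. Returns -1 for -1 (indeterminable) and for any
   value the list above does not describe. */
int dec_eval_format_bits(int method, int type_bits)
{
    switch (method) {
    case 0:
        return type_bits;                         /* just the type */
    case 1:
        return type_bits < 64 ? 64 : type_bits;   /* at least _Decimal64 */
    case 2:
        return 128;                               /* always _Decimal128 */
    default:
        return -1;
    }
}
```

For example, dec_eval_format_bits(1, 32) yields 64: under method 1 a _Decimal32 addition is carried out with the range and precision of _Decimal64.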
[3] The values given in the following list shall be replaced by constant expressions suitable for use in #if preprocessing directives:
- number of digits in the coefficient
DEC32_COEFF_DIG 7
DEC64_COEFF_DIG 16
DEC128_COEFF_DIG 34
- minimum exponent
DEC32_MIN_EXP -95
DEC64_MIN_EXP -383
DEC128_MIN_EXP -6143
- maximum exponent
DEC32_MAX_EXP 96
DEC64_MAX_EXP 384
DEC128_MAX_EXP 6144
- maximum representable finite decimal floating number (with 6, 15 and 33 nines after the decimal point, respectively)
DEC32_MAX 9.999999E96DF
DEC64_MAX 9.999999999999999E384DD
DEC128_MAX 9.999999999999999999999999999999999E6144DL
- the difference between 1 and the least value greater than 1 that is representable in the given floating point type
DEC32_EPSILON 1E-6DF
DEC64_EPSILON 1E-15DD
DEC128_EPSILON 1E-33DL
- minimum normalized positive decimal floating number
DEC32_MIN 1E-95DF
DEC64_MIN 1E-383DD
DEC128_MIN 1E-6143DL
- minimum denormalized positive decimal floating number
DEC32_DEN 1E-101DF
DEC64_DEN 1E-398DD
DEC128_DEN 1E-6176DL
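The limits listed above follow from the precision p and Emax in the table of format characteristics. The sketch below (struct and function names are hypothetical) derives the exponents of DEC*_MIN, DEC*_DEN and DEC*_EPSILON from those two numbers, which is one way to cross-check the listed values.

```c
/* Precision p (coefficient digits) and Emax of a decimal format. */
struct dec_format { int digits; int emax; };

static const struct dec_format FMT_D32  = { 7, 96 };
static const struct dec_format FMT_D64  = { 16, 384 };
static const struct dec_format FMT_D128 = { 34, 6144 };

/* Emin = 1 - Emax: the exponent of DEC*_MIN. */
int fmt_emin(struct dec_format f) { return 1 - f.emax; }

/* Smallest denormalized exponent, Emin - (p - 1): the exponent of DEC*_DEN. */
int fmt_etiny(struct dec_format f) { return fmt_emin(f) - (f.digits - 1); }

/* The next value after 1 is 1 + 10^(1-p), so DEC*_EPSILON is 10^(1-p). */
int fmt_epsilon_exp(struct dec_format f) { return 1 - f.digits; }

/* DEC*_MAX has p nines: one before the decimal point, p - 1 after it. */
int fmt_max_fraction_nines(struct dec_format f) { return f.digits - 1; }
```

With the decimal64 parameters this gives Emin = -383 and a smallest denormalized exponent of -383 - 15 = -398.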
When the new type is a decimal floating type, we have these choices: the most positive/negative number representable, positive/negative infinity, and quiet NaN. The first provides no indication to the program that something exceptional has happened. The second provides an indication, but other operations that produce infinity also raise signals, so a signal would need to be raised here for consistency; in the interest of performance, however, interrupting the program is not preferable. The third allows the program to continue while providing a way for the implementation to encode the condition. This is slightly better than the second choice.
When the new type is an unsigned integral type, the values that create problems are those less than 0 and those greater than Utype_MAX. There is no overflow/underflow processing for unsigned arithmetic. A possible choice for the result would be Utype_MAX. Also, common existing implementations do not raise signals for signed integer arithmetic. When the new type is a signed integral type, the values that create problems are those less than type_MIN and those greater than type_MAX. The result here could be type_MIN or type_MAX depending on whether the original value is negative or positive.
To make the behavior consistent among all
real floating types, the suggested changes below apply to all real floating
types, not just decimal floating types.
Suggested change to C99.
Change the last sentence of 6.3.1.4 paragraph 1 to:
[1] ... If the value of the integral part cannot be represented by the integer type, the result is the largest representable number if the type is unsigned, and the most negative or positive number according to the sign of the floating point number if the type is signed.
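The proposed rule for out-of-range integral parts can be sketched with double as the source type. The helpers below are hypothetical and assume a 32-bit int and IEEE-754 double; because such conversions are undefined in C99, the range checks are made before any cast. NaN is not covered by the proposed wording, and returning 0 for it is an arbitrary choice made here.

```c
#include <limits.h>
#include <math.h>

/* Signed target: saturate to INT_MIN or INT_MAX by the sign of x. */
int to_int_saturating(double x)
{
    if (isnan(x))
        return 0;                       /* assumption: not covered by the text */
    if (x >= (double)INT_MAX + 1.0)     /* integral part > INT_MAX */
        return INT_MAX;
    if (x <= (double)INT_MIN - 1.0)     /* integral part < INT_MIN */
        return INT_MIN;
    return (int)x;                      /* fractional part discarded */
}

/* Unsigned target: any unrepresentable integral part gives UINT_MAX. */
unsigned to_uint_saturating(double x)
{
    if (isnan(x))
        return 0;                       /* assumption: not covered by the text */
    if (x <= -1.0 || x >= (double)UINT_MAX + 1.0)
        return UINT_MAX;
    return (unsigned)x;                 /* x is in (-1, UINT_MAX + 1) here */
}
```

A value such as -0.5 has integral part 0, which is representable, so it converts to 0 rather than to UINT_MAX.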
Change the last sentence of 6.3.1.4 paragraph 2 to:
[2] ... If the value being converted is outside the range of values that can be represented, the result is a quiet NaN.
The following are suggested changes to C99:
Add after 6.3.1.5#2.
[3] When a _Decimal32 is promoted to _Decimal64 or _Decimal128, or a _Decimal64 is promoted to _Decimal128, its value is unchanged.
[4] When a _Decimal64 is demoted to _Decimal32, a _Decimal128 is demoted to _Decimal64 or _Decimal32, or conversion is performed among decimal and generic floating types other than the above, if the value being converted can be represented exactly in the new type, it is unchanged. If the value being converted is in the range of values that can be represented but cannot be represented exactly, the result is correctly rounded. If the value being converted is outside the range of values that can be represented, the result is dependent on the rounding mode. If the rounding mode is:
near, the absolute value of the result is one of HUGE_VAL, HUGE_VALF, HUGE_VALL, HUGE_VAL_D64, HUGE_VAL_D32 or HUGE_VAL_D128 depending on the result type, and the sign is the same as that of the value being converted.
zero, the value is the most positive number representable if the value being converted is positive, and the most negative number representable otherwise.
positive infinity, the value is the same as for zero if the value being converted is negative, and the same as for near otherwise.
negative infinity, the value is the same as for near if the value being converted is negative, and the same as for zero otherwise.
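The four rounding-mode cases can be modeled as a small function. This sketch uses double as the destination type, so HUGE_VAL and DBL_MAX stand in for the HUGE_VAL_D* macros and DEC*_MAX limits of a decimal destination; the enumeration and function names are hypothetical.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical names for the four rounding modes in the proposed text. */
enum round_mode { RM_NEAR, RM_ZERO, RM_POS_INF, RM_NEG_INF };

/* Result of converting an out-of-range value of the given sign to double. */
double overflow_to_double(enum round_mode mode, bool negative)
{
    const double huge = HUGE_VAL;   /* +infinity on IEEE-754 systems */
    const double max  = DBL_MAX;    /* largest finite double */

    switch (mode) {
    case RM_NEAR:    return negative ? -huge : huge;
    case RM_ZERO:    return negative ? -max  : max;
    case RM_POS_INF: return negative ? -max  : huge;  /* zero if negative, else near */
    case RM_NEG_INF: return negative ? -huge : max;   /* near if negative, else zero */
    }
    return 0.0;                     /* not reached for valid modes */
}
```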
This is covered by C99 6.3.1.7.
One major difficulty of allowing mixed operation is in the determination of the common type. C99 does not specify exactly the range and precision of the generic real types. The pecking order between them and the decimal types is therefore unspecified. Given two (or more) mixed type operands, there is no simple rule to define a common type that would guarantee portability in general.
For example, we can define the common type to be the one with greater range (the suggested change below). But since a double type may have a different range under different implementations, a program cannot assume the resulting type of an addition, say, involving both _Decimal64 and double. This imposes limitations on how to write portable programs.
If the generic real type is a type defined in IEEE-754R, and if we use the greater-range rule, the common type is easily determined. When mixing decimal and binary types of the same size, the decimal type is the common type. When mixing types of different sizes, the common type is the one with the larger size. The suggested change below uses this approach but does not assume the generic real type to follow IEEE-754R. This guarantees consistent behavior among implementations that use IEEE-754 for their binary floating-point arithmetic, and at the same time provides reasonable behavior for those that don't. Annex C presents an alternate suggestion that disallows mixed operands.
Following are suggested changes to C99.
Insert the following to 6.3.1.8#1, after "This pattern is called the usual arithmetic conversions:"
6.3.1.8[1]
... This pattern is called the usual arithmetic conversions:
If one operand is a decimal floating type and there are no complex types in the operands:
If one operand is a decimal floating type and the other is a generic floating type, the one with a smaller value range is converted to the other.
Otherwise, if either operand is _Decimal128, the other operand is converted to _Decimal128.
Otherwise, if either operand is _Decimal64, the other operand is converted to _Decimal64.
Otherwise, if either operand is _Decimal32, the other operand is converted to _Decimal32.
If one operand is a decimal floating type and the other is a complex type, the decimal floating type is converted to the first type in the following list that can represent its value range: float, double, long double. It is converted to long double if no type in the list can represent its value range. In either case, the complex type is converted to a type whose corresponding real type is this converted type. The usual arithmetic conversions are then applied to the converted operands.
During any of the above conversions, if the value being converted can be represented exactly in the new type, it is unchanged. If the value being converted is in the range of values that can be represented but cannot be represented exactly, the result is correctly rounded. If the value being converted is outside the range of values that can be represented, the result is dependent on the rounding mode. If the rounding mode is:
near, the absolute value of the result is one of HUGE_VAL, HUGE_VALF, HUGE_VALL, HUGE_VAL_D64, HUGE_VAL_D32 or HUGE_VAL_D128 depending on the result type, and the sign is the same as that of the value being converted.
zero, the value is the most positive number representable if the value being converted is positive, and the most negative number representable otherwise.
positive infinity, the value is the same as for zero if the value being converted is negative, and the same as for near otherwise.
negative infinity, the value is the same as for near if the value being converted is negative, and the same as for zero otherwise.
If there are no decimal floating types in the operands:
First, if the corresponding real type of either operand is long double, the other operand is converted, ... <the rest of 6.3.1.8#1 remains the same>
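The ordering rule can be sketched as a function over a hypothetical enumeration of the real floating types. Value range is approximated by the largest finite power of ten: the generic types take it from this implementation's <float.h> (FLT_MAX_10_EXP, DBL_MAX_10_EXP, LDBL_MAX_10_EXP), while the decimal types use the proposed DEC32_MAX_EXP, DEC64_MAX_EXP and DEC128_MAX_EXP values written as literals. Integer operands and complex types are not modeled, and letting the decimal type win on equal range is an assumption.

```c
#include <float.h>

/* Hypothetical enumeration of the real floating types. */
enum ftype { T_FLOAT, T_DOUBLE, T_LDOUBLE, T_D32, T_D64, T_D128 };

static int is_decimal(enum ftype t) { return t >= T_D32; }

/* Largest finite power of ten representable in each type. */
static int max_10_exp(enum ftype t)
{
    switch (t) {
    case T_FLOAT:   return FLT_MAX_10_EXP;
    case T_DOUBLE:  return DBL_MAX_10_EXP;
    case T_LDOUBLE: return LDBL_MAX_10_EXP;
    case T_D32:     return 96;     /* proposed DEC32_MAX_EXP */
    case T_D64:     return 384;    /* proposed DEC64_MAX_EXP */
    case T_D128:    return 6144;   /* proposed DEC128_MAX_EXP */
    }
    return 0;
}

/* Common type of two real floating operands under the suggested rule. */
enum ftype common_type(enum ftype a, enum ftype b)
{
    if (is_decimal(a) != is_decimal(b)) {
        enum ftype d = is_decimal(a) ? a : b;
        enum ftype g = is_decimal(a) ? b : a;
        /* The type with the smaller range is converted to the other. */
        return max_10_exp(g) > max_10_exp(d) ? g : d;
    }
    /* Both decimal (_Decimal128 > _Decimal64 > _Decimal32) or both
       generic (long double > double > float, as in C99 6.3.1.8). */
    return a > b ? a : b;
}
```

On an implementation with IEEE-754 binary float and double, common_type(T_D32, T_DOUBLE) is T_DOUBLE while common_type(T_D64, T_DOUBLE) is T_D64; with a different double, the first result could change, which is the portability concern discussed earlier in this section.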
Suggested change to C99.
Add the following to 6.4.4.2 floating-suffix.
floating-suffix: one of f l F L df dd dl DF DD DL
Add the following paragraph after 6.4.4.2#2:
6.4.4.2
...
[2a]
Constraints
The df, dd, dl, DF, DD and DL shall not be used in a hexadecimal-floating-constant.
Add the following paragraph after 6.4.4.2#4:
6.4.4.2
...
[4a] If a floating constant is suffixed by df or DF,
it has type _Decimal32. If suffixed by dd or DD, it
has type _Decimal64. If suffixed by dl or DL, it has type
_Decimal128.
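A C11 generic selection can confirm which type each suffix produces. This sketch assumes a compiler that accepts both _Generic and the decimal types (GCC's extension is one such); the DEC_KIND macro and the K_* constants are hypothetical.

```c
/* Hypothetical classification of an expression's type. */
enum { K_OTHER, K_D32, K_D64, K_D128 };

#define DEC_KIND(x) _Generic((x),   \
    _Decimal32:  K_D32,             \
    _Decimal64:  K_D64,             \
    _Decimal128: K_D128,            \
    default:     K_OTHER)

/* Each suffix may be written in lower or upper case. */
static const int kinds[] = {
    DEC_KIND(1.5df), DEC_KIND(1.5DF),   /* _Decimal32 */
    DEC_KIND(1.5dd), DEC_KIND(1.5DD),   /* _Decimal64 */
    DEC_KIND(1.5dl), DEC_KIND(1.5DL),   /* _Decimal128 */
    DEC_KIND(1.5f),  DEC_KIND(1.5)      /* generic floating constants */
};
```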
Suggested change to C99.
Add the following after 7.6#7:
7.6
...
[7a] Each of the macros
FE_DEC_ROUND_DOWN
FE_DEC_ROUND_HALF_UP
FE_DEC_ROUND_HALF_EVEN
FE_DEC_ROUND_CEILING
FE_DEC_ROUND_FLOOR
is defined and used by the fegetround and fesetround functions for getting and setting the rounding mode of decimal floating-point operations.
[7b] Each of the macros
FE_DEC_ROUND_HALF_DOWN
FE_DEC_ROUND_UP
is defined and used by the fegetround and fesetround functions if and only if the implementation supports the optional rounding modes round-half-down and round-up.
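The rounding directions named by these macros can be illustrated on a decimal coefficient. The sketch below removes trailing digits from an integer coefficient under each direction; the enumerators are hypothetical stand-ins for the FE_DEC_ROUND_* macros, and the coefficient is assumed to fit in long long (and not be LLONG_MIN).

```c
/* Hypothetical stand-ins for the FE_DEC_ROUND_* rounding directions. */
enum dec_round {
    DR_DOWN,        /* toward zero (truncate) */
    DR_HALF_UP,     /* to nearest; ties away from zero */
    DR_HALF_EVEN,   /* to nearest; ties to an even digit */
    DR_CEILING,     /* toward +infinity */
    DR_FLOOR,       /* toward -infinity */
    DR_HALF_DOWN,   /* optional: to nearest; ties toward zero */
    DR_UP           /* optional: away from zero */
};

/* Remove `drop` (>= 1) trailing digits from coefficient c and round the
   result in direction `mode`. Assumes 10^drop fits in long long. */
long long round_coefficient(long long c, int drop, enum dec_round mode)
{
    long long p = 1;
    for (int i = 0; i < drop; i++)
        p *= 10;

    int neg = c < 0;
    long long mag = neg ? -c : c;
    long long q = mag / p;      /* kept digits (magnitude) */
    long long r = mag % p;      /* discarded digits */
    long long half = p / 2;
    int bump = 0;               /* 1: increase the magnitude by one */

    switch (mode) {
    case DR_DOWN:      bump = 0; break;
    case DR_UP:        bump = r != 0; break;
    case DR_HALF_UP:   bump = r >= half; break;
    case DR_HALF_DOWN: bump = r > half; break;
    case DR_HALF_EVEN: bump = r > half || (r == half && q % 2 != 0); break;
    case DR_CEILING:   bump = !neg && r != 0; break;
    case DR_FLOOR:     bump = neg && r != 0; break;
    }
    q += bump;
    return neg ? -q : q;
}
```

For example, dropping one digit from 125 gives 12 under half-even but 13 under half-up, and dropping one digit from -121 gives -13 under floor but -12 under ceiling.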
Add the following paragraph after 7.6#5.
7.6
...
[5a] Each of the macros
FE_DEC_DIVISION_BY_ZERO
FE_DEC_INEXACT
FE_DEC_INVALID_OPERATION
FE_DEC_OVERFLOW
FE_DEC_UNDERFLOW
is defined and used by the functions defined in C99 7.6.2.
Square root, min, max, fused multiply-add and remainder are implemented as library functions. Refer to section 9 below.
Conversions between different formats and to integer formats
are covered under section 5.
The names of the functions are derived by adding the suffixes d32, d64 and d128 to the name of the double version of the function.
Suggested change to C99:
Add at the end of 7.12 paragraph 3 the following macros.
7.12
[3] ...
DEC32_HUGE
DEC64_HUGE
DEC128_HUGE
expand to constant expressions of type _Decimal32, _Decimal64 and _Decimal128, respectively, representing infinity.
Add at the end of 7.12 paragraph 5 the following macro.
7.12
[5] ...
DEC_NAN
expands to a quiet decimal floating NaN of type _Decimal32.
7.12.10.4 The divide integer functions
Synopsis
#include <math.h>
_Decimal32 divide_integerd32 (_Decimal32 x, _Decimal32 y);
_Decimal64 divide_integerd64 (_Decimal64 x, _Decimal64 y);
_Decimal128 divide_integerd128(_Decimal128 x, _Decimal128 y);
Description
The divide_integer functions perform the divide-integer operation as defined in IEEE 754R.
Suggested addition to C99:
7.12.10.5 The remainder near functions
Synopsis
#include <math.h>
_Decimal32 remainder_neard32 (_Decimal32 x, _Decimal32 y);
_Decimal64 remainder_neard64 (_Decimal64 x, _Decimal64 y);
_Decimal128 remainder_neard128(_Decimal128 x, _Decimal128 y);
Description
The remainder_near functions perform the remainder-near operation as defined in IEEE 754R.
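Both divide-integer and remainder-near can be illustrated on the finite-number model (value = coefficient × 10^exponent). The sketch below aligns the two operands to a common exponent and then works on integer coefficients; struct dec and the *_model functions are hypothetical, coefficients are assumed small enough not to overflow long long, y is assumed nonzero, and rounding of results that exceed a format's precision is not modeled.

```c
#include <stdlib.h>

/* Hypothetical model of a finite decimal: value = coeff * 10^exp. */
struct dec { long long coeff; int exp; };

/* Rescale both operands to the smaller exponent; returns that exponent. */
static int align(struct dec a, struct dec b, long long *ca, long long *cb)
{
    int e = a.exp < b.exp ? a.exp : b.exp;
    *ca = a.coeff;
    *cb = b.coeff;
    for (int i = e; i < a.exp; i++) *ca *= 10;
    for (int i = e; i < b.exp; i++) *cb *= 10;
    return e;
}

/* divide-integer: integer part of x / y, truncated toward zero. */
struct dec divide_integer_model(struct dec x, struct dec y)
{
    long long cx, cy;
    align(x, y, &cx, &cy);
    struct dec q = { cx / cy, 0 };   /* C99 '/' truncates toward zero */
    return q;
}

/* remainder-near: x - n*y, where n is the integer nearest x / y
   (ties go to the even integer). */
struct dec remainder_near_model(struct dec x, struct dec y)
{
    long long cx, cy;
    int e = align(x, y, &cx, &cy);
    long long n = cx / cy;
    long long r = cx - n * cy;       /* truncated remainder */
    long long twice_r = 2 * llabs(r), ay = llabs(cy);

    if (twice_r > ay || (twice_r == ay && n % 2 != 0)) {
        /* Move n one step away from zero, in the direction of x / y. */
        n += ((cx < 0) == (cy < 0)) ? 1 : -1;
        r = cx - n * cy;
    }
    struct dec out = { r, e };
    return out;
}
```

With 2.1 and 3, for instance, the sketch gives a divide-integer result of 0 and a remainder-near result of -0.9 (coefficient -9, exponent -1): 2.1 / 3 = 0.7 is nearest to 1, and 2.1 - 3 = -0.9.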
7.12.11.5 The quantize functions
Synopsis
#include <math.h>
_Decimal32 quantized32 (_Decimal32 x, _Decimal32 y);
_Decimal64 quantized64 (_Decimal64 x, _Decimal64 y);
_Decimal128 quantized128(_Decimal128 x, _Decimal128 y);
_Bool check_quantum32 (_Decimal32 x, _Decimal32 y);
_Bool check_quantum64 (_Decimal64 x, _Decimal64 y);
_Bool check_quantum128 (_Decimal128 x, _Decimal128 y);
Description
The quantize functions perform the quantize operation as defined in IEEE 754R.
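The quantize operation gives x the exponent of y, adding or removing coefficient digits. The sketch below models it on coefficient/exponent pairs and applies round-half-even to removed digits; in the proposal the rounding would instead follow the current decimal rounding mode. struct dec and the function names are hypothetical, overflow of the target precision is not checked, and reading check_quantum as IEEE 754R's same-quantum test (equal exponents) is an assumption, since the text above does not describe it.

```c
/* Hypothetical model of a finite decimal: value = coeff * 10^exp. */
struct dec { long long coeff; int exp; };

/* Return x re-expressed with y's exponent (round-half-even). */
struct dec quantize_model(struct dec x, struct dec y)
{
    struct dec r = { x.coeff, y.exp };

    if (x.exp >= y.exp) {
        /* Exact: append (x.exp - y.exp) zeros to the coefficient. */
        for (int i = y.exp; i < x.exp; i++)
            r.coeff *= 10;
    } else {
        /* Remove (y.exp - x.exp) digits and round. */
        long long p = 1;
        for (int i = x.exp; i < y.exp; i++)
            p *= 10;
        int neg = x.coeff < 0;
        long long mag = neg ? -x.coeff : x.coeff;
        long long q = mag / p, rem = mag % p;
        if (rem * 2 > p || (rem * 2 == p && q % 2 != 0))
            q++;
        r.coeff = neg ? -q : q;
    }
    return r;
}

/* Assumed meaning of check_quantum: do x and y have the same exponent? */
_Bool same_quantum_model(struct dec x, struct dec y)
{
    return x.exp == y.exp;
}
```

For example, quantizing 2.17 (coefficient 217, exponent -2) to the quantum of 0.1 gives 2.2, and to the quantum of 0.001 gives 2.170.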
Suggested addition to C99:
7.12.11.6 The round to integral functions
Synopsis
#include <math.h>
_Decimal32 round_to_integerd32 (_Decimal32 x, _Decimal32 y);
_Decimal64 round_to_integerd64 (_Decimal64 x, _Decimal64 y);
_Decimal128 round_to_integerd128(_Decimal128 x, _Decimal128 y);
Description
The round_to_integer functions perform the round-to-integer operation as defined in IEEE 754R.
7.12.15 The normalize functions
Synopsis
#include <math.h>
_Decimal32 normalized32 (_Decimal32 x);
_Decimal64 normalized64 (_Decimal64 x);
_Decimal128 normalized128 (_Decimal128 x);
Description
The normalize functions perform the normalize operation as defined in IEEE 754R.
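Normalization removes trailing zeros from the coefficient, raising the exponent by one for each zero removed, so that members of a cohort such as 1.00, 1.0 and 1 map to the same representation. The sketch below uses a hypothetical struct dec; giving zero the exponent 0 is an assumption made here, and clamping to Emax is not modeled.

```c
/* Hypothetical model of a finite decimal: value = coeff * 10^exp. */
struct dec { long long coeff; int exp; };

/* Strip trailing zeros from the coefficient without changing the value. */
struct dec normalize_model(struct dec x)
{
    if (x.coeff == 0) {
        x.exp = 0;              /* assumption: canonical exponent for zero */
        return x;
    }
    while (x.coeff % 10 == 0) {
        x.coeff /= 10;          /* drop one trailing zero ... */
        x.exp++;                /* ... and compensate in the exponent */
    }
    return x;
}
```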
Similarly, the modifiers D and LD can be appended to f, F, e, E, g, and G to form input specifiers that indicate the argument is a pointer to _Decimal64 or _Decimal128 respectively. In addition, the modifier HD can be appended to f, F, e, E, g, and G to form input specifiers that indicate the argument is a pointer to _Decimal32.
Synopsis
#include <stdlib.h>
_Decimal32 strtod32 (const char * restrict nptr, char ** restrict endptr);
_Decimal64 strtod64 (const char * restrict nptr, char ** restrict endptr);
_Decimal128 strtod128(const char * restrict nptr, char ** restrict endptr);
Synopsis
#include <wchar.h>
_Decimal32 wcstod32 (const wchar_t * restrict nptr, wchar_t ** restrict endptr);
_Decimal64 wcstod64 (const wchar_t * restrict nptr, wchar_t ** restrict endptr);
_Decimal128 wcstod128(const wchar_t * restrict nptr, wchar_t ** restrict endptr);
If there is more than one argument, the usual arithmetic conversions are applied so that both arguments have compatible types. Then,
Below is the suggested text for strtod32,
strtod64, and strtod128, copied from C99 7.20.1.3 with editing. Editing
is indicated by strikethrough (delete) and underline (change, new). Refer
also to the handling of Signalling NaNs suggested by WG14 paper N1011.
7.20.1.5 The strtod32, strtod64, and strtod128 functions
Synopsis
[#1]
#include <stdlib.h>
_Decimal32 strtod32 (const char * restrict nptr, char ** restrict endptr);
_Decimal64 strtod64 (const char * restrict nptr, char ** restrict endptr);
_Decimal128 strtod128(const char * restrict nptr, char ** restrict endptr);
Description
[#2] The strtod32, strtod64, and strtod128
functions convert the initial portion of the string pointed to by nptr
to _Decimal32, _Decimal64,
and _Decimal128 representation, respectively.
First, they decompose the input string into three parts: an initial, possibly
empty, sequence of white-space characters (as specified by the isspace
function), a subject sequence resembling a floating-point constant
or representing an infinity or NaN; and a final string of one or more unrecognized
characters, including the terminating null character of the input string.
Then, they attempt to convert the subject sequence to a floating-point
number, and return the result.
[#3] The expected form of the subject sequence is an optional plus or minus sign, then one of the following:
n-char-sequence:
The length of the n-char-sequence shall be shorter than DEC32_COEFF_DIG, DEC64_COEFF_DIG or DEC128_COEFF_DIG respectively, depending on the return type. The subject sequence is defined as the longest initial subsequence of the input string, starting with the first non-white-space character, that is of the expected form. The subject sequence contains no characters if the input string is not of the expected form.
[#4] If the subject sequence has the expected
form for a floating-point number, the sequence of characters starting with
the first digit or the decimal-point character (whichever occurs first)
is interpreted as a floating constant according to the rules of 6.4.4.2,
except that it is not a hexadecimal floating number, that the decimal-point
character is used in place of a period, and that if neither an exponent
part nor a decimal-point character appears in a decimal floating point
number, or if a binary exponent part does not appear in a hexadecimal
floating point number, an exponent part of the appropriate type
with value zero is assumed to follow the last digit in the string. If the
subject sequence begins with a minus sign, the sequence is interpreted
as negated. note1) A character sequence INF or INFINITY is interpreted
as an infinity, if representable in the return type, else like
a floating constant that is too large for the range of the return type.
A character sequence NAN or NAN(n-char-sequence-opt),
or SNAN or SNAN(n-char-sequence-opt),
is interpreted as a quiet NaN or signalling NaN respectively; the meaning
of the n-char sequences is implementation-defined. note2) A pointer
to the final string is stored in the object pointed to by endptr, provided
that endptr is not a null pointer.
[#5] The value resulting from the conversion is correctly rounded.
[#6] In other than the "C" locale, additional locale-specific subject sequence forms may be accepted.
[#7] If the subject sequence is empty or does not have the expected form, no conversion is performed; the value of nptr is stored in the object pointed to by endptr, provided that endptr is not a null pointer.
Recommended practice
[#8] If the subject
sequence has the hexadecimal form, FLT_RADIX is not a power
of 2, and the result is not exactly representable, the result
should be one of the two numbers in the appropriate internal format that
are adjacent to the hexadecimal floating source
value, with the extra stipulation that the
error should have a correct sign for the current rounding
direction.
[#9] If the subject
sequence has the decimal form and at most DEC128_COEFF_DIG (defined in <decfloat.h>)
significant digits, the result should be correctly rounded. If the subject
sequence D has the decimal form and more than DEC128_COEFF_DIG
significant digits, consider the two bounding, adjacent decimal strings
L and U, both having DEC128_COEFF_DIG significant digits, such that the
values of L, D, and U satisfy L <= D <= U. The result should be one
of the (equal or adjacent) values that would be obtained by correctly rounding
L and U according to the current rounding direction, with the extra stipulation
that the error with respect to D should have a correct sign for the current
rounding direction. 252)
Returns
[#10] The functions return the converted
value, if any. If no conversion could be performed, zero is returned.
If the correct value is outside the range of representable values, plus
or minus HUGE_VAL_D64, HUGE_VAL_D32, or HUGE_VAL_D128 is returned (according
to the return type and sign of the value), and the value of the macro ERANGE
is stored in errno. If the result underflows (7.12.1), the functions return
a value whose magnitude is no greater than the smallest normalized positive
number in the return type; whether errno acquires the value ERANGE is implementation-defined.
252 DECIMAL_DIG, defined in <float.h>,
should be sufficiently large that L and U will usually round to the same
internal floating value, but if not will round to adjacent values.
note1 It is unspecified whether a
minus-signed sequence is converted to a negative number directly or by
negating the value resulting from converting the corresponding unsigned
sequence (see F.5); the two methods may yield different results if
rounding is toward positive or negative infinity. In either case, the functions
honor the sign of zero if floating-point arithmetic supports signed zeros.
F.5 shall be followed.
note2 An implementation may use the n-char
sequence to determine extra information to be represented in the NaN's
significand. No signal is raised at the point of returning the signalling
NaN.
7.24.4.1.3 The wcstod32, wcstod64, and wcstod128 functions
Synopsis
[#1]
#include <wchar.h>
_Decimal32 wcstod32 (const wchar_t * restrict nptr, wchar_t ** restrict endptr);
_Decimal64 wcstod64 (const wchar_t * restrict nptr, wchar_t ** restrict endptr);
_Decimal128 wcstod128(const wchar_t * restrict nptr, wchar_t ** restrict endptr);
Description
Similar to 7.20.1.5 in annex A, replacing references to character with wide character where appropriate.
Insert the following to 6.3.1.8#1, after "This pattern is called the usual arithmetic conversions:"
6.3.1.8[1]
... This pattern is called the usual arithmetic conversions:
If one operand is a decimal floating type, the other operand shall not have a generic floating type or a complex type:
First, if either operand is _Decimal128, the other operand is converted to _Decimal128.
Otherwise, if either operand is _Decimal64, the other operand is converted to _Decimal64.
Otherwise, if either operand is _Decimal32, the other operand is converted to _Decimal32.
If there are no decimal floating types in the operands:
First, if the corresponding real type of either operand is long double, the other operand is converted, ... <the rest of 6.3.1.8#1 remains the same>