1. Overview
At the Issaquah 2016 meeting, during the LWG review of [P0452r0], it was realized that some of the numeric algorithms (both old algorithms and new ones from the Parallelism TS) had insufficient or unclear type requirements.
Consider:
vector<double> v; double d = accumulate(v.begin(), v.end(), 0);
What type is used to store the intermediary result during accumulation in the
above example? Is it T, the type of the initial value (which is int here)? Or
is it double, the value type of the InputIterators?
The answer: it is T, or int in the above example. For algorithms
which do not take an initial value (such as partial_sum and adjacent_difference) and instead write to an output sequence via an OutputIterator, the value type of the InputIterator is used. This may not
be the best or most consistent behavior, but it appears to have be the status
quo in major implementations.
However, the specification lacks clarity about which types are used for accumulation and the constraints on the template parameters to the algorithms. This paper identifies some of the issues and proposes potential solutions.
2. Existing <numeric> Algorithms
accumulate, inner_product, partial_sum and adjacent_difference are all
specified using similar "accumulator-style" wording:
accumulate(26.8.2 [accumulate] paragraph 1):Effects: Computes its result by initializing the accumulatoraccwith the initial valueinitand then modifies it withacc = acc + *ioracc = binary_op(acc, *i)for every iteratoriin the range[first, last)in order.Requires:Tshall meet the requirements ofCopyConstructible(Table 22) andCopyAssignable(Table 24) types. In the range[first, last],binary_opshall neither modify elements nor invalidate iterators or subranges.
inner_product(26.8.5 [inner.product] paragraph 1):Effects: Computes its result by initializing the accumulatoraccwith the initial valueinitand then modifying it withacc = acc + (*i1) * (*i2)oracc = binary_op1(acc, binary_op2(*i1, *i2))for every iteratori1in the range[first1, last1)and iteratori2in the range[first2, first2 + (last1 - first1))in order.Requires:Tshall meet the requirements ofCopyConstructible(Table 22) andCopyAssignable(Table 24) types. In the ranges[first1, last1]and[first2, first2 + (last1 - first1)]binary_op1andbinary_op2shall neither modify elements nor invalidate iterators or subranges.
partial_sum(26.8.6 [partial.sum] paragraph 1 and 4)Effects: For a non-empty range, the function creates an accumulatoraccwhose type isInputIterator's value type, initializes it with*first, and assigns the result to*result. For every iteratoriin[first + 1, last)in order,accis then modified byacc = acc + *ioracc = binary_op(acc, *i)and the result is assigned to*(result + (i - first)).Requires:InputIterator's value type shall be constructible from the type of*first. The result of the expressionacc + *iorbinary_op(acc, *i)shall be implicitly convertible toInputIterator's value type.accshall be writable (24.2.1) to the result output iterator. In the ranges[first, last]and[result, result + (last - first)]binary_opshall neither modify elements nor invalidate iterators or subranges.
adjacent_difference(26.8.11 [adjacent.difference] paragraph 1 and 2)Effects: For a non-empty range, the function creates an accumulatoraccwhose type isInputIterator's value type, initializes it with *first, and assigns the result to*result. For every iteratoriin[first + 1, last)in order, creates an object val whose type is InputIterator’s value type, initializes it with*i, computesval - accorbinary_op(val, acc), assigns the result to*(result + (i - first)), and move assigns fromvaltoacc.Requires:InputIterator's value type shall beMoveAssignable(Table 23) and shall be constructible from the type of*first.accshall be writable (24.2.1) to the result output iterator. The result of the expressionval - accorbinary_op(val, acc)shall be writable to the result output iterator. In the ranges[first, last]and[result, result + (last - first)],binary_opshall neither modify elements nor invalidate iterators or subranges.
The description of these four algorithms shares a similar structure:
-
Create an accumulator
accwhich is initialized withinitor*first. -
For (the) iterator(s) in the range(s) in order, modify
accby applying an update operation which takesaccand the dereferenced value(s) of the iterator(s) as arguments.
For the numeric algorithms which do not take an initial value (partial_sum and adjacent_difference), the accumulator acc's type is the InputIterator's
value type.
For the numeric algorithms which do take an initial value (accumulate and inner_product), it is not entirely clear what the type of the accumulator
should be, although it is strongly implied that it should be the type of the
initial value (T) not the InputIterator's value type. The return value of
these two algorithms is T, and the Requires: clauses for both algorithms
specify that T must be CopyConstructible and CopyAssignable, which would
only make sense if the accumulator is to be of type T. In libstdc++ and
libc++, the init parameter is used as the accumulator (e.g. the accumulator
type is T).
I suggest that we change accumulate and inner_product to
-
Specify that the accumulator type is
T. -
Require the result of the update expression for these algorithms (
acc + *iorbinary_op(acc, *i)foraccumulate, for example) be implicitly convertible toT.partial_sumandadjacent_differencealready have similar language requiring implicit convertibilty toInputIterator's value type.
3. New Parallel <numeric> Algorithms
For the parallel numeric algorithms (reduce, transform_reduce, inclusive_scan, exclusive_scan, transform_inclusive_scan and transform_exclusive_scan), things are a bit more complicated. All of these
algorithms are specified in terms of GENERALIZED_SUM or
GENERALIZED_NONCOMMUTATIVE_SUM:
26.2 [numeric.defns]DefineGENERALIZED_NONCOMMUTATIVE_SUM(op, a1, ..., aN)as follows:
a1whenNis1, otherwise
op(GENERALIZED_NONCOMMUTATIVE_SUM(op, a1, ..., aK), GENERALIZED_NONCOMMUTATIVE_SUM(op, aM, ..., aN))for anyKwhere1 < K + 1 = M ≤ N.DefineGENERALIZED_SUM(op, a1, ..., aN)asGENERALIZED_NONCOMMUTATIVE_SUM(op, b1, ..., bN)whereb1, ..., bNmay be any permutation ofa1, ..., aN.
Arbitrary partition of the reduction calculations at the implementation’s
discretion is allowed for GENERALIZED_NONCOMMUTATIVE_SUM algorithms
(exclusive_scan, inclusive_scan, transform_exclusive_scan and transform_inclusive_scan). In addition to arbitrary partitioning, GENERALIZED_SUM algorithms (reduce and transform_reduce) can arbitrarily
re-order the reduction calculations. All of the algorithms have overloads that
take an initial value of type T. For the GENERALIZED_SUM algorithms, all
overloads take an initial value of type T.
These algorithms present a conundrum for us. Should they:
-
break the precedence of their serial counterparts (
accumulate,inner_productandadjacent_difference) and use the value type of InputIterator instead ofTfor intermediate storage (the implied status quo of the working text, given the current definitions ofGENERALIZED_SUMandGENERALIZED_NONCOMMUTATIVE_SUM). -
OR should the definition of algorithms be changed to require conversions to
T, making theGENERALIZED_SUMalgorithms consistent with their serial counterparts?
The second option seems very messy. We’d have to change the definition of GENERALIZED_SUM to explicitly take the initial value and initial value type
into account. The second option would potentially force conversions to T that
would be unnecessary and undesirable. Additionally, what about the algorithms
which can take an initial value, but also have overloads which do not take one
(inclusive_scan and transform_inclusive_scan)? The first option seems best
and appears to be closest to the design intent.
I suggest that we change all the parallel numeric algorithms to:
-
Require that the type of the initial value (
T) beCopyConstructible(andCopyAssignable?). -
Require that InputIterator’s value type can be constructed from the type of the initial value (T).
-
Replace uses of
initin theGENERALIZED_SUMorGENERALIZED_NONCOMMUTATIVE_SUMexpression used to define the algorithms with a temporary of the InputIterator’s value type constructed from the initial value. -
Require that the result of the
GENERALIZED_SUMorGENERALIZED_NONCOMMUTATIVE_SUMexpression be implicitly convertible to the correct return type for the algorithm (T).
Additionally, for the parallel numeric algorithms which take OutputIterators, I
suggest the following changes:
-
Require that the result of the
GENERALIZED_SUMorGENERALIZED_NONCOMMUTATIVE_SUMexpression be writable (24.2.1) to the result output iterator.