| Document #: | P1897R3 |
| Date: | 2020-05-16 |
| Project: | Programming Language C++ SG1 |
| Reply-to: | Lee Howes &lt;lwh@fb.com&gt; |
## Changelog

### R2 → R3

- Renamed `just_via` to `just_on`.
- Renamed `via` to `on`.
- Added `ensure_started`.
- Recorded feedback on `indexed_for` in Prague. Removed `indexed_for` from the paper of initial algorithms.
- Reworded the `handle_error` wording to be more similar to `let_value`, and renamed it `let_error` for naming consistency.
- Changed `sync_wait` to terminate on done rather than throwing.
- Added wording for `on` and `just_on`.

### R1 → R2

- Added the `just_via` algorithm to allow type customization at the head of a work chain.
- Added `when_all` to fill a missing gap in the ability to join sender chains.
- Renamed `for_each` to `indexed_for` based on feedback during the Belfast meeting to have a side-effecting algorithm.
- Proposed replacing `bulk_execute` with `indexed_for` for the Prague meeting.
- Removed `is_noexcept_sender`.
- Removed `just_error`.
- Used `on` in addition to `via` in the final example.

## Introduction

In [P0443R13] we have included the fundamental principles described in [P1660R0], and the fundamental requirement to customize algorithms. In recent discussions we have converged on an understanding of the `submit` operation on a sender and its more fundamental primitives, `connect` and `start`, supporting general interoperation between algorithms, and on algorithm customization, giving us full flexibility to optimize, to offload and to avoid synchronization in chains of mutually compatible algorithm customizations.
As a starting point, in [P0443R13] we only include a `bulk_execute` algorithm, which satisfies the core requirement to provide scalar and bulk execution. To make the C++23 solution completely practical, however, we should extend the set of algorithms. This paper suggests an expanded initial set that enables early, useful work chains. This set is intended to act as a focus for discussion, algorithm by algorithm, and to let us analyze the finer constraints of the wording to make sure we do not over-constrain the design.
This paper does not attempt to propose the mapping of the C++20 parallel algorithms into an asynchronous environment. Once we have basic primitives, we can describe default implementations of the parallel algorithms, as well as std::async, in terms of these.
In the long run we expect to have a much wider set of algorithms, potentially covering the full set in the current C++20 parallel algorithms. The precise customization of these algorithms is open to discussion: they may be individually customized and individually defaulted, or they may be optionally individually customized but defaulted in a tree such that customizing one is known to accelerate dependencies. How we achieve this is open to discussion and is an independent topic, beyond the scope of this paper.
Starting with [P0443R13] as a baseline we have the following customization points:

- `connect(sender, receiver) -> operation_state`
- `start(operation_state) -> void`
- `submit(sender, receiver) -> void`
- `schedule(scheduler) -> sender`
- `execute(executor, invocable) -> void`
- `set_done`
- `set_error`
- `set_value`

and the following concepts:

- `scheduler`
- `receiver`
- `receiver_of`
- `sender`
- `sender_to`
- `typed_sender`
- `operation_state`
- `executor`
- `executor_of`

We propose immediately discussing the addition of the following algorithms:
- `just(v...)`: returns a sender propagating the value(s) `v...`
- `just_on(sch, v...)`: returns a sender propagating the value(s) `v...` on `sch`'s execution context; a combination of `just` and the `on` algorithm
- `on(s, sch)`: returns a sender that completes `s` on `sch`'s execution context
- `sync_wait(s)`: blocks, returning a value of type `T`, the value type of the sender; throws on error and terminates on done
- `when_all(s...)`: returns a sender that completes when all of `s...` complete, propagating all values
- `transform(s, f)`: applies `f` to the value passed by `s`, or propagates errors or cancellation
- `let_value(s, f)`: keeps the value passed by `s` available for the duration of another async operation produced by `f`; error and cancellation signals propagate unmodified
- `let_error(s, f)`: keeps the error passed by `s` available for the duration of another async operation produced by `f`; value and cancellation propagate unmodified
- `ensure_started(s)`: eagerly starts `s` and returns a sender that may be executing concurrently with surrounding code

## Examples

A very simple example of applying a function to a propagated value and waiting for it.
```cpp
auto just_sender =       // sender_to<int>
  just(3);
auto transform_sender =  // sender_to<float>
  transform(
    std::move(just_sender),
    [](int a){return a+0.5f;});
float result =           // value: 3.5
  sync_wait(std::move(transform_sender));
```

In this very simple example we:

- construct a sender carrying the value 3
- transform it, adding 0.5f to produce a float
- block for completion of the chain, assigning the float value 3.5 to `result`.

Using `operator|` as in ranges to remove the need to pass arguments around, we can represent this as:

```cpp
float result = sync_wait(
  just(3) |
  transform([](int a){return a+0.5f;}));
```
when_all joins a list of incoming senders, propagating their values.
```cpp
auto just_sender =        // sender_to<std::vector<int>, int>
  just(std::vector<int>{3, 4, 5}, 10);
auto just_float_sender =  // sender_to<float>
  just(20.0f);
auto when_all_sender = when_all(
  std::move(just_sender), std::move(just_float_sender));
auto transform_sender = transform(
  std::move(when_all_sender),
  [](std::vector<int> vec, int /*i*/, float /*f*/) {
    return vec;
  });
std::vector<int> result = // value: {3, 4, 5}
  sync_wait(std::move(transform_sender));
```

This demonstrates simple joining of senders:

- construct one sender carrying a vector and an int, and another carrying a float
- join them with `when_all`, concatenating their values
- transform the joined values, keeping only the vector
- block for completion, assigning the vector `{3, 4, 5}` to `result`.

Using `operator|` as in ranges to remove the need to pass arguments around, we can represent this as:
```cpp
std::vector<int> result_vec = sync_wait(
  when_all(just(std::vector<int>{3, 4, 5}, 10), just(20.0f)) |
  transform([](std::vector<int> vec, int /*i*/, float /*f*/){return vec;}));
```

A simple example showing how an exception that leaks out of a transform may propagate and be thrown from `sync_wait`.
```cpp
int result = 0;
try {
  auto just_sender = just(3);
  auto on_sender = on(std::move(just_sender), scheduler1);
  auto transform_sender = transform(
    std::move(on_sender),
    [](int a){throw 2;});
  auto skipped_transform_sender = transform(
    std::move(transform_sender),
    [](){return 3;});
  result = sync_wait(std::move(skipped_transform_sender));
} catch(int a) {
  result = a; // Assign 2 to result
}
```

In this example we:

- construct a sender carrying the value 3
- transition onto `scheduler1`'s execution context
- attempt to transform the 3, but this transform throws an exception rather than returning a transformed value
- skip the subsequent transform, because the error propagates past it
- rethrow the exception from `sync_wait`, catching it and assigning 2 to `result`.

As before, using `operator|` as in ranges to remove the need to pass arguments around, we can represent this more cleanly:
```cpp
int result = 0;
try {
  result = sync_wait(
    just(3) |
    on(scheduler1) |
    transform([](int a){throw 2;}) |
    transform([](){return 3;}));
} catch(int a) {
  result = a; // Assign 2 to result
}
```

Very similar to the above, we can handle an error mid-stream:
```cpp
auto just_sender = just(3);
auto via_sender = on(std::move(just_sender), scheduler1);
auto transform_sender = transform(
  std::move(via_sender),
  [](int a){throw 2;});
auto skipped_transform_sender = transform(
  std::move(transform_sender),
  [](){return 3;});
auto error_handling_sender = let_error(
  std::move(skipped_transform_sender),
  [](exception_ptr e){return just(5);});
auto result = sync_wait(std::move(error_handling_sender));
```

In this example we:

- construct a sender carrying the value 3
- transition onto `scheduler1`'s execution context
- attempt to transform the 3, but this transform throws an exception rather than returning a transformed value
- skip the subsequent transform
- pass an `exception_ptr` pointing to the thrown value 2 to the `let_error` invocable, which returns `just(5)`, thus recovering from the error
- block for completion, assigning 5 to `result`.

As before, using `operator|` as in ranges to remove the need to pass arguments around, we can represent this more cleanly:
```cpp
int result = sync_wait(
  just(3) |
  on(scheduler1) |
  transform([](int a){throw 2;}) |
  transform([](){return 3;}) |
  let_error([](auto e){
    return just(5);}));
```

## Sender adapters

Taking inspiration from range adaptors, we define sender adapters.
Wording is to be based on [range.adaptors], with the basic requirements that:

- `operator|` be overloaded for the purpose of creating pipelines over senders
- `algorithm(sender, args...)`, `algorithm(args...)(sender)` and `sender | algorithm(args...)` be equivalent
- `algorithm(args...)` be a sender adaptor closure object

Details below are in loosely approximated wording and should be made consistent with [P0443R11] and the standard itself when finalized. We choose this set of algorithms as a basic set to allow a range of realistic, though still limited, compositions to be written against executors.
### `just`

`just` creates a sender that propagates a value inline to a submitted receiver.
Signature:
```cpp
template <typename T>
concept movable-value = // exposition only
  move_constructible<remove_cvref_t<T>> &&
  constructible_from<remove_cvref_t<T>, T>;

template <movable-value... Ts>
see-below just(Ts&&... ts) noexcept(see-below);
```

[ Example:
- end example]
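The semantics described below can be modelled with a toy, non-normative implementation. Names such as `toy_just` and `int_slot_receiver` are invented for this sketch; it is not the proposed wording:

```cpp
#include <exception>
#include <utility>

// Toy model of `just`: connect stores the receiver and the value in the
// operation state; start delivers the value inline with the caller,
// routing a throwing move to set_error.
template <class R, class T>
struct just_operation {
  R receiver;
  T value;
  void start() {
    try {
      std::move(receiver).set_value(std::move(value));
    } catch (...) {
      std::move(receiver).set_error(std::current_exception());
    }
  }
};

template <class T>
struct just_sender {
  T value;
  template <class R>
  just_operation<R, T> connect(R r) && {
    return {std::move(r), std::move(value)};
  }
};

template <class T>
just_sender<T> toy_just(T value) { return {std::move(value)}; }

// A minimal receiver recording the delivered value.
struct int_slot_receiver {
  int* slot;
  void set_value(int v) && { *slot = v; }
  void set_error(std::exception_ptr) && noexcept {}
  void set_done() && noexcept {}
};

inline int demo_just() {
  int out = 0;
  auto op = toy_just(3).connect(int_slot_receiver{&out});
  op.start();  // delivers 3 inline with the caller
  return out;
}
```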
The expression `execution::just(t...)` returns a sender, `s`, wrapping the values `t...`.

- When `execution::connect(s, r)` is called, resulting in an `operation_state` `o` containing `rCopy`, with type `remove_cvref_t<decltype(r)>` and initialized with `r`, followed by `execution::start(o)` for some `r`, it will call `execution::set_value(r, std::move(t)...)`, inline with the caller.
- If moving a value `t` throws, it will catch the exception and call `execution::set_error(r, e)` with the caught `exception_ptr`.

### `just_on`

`just_on` creates a sender that propagates a value to a submitted receiver on the execution context of a passed scheduler. It is semantically equivalent to `on(just(t), s)` if `just_on` is not customized on `s`. Providing `just_on` offers an opportunity to customize the algorithm directly, to control allocation of the value `t` at the head of a custom pipeline.
Signature:
```cpp
template <execution::scheduler Sch, movable-value... Ts>
see-below just_on(Sch sch, Ts&&... ts) noexcept(see-below);
```

[ Example:
- end example]
The name `execution::just_on` denotes a customization point object. For some subexpressions `sch` and `ts...`, let `Sch` be a type such that `decltype((sch))` is `Sch` and let `Ts...` be a pack of types such that `decltype((ts))...` is `Ts...`. The expression `execution::just_on(sch, ts...)` is expression-equivalent to:

- `sch.just_on(ts...)` if that expression is valid and if `sch` satisfies `scheduler`.
- Otherwise, `just_on(sch, ts...)`, if that expression is valid and if `sch` satisfies `scheduler`, with overload resolution performed in a context that includes the declaration `void just_on();` and that does not include a declaration of `execution::just_on`.
- Otherwise, returns the result of the expression `execution::on(execution::just(ts...), sch)`.

Any customization of `execution::just_on(sch, ts...)` returning a sender `s` shall execute calls to `set_value`, `set_done` or `set_error` on a receiver connected to `s` on an execution context owned by `sch`.

### `sync_wait`

Blocks the calling thread to wait for the passed sender to complete. Returns a `T` when passed a `typed_sender` that sends a `T` on the value channel, where `T` may be `void`; throws if an exception is propagated; and calls `std::terminate` on propagation of the `set_done()` signal.
```cpp
template <execution::typed_sender S>
auto sync_wait(S&& s) -> std::sender_traits<S>::value_types;

template <class ValueType, execution::sender S>
ValueType sync_wait_r(S&& s);
```

[ Example:
- end example]
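The default path described below — a receiver over a synchronization primitive — can be sketched as a toy, non-normative implementation specialized to non-void value types. All names (`toy_sync_wait`, `sync_state`, `just_int_sender`) are invented for this sketch:

```cpp
#include <condition_variable>
#include <exception>
#include <mutex>
#include <optional>
#include <utility>

// Shared state between the blocked caller and the receiver.
template <class T>
struct sync_state {
  std::mutex m;
  std::condition_variable cv;
  std::optional<T> value;
  std::exception_ptr error;
  bool done = false;
  bool ready = false;
};

// Each completion signal records its result and wakes the waiter.
template <class T>
struct sync_receiver {
  sync_state<T>* st;
  template <class U>
  void set_value(U&& v) && {
    std::lock_guard lk(st->m);
    st->value.emplace(std::forward<U>(v));
    st->ready = true;
    st->cv.notify_one();
  }
  void set_error(std::exception_ptr e) && noexcept {
    std::lock_guard lk(st->m);
    st->error = e;
    st->ready = true;
    st->cv.notify_one();
  }
  void set_done() && noexcept {
    std::lock_guard lk(st->m);
    st->done = true;
    st->ready = true;
    st->cv.notify_one();
  }
};

template <class T, class Sender>
T toy_sync_wait(Sender&& s) {
  sync_state<T> st;
  auto op = std::forward<Sender>(s).connect(sync_receiver<T>{&st});
  op.start();
  std::unique_lock lk(st.m);
  st.cv.wait(lk, [&] { return st.ready; });
  if (st.error) std::rethrow_exception(st.error);
  if (st.done) std::terminate();  // per the R3 change: terminate on done
  return std::move(*st.value);
}

// A trivial inline sender for demonstration.
struct just_int_sender {
  int v;
  template <class R>
  struct op {
    R r;
    int v;
    void start() { std::move(r).set_value(v); }
  };
  template <class R>
  op<R> connect(R r) && { return {std::move(r), v}; }
};
```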
The name `execution::sync_wait` denotes a customization point object. For some subexpression `s`, let `S` be a type such that `decltype((s))` is `S`. The expression `execution::sync_wait(s)` is expression-equivalent to:

- `s.sync_wait()` if that expression is valid and if `S` satisfies `sender`.
- Otherwise, `sync_wait(s)`, if that expression is valid and if `S` satisfies `sender`, with overload resolution performed in a context that includes the declaration `void sync_wait();` and that does not include a declaration of `execution::sync_wait`.
- Otherwise, constructs a receiver, `r`, over an implementation-defined synchronization primitive and passes `r` to `execution::connect(s, r)`, returning some `operation_state` `os`. Waits on the synchronization primitive to block on completion of `s`.
  - If the operation completes by a call to `set_value(r, t)`, then `sync_wait()` will return a value, `x`, of type `remove_cvref_t<decltype(t)>`.
  - If the operation completes by a call to `set_value(r)`, then `sync_wait()` will return `void`.
  - If the operation completes by a call to `set_error(r, e)`, then `sync_wait()` calls `std::rethrow_exception(e)` if `decltype(e)` is `std::exception_ptr`, or `throw e;` otherwise.
  - If the operation completes by a call to `set_done(r)`, then `sync_wait()` will call `std::terminate`.

### `on`

Takes a sender and a scheduler and ensures that the sender operation is connected and started on the execution context associated with the scheduler, giving the programmer control over where the work encapsulated by the sender is started.
[ Example:
- end example]
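The essence of `on` — capturing the upstream result and re-delivering it from the scheduler's context — can be sketched with a toy, non-normative model. The "context" here is a manually drained task queue so the hand-off is observable; all names (`manual_queue`, `toy_on_sender`, `record_receiver`) are invented for this sketch:

```cpp
#include <functional>
#include <utility>
#include <vector>

// A toy scheduler context: tasks run only when the queue is drained.
struct manual_queue {
  std::vector<std::function<void()>> tasks;
  void run_all() {
    for (auto& t : tasks) t();
    tasks.clear();
  }
};

// Delivers `value` to `receiver` from the queue's "context", standing in
// for on(just(value), sch).
template <class R>
struct on_operation {
  manual_queue* queue;
  int value;
  R receiver;
  void start() {
    // Completion is forwarded from the scheduler's execution context,
    // not inline with the caller of start().
    queue->tasks.push_back([r = std::move(receiver), v = value]() mutable {
      std::move(r).set_value(v);
    });
  }
};

struct toy_on_sender {
  manual_queue* queue;
  int value;
  template <class R>
  on_operation<R> connect(R r) && { return {queue, value, std::move(r)}; }
};

struct record_receiver {
  int* slot;
  void set_value(int v) && { *slot = v; }
};

// Returns whether the value was (incorrectly) delivered before the
// scheduler's context ran; a correct `on` hand-off yields false.
inline bool demo_on(int& observed) {
  manual_queue q;
  auto op = toy_on_sender{&q, 7}.connect(record_receiver{&observed});
  op.start();
  bool delivered_before_run = (observed == 7);
  q.run_all();  // now the "context" runs and delivers the value
  return delivered_before_run;
}
```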
The name `execution::on` denotes a customization point object. For some subexpressions `s` and `sch`, let `S` be a type such that `decltype((s))` is `S` and let `Sch` be a type such that `decltype((sch))` is `Sch`. The expression `execution::on(s, sch)` is expression-equivalent to:

- `s.on(sch)` if that expression is valid and if `S` satisfies `sender`.
- Otherwise, `on(s, sch)` if that expression is valid, and if `S` satisfies `sender` and `Sch` satisfies `scheduler`, with overload resolution performed in a context that includes the declaration `void on();` and that does not include a declaration of `execution::on`.
- Otherwise:
  - Constructs a sender `s2` such that, when `connect` is called with some receiver `output_receiver` as `execution::connect(s2, output_receiver)`, it results in an `operation_state` `os` which is stored as a subobject of the parent `operation_state`, and:
    - Constructs a receiver `r` and passes `r` to `execution::connect(s, r)`, resulting in an `operation_state` `ros` which is stored as a subobject of `os`, such that:
      - When `set_value`, `set_error` or `set_done` is called on `r`, the parameter is copied and stored as a subobject of a receiver `r2`, and `execution::connect(execution::schedule(sch), std::move(r2))` results in an `operation_state` `os2` which is stored as a subobject of `os`, such that:
        - When `set_value` is called on `r2`, `os2`'s destructor will be called and the stored value is forwarded to `output_receiver` on the appropriate choice of `set_value`, `set_error` or `set_done` to match the operation performed on `r`.
        - When `set_error` or `set_done` is called on `r2`, the parameters propagate to `output_receiver`.
        - If `connect` throws, the resulting exception is forwarded to `execution::set_error(output_receiver)`.
        - The destructor of `ros` is then called.
      - If `connect` throws, the resulting exception is forwarded to `execution::set_error(output_receiver)`.
      - Calls `execution::start(os2)`.
    - When `execution::start` is called on `os`, calls `execution::start(ros)`.
- Otherwise, `execution::on(s, sch)` is ill-formed.

Any customization of `execution::on(s, sch)` returning a sender `s2` shall execute calls to `set_value`, `set_done` or `set_error` on a receiver connected to `s2` on an execution context owned by `sch`.
### `when_all`

`when_all` combines a set of non-void senders, returning a sender that, on success, completes with the combined values of all incoming senders. To make usage simpler, `when_all` is restricted to `typed_sender`s that each send only a single possible value type.
Signature:
[ Example:
```cpp
auto r =
  sync_wait(
    transform(
      when_all(just(3), just(1.2f)),
      [](int a, float b){return a + b;}));
// r==4.2
```

- end example]
The name `execution::when_all` denotes a customization point object. For some subexpressions `ss...`, let `Ss...` be a list of types such that `decltype((ss))...` is `Ss...`. The expression `execution::when_all(ss...)` is expression-equivalent to:

- `when_all(ss...)` if that expression is valid, and if each `Si` in `Ss...` satisfies `typed_sender` and has `sender_traits<Si>::value_types<T>` for some single type `T`, with overload resolution performed in a context that includes the declaration `void when_all();` and that does not include a declaration of `execution::when_all`.
- Otherwise, returns a sender, `s`, such that, when `connect(s, output_receiver)` is called on the returned sender for some `output_receiver`, it constructs a receiver `ri` for each passed sender `si` and calls `connect(si, ri)`, returning an `operation_state` object `osi`. The `operation_state`s `osi` are stored as subobjects within the `operation_state` object returned from `connect(s, output_receiver)` such that:
  - When `set_value(ti...)` is called on all `ri`, for some single set of values `ti...` for each `ri`, it will concatenate the lists of values and call `set_value(output_receiver, t0..., t1..., tn...)`.
  - If `set_done()` is called on any `ri`, it will call `set_done(output_receiver)`, discarding other results.
  - If `set_error(e)` is called on any `ri`, it will call `set_error(output_receiver, e)` for some `e`, discarding other results.
  - When `start` is called on the returned sender's `operation_state`, calls `execution::start(osi)` for each `operation_state` `osi`.
Note: See Planned Developments.
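The counting join described above can be sketched with a toy, non-normative model specialized to two int "senders", here represented as functions that deliver a value to a callback. All names (`toy_when_all`, `join_state`) are invented for this sketch:

```cpp
#include <functional>
#include <memory>
#include <utility>

// Shared state: counts outstanding children; whichever completes last
// concatenates the stored values and invokes the output callback.
struct join_state {
  int a = 0;
  int b = 0;
  int remaining = 2;
  std::function<void(int, int)> output;
  void arrived() {
    if (--remaining == 0) output(a, b);
  }
};

// A "sender" here is just a function taking a value callback.
using toy_sender = std::function<void(std::function<void(int)>)>;

inline void toy_when_all(toy_sender sa, toy_sender sb,
                         std::function<void(int, int)> out) {
  auto st = std::make_shared<join_state>();
  st->output = std::move(out);
  sa([st](int v) { st->a = v; st->arrived(); });
  sb([st](int v) { st->b = v; st->arrived(); });
}

inline std::pair<int, int> demo_when_all() {
  int x = 0;
  int y = 0;
  // Values are delivered in declaration order regardless of which
  // child completes last.
  toy_when_all([](std::function<void(int)> cb) { cb(3); },
               [](std::function<void(int)> cb) { cb(20); },
               [&](int a, int b) { x = a; y = b; });
  return {x, y};
}
```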
### `transform`

`transform` is a sender adapter that takes a sender and an invocable and returns a sender that propagates the value resulting from calling the invocable on the value passed by the preceding sender.
Signature:
```cpp
template <execution::sender S, movable-value F>
  requires std::invocable<F, sender_traits<S>::template value_types>
see-below transform(S s, F f);
```

[ Example:
- end example]
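The receiver-wrapping default described below can be sketched as a toy, non-normative implementation. Names such as `toy_transform` and `float_slot` are invented for this sketch:

```cpp
#include <exception>
#include <utility>

// Wraps the output receiver in one that applies f on the value channel
// and routes a throwing f to set_error; error and done pass through.
template <class R, class F>
struct transform_receiver {
  R out;
  F f;
  template <class T>
  void set_value(T&& t) && {
    try {
      std::move(out).set_value(f(std::forward<T>(t)));
    } catch (...) {
      std::move(out).set_error(std::current_exception());
    }
  }
  void set_error(std::exception_ptr e) && noexcept {
    std::move(out).set_error(e);
  }
  void set_done() && noexcept { std::move(out).set_done(); }
};

template <class S, class F>
struct transform_sender {
  S upstream;
  F f;
  template <class R>
  auto connect(R r) && {
    return std::move(upstream).connect(
        transform_receiver<R, F>{std::move(r), std::move(f)});
  }
};

template <class S, class F>
transform_sender<S, F> toy_transform(S s, F f) {
  return {std::move(s), std::move(f)};
}

// A trivial inline sender and a recording receiver for demonstration.
struct just_int {
  int v;
  template <class R>
  struct op {
    R r;
    int v;
    void start() { std::move(r).set_value(v); }
  };
  template <class R>
  op<R> connect(R r) && { return {std::move(r), v}; }
};

struct float_slot {
  float* slot;
  void set_value(float v) && { *slot = v; }
  void set_error(std::exception_ptr) && noexcept {}
  void set_done() && noexcept {}
};

inline float demo_transform() {
  float out = 0.0f;
  auto op = toy_transform(just_int{3}, [](int a) { return a + 0.5f; })
                .connect(float_slot{&out});
  op.start();
  return out;
}
```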
The name `execution::transform` denotes a customization point object. For some subexpressions `s` and `f`, let `S` be a type such that `decltype((s))` is `S` and let `F` be a type such that `decltype((f))` is `F`. The expression `execution::transform(s, f)` is expression-equivalent to:

- `s.transform(f)` if that expression is valid and if `s` satisfies `sender`.
- Otherwise, `transform(s, f)`, if that expression is valid and if `s` satisfies `sender`, with overload resolution performed in a context that includes the declaration `void transform();` and that does not include a declaration of `execution::transform`.
- Otherwise, constructs a receiver, `r`, and passes that receiver to `execution::connect(s, r)` to return an `operation_state` `os` such that, when some `output_receiver` has been passed to `connect` on the returned sender to return some `operation_state` `os2`:
  - When `set_value(r, ts...)` is called, calls `std::invoke(f, ts...)` and passes the result `v` to `execution::set_value(output_receiver, v)`.
  - If `f` throws, catches the exception and passes it to `execution::set_error(output_receiver, e)`.
  - When `set_error(r, e)` is called, passes `e` to `execution::set_error(output_receiver, e)`.
  - When `set_done(r)` is called, calls `execution::set_done(output_receiver)`.
  - When `start()` is called on `os2`, calls `execution::start(os)`.
- Otherwise, `execution::transform(s, f)` is ill-formed.

### `let_value`

`let_value` is a sender adapter that takes a sender and an invocable and returns a sender that keeps the completion result of the incoming sender alive for the duration of the algorithm returned by the invocable and makes that value available to the invocable.
Signature:

```cpp
template<typename F>
struct is-invocable-with {
  template<typename... Args>
  using apply = std::bool_constant<(std::invocable<F, Args...> && ...)>;
};

template<execution::sender S, movable-value F>
  requires sender_traits<S>::template value_types<
    is-invocable-with<F>::template apply>::value
see-below let_value(S s, F f);
```

where `S<T...>` and `S<T2>` are implementation-defined types that represent senders that send values of type list `T...` or `T2` respectively in their value channels. Note that in the general case there may be many type lists `T...` for a given sender, in which case the invocable may have to represent an overload set.
[ Example:

```cpp
int r = sync_wait(
  just(3) |
  let_value([](int& let_v){
    return just(4) | transform([&](int v){return let_v + v;});
  }));
// r==7
```

- end example]

The name `execution::let_value` denotes a customization point object. For some subexpressions `s` and `f`, let `S` be a type such that `decltype((s))` is `S` and let `F` be a type such that `decltype((f))` is `F`. The expression `execution::let_value(s, f)` is expression-equivalent to:

- `s.let_value(f)` if that expression is valid, if `s` satisfies `sender` and if `f` satisfies `invocable`.
- Otherwise, `let_value(s, f)`, if that expression is valid, if `s` satisfies `sender` and if `f` satisfies `invocable`, with overload resolution performed in a context that includes the declaration `void let_value();` and that does not include a declaration of `execution::let_value`.
- Otherwise, returns a sender, `s2`, such that, when `connect(s2, output_receiver)` is called on `s2` for some `output_receiver`, returning an `operation_state` `os2` which will be stored as a subobject of the parent `operation_state`, it constructs a receiver `r` and passes that receiver to `connect(s, r)`, returning an `operation_state` object `os`, and stores `os` as a subobject of `os2` such that:
  - When `set_value(r, ts...)` is called:
    - copies `ts...` into `os2` as subobjects `t2s...` and calls `std::invoke(f, t2s...)` to return some `invoke_result`;
    - calls `execution::connect(invoke_result, output_receiver)`, resulting in some `operation_state` `os3`; stores `os3` as a subobject of `os2` and calls `execution::start(os3)`;
    - the destruction of `t2s...` stored in `os2` must be sequenced after the completion of the operation represented by `invoke_result`;
    - if `f` or `connect()` throws, catches the exception and passes it to `set_error(output_receiver, e)`.
  - When `set_error(r, e)` is called, passes `e` to `set_error(output_receiver, e)`.
  - When `set_done(r)` is called, calls `set_done(output_receiver)`.
  - When `start` is called on `os2`, calls `execution::start(os)`.
- Otherwise, `execution::let_value(s, f)` is ill-formed.

### `let_error`

`let_error` is a sender adapter that takes a sender and an invocable and returns a sender that, on error propagation, keeps the error result of the incoming sender alive for the duration of the sender returned by the invocable and makes that value available to the invocable.
Signature:

```cpp
template<typename F>
struct is-invocable-with {
  template<typename... Args>
  using apply = std::bool_constant<(std::invocable<F, Args...> && ...)>;
};

template<execution::sender S, movable-value F>
  requires sender_traits<S>::template error_types<
    is-invocable-with<F>::template apply>::value
see-below let_error(S s, F f);
```

[ Example:
```cpp
float r = sync_wait(
  just(3) |
  transform([](int v){throw 2.0f;}) |
  let_error([](float e){return just(e+1);}));
// r==3.0f
```

- end example]

The name `execution::let_error` denotes a customization point object. For some subexpressions `s` and `f`, let `S` be a type such that `decltype((s))` is `S` and let `F` be a type such that `decltype((f))` is `F`. The expression `execution::let_error(s, f)` is expression-equivalent to:

- `s.let_error(f)` if that expression is valid and if `s` satisfies `sender`.
- Otherwise, `let_error(s, f)`, if that expression is valid and if `s` satisfies `sender`, with overload resolution performed in a context that includes the declaration `void let_error();` and that does not include a declaration of `execution::let_error`.
- Otherwise, returns a sender, `s2`, such that, when `connect(s2, output_receiver)` is called on `s2` for some `output_receiver`, returning an `operation_state` `os2`, it constructs a receiver `r` and passes that receiver to `connect(s, r)`, returning an `operation_state` object `os`, and stores `os` as a subobject of `os2` such that:
  - When `set_value(r, ts...)` is called, passes `ts...` to `set_value(output_receiver, ts...)`.
  - When `set_error(r, e)` is called:
    - copies `e` into `os2` as `e2` and calls `std::invoke(f, e2)` to return some `invoke_result`;
    - calls `execution::connect(invoke_result, output_receiver)`, resulting in some `operation_state` `os3`; stores `os3` as a subobject of `os2` and calls `execution::start(os3)`;
    - the destruction of `e2` stored in `os2` must be sequenced after the completion of the operation represented by `invoke_result`;
    - if `f` or `connect()` throws, catches the exception as `e3` and passes it to `set_error(output_receiver, e3)`.
  - When `set_done(r)` is called, calls `set_done(output_receiver)`.
  - When `start` is called on `os2`, calls `execution::start(os)`.
- Otherwise, the expression `execution::let_error(s, f)` is ill-formed.
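The error-recovery path described above can be sketched with a toy, non-normative receiver specialized to int values and `exception_ptr` errors. The real algorithm connects a whole replacement sender; this sketch collapses that to a replacement value for brevity, and all names (`let_error_receiver`, `int_slot`, `demo_let_error`) are invented:

```cpp
#include <exception>
#include <utility>

// On set_error, f(e) produces a replacement result that is forwarded on
// the value channel; values and done pass through unmodified.
template <class R, class F>
struct let_error_receiver {
  R out;
  F f;
  void set_value(int v) && { std::move(out).set_value(v); }
  void set_error(std::exception_ptr e) && {
    // The stored error is kept alive for the duration of the recovery.
    std::move(out).set_value(f(e));
  }
  void set_done() && noexcept { std::move(out).set_done(); }
};

struct int_slot {
  int* slot;
  void set_value(int v) && { *slot = v; }
  void set_done() && noexcept {}
};

// Demonstration: an upstream error, recovered to the value 5.
inline int demo_let_error() {
  int out = 0;
  using F = int (*)(std::exception_ptr);
  let_error_receiver<int_slot, F> r{int_slot{&out},
                                    [](std::exception_ptr) { return 5; }};
  std::move(r).set_error(std::make_exception_ptr(2));
  return out;
}
```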
### `ensure_started`

`ensure_started` is a sender adapter that takes a sender, eagerly submits it and returns a sender that propagates the value by reference and can be used as an l-value.
Signature:
[ Example:
```cpp
auto s1 = just(3) | ensure_started();
auto s2 = s1 | transform([](const int& a){return a+1;});
int r = sync_wait(
  transform(
    s2,
    [](int b){return b*2;}));
// r==8
```

- end example]

The name `execution::ensure_started` denotes a customization point object. For some subexpression `s`, let `S` be a type such that `decltype((s))` is `S`. The expression `execution::ensure_started(s)` is expression-equivalent to:

- `s.ensure_started()` if that expression is valid and if `s` satisfies `sender`.
- Otherwise, `ensure_started(s)`, if that expression is valid and if `s` satisfies `sender`, with overload resolution performed in a context that includes the declaration `void ensure_started();` and that does not include a declaration of `execution::ensure_started`.
- Otherwise, returns a sender, `s2`, that constructs a shared state `shr`, constructs a receiver, `r`, and passes that receiver to `execution::connect(s, r)`, resulting in an `operation_state` `os` that is stored as a subobject of `shr`, such that:
  - When `set_value(r, ts...)` is called, stores `ts...` as subobjects of `os`.
  - When `set_error(r, e)` is called, stores `e` as a subobject of `os`.
  - When `set_done(r)` is called, stores the done result as a subobject of `os`.
  - When some `output_receiver` has been passed to `connect` on `s2`, resulting in an `operation_state` `os2`, and one of the above has been called on `r`:
    - if `r` was satisfied with a call to `set_value`, calls `set_value(output_receiver, ts...)`;
    - if `r` was satisfied with a call to `set_error`, calls `set_error(output_receiver, e)`;
    - if `r` was satisfied with a call to `set_done`, calls `execution::set_done(output_receiver)`.
  - When `start` is called on `os2`, calls `execution::start(os)`.

If `s2` is destroyed before `start` is called on `os2`, calls `std::terminate()`.
Note: See Planned Developments.
### Customization

Each of these algorithms, apart from `just`, is customizable on one or more sender implementations. This allows full optimization. For example, consider the following simple work chain:
```cpp
auto s = just(3) |                               // s1
         on(scheduler1) |                        // s2
         transform([](int a){return a+1;}) |     // s3
         transform([](int a){return a*2;}) |     // s4
         on(scheduler2) |                        // s5
         let_error([](auto e){return just(3);}); // s6
int r = sync_wait(s);
```

The result of s1 might be a `just_sender<int>` implemented by the standard library vendor.
At s2, `on(just_sender<int>, scheduler1)` has no customization defined on the incoming sender, and this expression returns a `scheduler1_on_sender<int>` that is a custom type from the author of `scheduler1`; it will call `submit` on the result of s1.

s3 calls `transform(scheduler1_on_sender<int>, [](int a){return a+1;})`, for which the author of `scheduler1` may have written a customization. The `scheduler1_on_sender` has stashed the value somewhere and built some work queue in the background. We do not see `submit` called at this point; it uses a behind-the-scenes implementation to schedule the work on the work queue. A `scheduler1_transform_sender<int>` is returned.
s4 implements a very similar customization, and again does not call submit. There need be no synchronization in this chain.
At s5, however, the implementor of scheduler2 does not know about the implementation of scheduler1. At this point it will call submit on the incoming scheduler1_transform_sender, forcing scheduler1’s sender to implement the necessary synchronization to map back from the behind-the-scenes optimal queue to something interoperable with another vendor’s implementation.
`let_error` at s6 will be generic in terms of `submit` and not do anything special; it uses the default implementation in terms of `submit`. `sync_wait` similarly constructs a `condition_variable` and a temporary `int`, submits a receiver to `s` and waits on the `condition_variable`, blocking the calling thread.

`r` is, of course, the value 8 at this point, assuming that neither scheduler triggered an error. If there were to be a scheduling error, then that error would propagate to `let_error` and `r` would subsequently have the value 3.
## Planned Developments

Future changes and discussion points based on R3 of this paper.
Based on experience in Facebook's codebase, we believe that `when_all` and `ensure_started` should return senders that require a `scheduler_provider` and use forward progress delegation as discussed in [P1898R1].
In the case of when_all, the context the returned sender completes on will depend on which incoming sender completes last. It is thus non-deterministic across that set.
ensure_started is similarly adding non-determinism by removing laziness. If the sender returned by ensure_started is complete by the time a receiver is connected to it, the start call would complete inline with the caller.
In both cases, requiring a `scheduler_provider`, as discussed in [P1898R1], would offer determinism by guaranteeing a transition onto some downstream scheduler, with wording added to require submission onto that provided scheduler if it does not match the completing context.
We should add a `when_all` variant that returns tuples and variants in its result, or some similar mechanism, to allow parameter packs, including empty packs in the form of void-senders, and mixed success/error to propagate.
Under error circumstances, when_all should cancel the other incoming work. This will be described separately.
ensure_started similarly needs to be updated to describe how it behaves in the presence of one downstream task being cancelled, and precisely when and where the shared state is destroyed. This would be a preferable solution to termination, as described above, particularly in cases where ensure_started is used as part of a set of operations where something else might throw and cause the sender to be destroyed.
## Replacing `bulk_execute` with `indexed_for`

An earlier revision of this paper proposed replacing `bulk_execute` in P0443 with `indexed_for` as described above: `indexed_for` should replace `bulk_execute` during the merge of [P0443R11] into C++23, with the suggestion of fine-tuning this wording and forwarding it to LEWG.

The changes this leads to:

- It is difficult to reconcile `bulk_execute` and `indexed_for`, and so we suggest we not try; instead we rename it.
- Data management can be layered on separately, for example via a `device_vector`. This maintains full flexibility: we can add custom data management algorithms independently and keep `indexed_for` focused on its primary use case, the asynchronous for loop itself.
- Shared parameters do not have `reference_wrapper` semantics, and the cost of injecting `shared_ptr` would be high. If an implementation needs to copy, then that implementation should implement a wrapper that is custom for the algorithmic structure it is using. For example, a forking tree of threads may allocate once on the first thread by move and reference back to it, knowing the lifetime is safe.

Poll: We should add a sender argument and sender result to bulk execution functions (providing an opportunity to build shared state, established dependencies in/out)
SF: 17; F: 7; N: 0; A: 0; SA: 0
Consensus.
Poll: We should replace bulk_execute with indexed_for
SF: 4; F: 11; N: 3; A: 7; SA: 1
No consensus for change. Discussed in the room that `indexed_for` (and other algorithms by inference) should be built on top of `bulk_execute`.
Poll: The bulk_execute primitive should take an execution policy to constrain the invocable.
SF: 5; F: 7; N: 8; A: 3; SA: 1
R3 of this paper removes indexed_for. If bulk_execute is to remain, there is less urgent need to add indexed_for. Instead R3 focuses on the core set of algorithms. Something like indexed_for, or for_each will be in the async update of the parallel algorithms.
## References

[P0443R11] 2019. A Unified Executors Proposal for C++.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0443r11.html
[P0443R13] 2020. A Unified Executors Proposal for C++.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0443r13.html
[P1660R0] 2019. A Compromise Executor Design Sketch.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1660r0.pdf
[P1898R1] 2020. Forward progress delegation for executors.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1898r1.html
[P1993R0] 2019. Restore factories to bulk_execute.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1993r0.pdf