| Document #: | P2477R2 | 
| Date: | 2021-11-14 | 
| Project: | Programming Language C++ | 
| Audience: | Evolution Working Group, Library Evolution Working Group | 
| Reply-to: | Chuanqi Xu <chuanqi.xcq@alibaba-inc.com> | 
It is a well-known problem that a coroutine generally needs dynamic allocation to work. Although compilers implement an optimization that removes the allocation, programmers still cannot control coroutine elision in a well-defined way. Nor is there a workable method for a programmer to detect whether coroutine elision happened. This paper therefore proposes two facilities: one for controlling coroutine elision and one for detecting it. Neither of them would break any existing code.
- must_elide from boolean to a tristate enum class.
- Background and Motivation section.
- must_elide a non-constexpr function.
- should_elide to must_elide.
- coroutine_handle<>::elided() and coroutine_handle<PromiseType>::elided() should be a runtime constant instead of a compile constant.
- coroutine_handle<>::elided.
A coroutine needs space to store its information (the coroutine status) when it suspends. Generally, a coroutine uses promise_type::operator new (if it exists) or ::operator new to allocate the coroutine status. However, allocating memory dynamically is expensive. Even though we could use user-provided allocators, it is hard to implement a safe and efficient allocator. And dynamic allocation is never cheaper than static allocation after all.
To mitigate the expense, Richard Smith and Gor Nishanov designed HALO (nowadays called coroutine elision), which allocates the coroutine on the stack when the compiler finds it safe to do so. There is a corresponding optimization called CoroElide in Clang/LLVM. But the C++ specification says nothing about coroutine elision. It only says:
[dcl.fct.def.coroutine]p9
An implementation **may** need to allocate additional storage for a coroutine.
So it leaves room for the compiler to optimize, but it makes no promise to programmers. As a result, programmers have no well-defined method to control or detect coroutine elision. And coroutine elision is unlike other compiler optimizations, which are transparent to programmers: programmers care strongly about dynamic allocation. So I feel it is necessary to provide a mechanism in the specification to control and detect coroutine elision.
Besides the general need to avoid dynamic allocation for speed, I heard from Niall Douglas that embedded systems need to disable coroutine elision. In a resource-limited system, we could use promise_type::operator new to allocate space from a big statically allocated byte array. In this case, it would be annoying for the programmer to find that a coroutine got elided unexpectedly, and the programmer can do nothing to disable it.
This section explains the terminology used in this proposal to avoid misunderstanding. It does not aim to discuss formally the words used in the specification; that should be discussed in the wording stage. When a term is introduced, some background beyond the definition is provided to help readers get familiar with the context.
The two terms HALO and coroutine elision refer to the same thing in this proposal. I found that people on the language side prefer the term HALO, and people on the compiler side prefer the term coroutine elision. Personally, I like the term coroutine elision more. We could discuss the preferred name if needed.
Generally, the terms constexpr time and compile time refer to the same thing, but that is not the case for coroutines. For example, the layout of the coroutine status is built at compile time, but the programmer cannot see it. Informally, if a value is a constexpr-time constant, it can participate in template or if-constexpr evaluation; if a value is a compile-time constant, it need not be evaluated at runtime. So, simply put, a constexpr-time constant is a compile-time constant, but a compile-time constant may not be a constexpr-time constant.
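The distinction can be sketched as follows. This is only an illustration under assumptions: the names constexpr_time_value, compile_time_value and Tag are hypothetical, and whether an optimizer actually folds the second function is implementation-dependent.

```cpp
#include <cassert>

// A constexpr-time constant: usable wherever a constant expression is required.
constexpr int constexpr_time_value() { return 3 * 2; }

// Hypothetical template that needs a constant expression as its argument.
template <int N>
struct Tag {
    static constexpr int value = N;
};

// At best a compile-time constant: an optimizer may fold the body to 6,
// but the call is not a constant expression, so Tag<compile_time_value()>
// would not compile.
inline int compile_time_value() {
    int x = 3;
    return x * 2;
}

static_assert(Tag<constexpr_time_value()>::value == 6,
              "a constexpr-time constant can be a template argument");
```

The coroutine frame layout is in the second category: the compiler knows it while compiling, but the program cannot name it in a constant expression.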
Coroutine status is an abstract concept used in the language standard to refer to all the information a coroutine should keep. The coroutine frame refers to the data structure generated by the compiler to keep that information. So the coroutine status is more abstract and the coroutine frame is more concrete. We could think of the coroutine frame as a derived class of the coroutine status : ).
A coroutine is a function that can be suspended. Although we usually say coroutine function in practice, a coroutine function is the same thing as a coroutine. There is no ambiguity since there are no coroutine classes or coroutine variables. When we say a coroutine, we always mean a coroutine function. A coroutine instance is an alias for the coroutine status. The relationship is much like that between a class and a class instance. For example:
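A minimal sketch of the class and class-instance relationship; the empty body of X is an assumption made only for illustration:

```cpp
#include <cassert>
#include <type_traits>

class X {};  // X is a class (assumed empty for illustration)
X x;         // x is an instance of X
```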
Here X is a class and x is a class instance. Furthermore,
(The definition of Task is omitted here due to its length. People who are not familiar with it could visit: Task)
Here CoroutineTask() is a coroutine, and it generates a coroutine instance every time it is called. The key point is that T is not a coroutine instance! In my personal experience, people tend to treat T as the coroutine instance. That is not completely right. The instance of Task holds a pointer that points to a coroutine instance. So the lifetime of the Task instance may end before or after that of the coroutine instance. Both cases are possible.
The compiler splits a coroutine function into several parts: the ramp function, the resume function and the destroy function. Both the resume function and the destroy function contain a compiler-generated state machine that makes the coroutine work correctly. The ramp function is the initial function that initializes the coroutine frame. Note that when the corresponding promise_type::initial_suspend() always returns std::suspend_always, the ramp function is very small: it only allocates, copies the parameters to the frame and does other necessary setup. The ramp function is then very easy to inline, and inlining is the prerequisite of coroutine elision. However, it is possible that promise_type::initial_suspend() does not always return std::suspend_always. In that case, the likelihood that the ramp function gets inlined decreases significantly, and so does the likelihood that the corresponding coroutine instance gets elided.
Simply put, the storage lifetime of an elided coroutine state is just like that of a local variable at the call site. In other words, the lifetime of the elided coroutine frame ends at the end of the enclosing block. For example:
NormalTask example1(int count) {
  int sum = 0;
  for (int i = 0; i < count; ++i)
    sum += co_await an_elided_coroutine();
  co_return sum;
}
Assuming an_elided_coroutine() is elided, the code would look like:
AlwaysElideTask an_elided_coroutine() { ... }
NormalTask example1(int count) {
  int sum = 0;
  for (int i = 0; i < count; ++i) {
    coroutine_status_ty coroutine_status;  // for an_elided_coroutine
    // initialize coroutine_status...
    // any other work in an_elided_coroutine; this is possible if the
    // initial_suspend of AlwaysElideTask is not suspend_always.
    // Generally, this would pass the address of coroutine_status to Task.
    AlwaysElideTask Task = coroutine_status.promise.get_return_object();
    sum += co_await Task;
  } // So the coroutine_status is invalidated here.
  co_return sum;
}
From the example we can see that the lifetime of coroutine_status is clearly limited to the enclosing block. It is used in one iteration only, so the above transformation is valid.
The design consists of two parts: one for controlling the elision and one for detecting it.
To describe the requirement for eliding a coroutine, we need to introduce an enum class coro_elision into the standard library. The definition of coro_elision should be:
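A minimal sketch of such a definition. The enumerator names always, never and may are the ones used in the rest of this paper; their order and the helper to_string are assumptions for illustration only, and the proposal would place the enum in the standard library rather than at global scope:

```cpp
#include <cassert>
#include <cstring>

// Sketch of the proposed tristate; only the enumerator names come from the paper.
enum class coro_elision {
    may,     // leave the decision to the compiler
    always,  // every coroutine instance must be elided
    never    // no coroutine instance may be elided
};

// Hypothetical helper, not part of the proposal.
constexpr const char* to_string(coro_elision e) {
    return e == coro_elision::always ? "always"
         : e == coro_elision::never  ? "never"
                                     : "may";
}
```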
This paper proposes a new component, must_elide, in promise_type for controlling elision. When a coroutine is being compiled, the compiler tries to evaluate the result of promise_type::must_elide() (if it exists). If the result is coro_elision::always, all coroutine instances generated from the coroutine are elided. If the result is coro_elision::never, all coroutine instances generated from the coroutine are guaranteed not to be elided. If the result is coro_elision::may, or the result cannot be evaluated at constexpr time, the compiler is free to elide or not. Users should not assume or depend on any behavior in the last case.
If the compiler cannot find the name must_elide in promise_type, or the name must_elide in promise_type does not denote a static constexpr member function, nothing changes.
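The lookup rule can be modeled in ordinary C++ with an overload-resolution trick. This is only a sketch of the rule, not how a compiler would implement it: requested_elision and the four promise types are hypothetical, and coro_elision is defined locally with an assumed enumerator order.

```cpp
#include <cassert>

enum class coro_elision { may, always, never };  // assumed definition

// Chosen when P::must_elide() is a constant expression of type coro_elision.
template <class P, coro_elision V = P::must_elide()>
constexpr coro_elision requested_elision(int) { return V; }

// Fallback: must_elide is missing, non-static, or not constexpr-evaluable.
template <class P>
constexpr coro_elision requested_elision(long) { return coro_elision::may; }

struct AlwaysPromise {
    static constexpr coro_elision must_elide() { return coro_elision::always; }
};
struct NeverPromise {
    static constexpr coro_elision must_elide() { return coro_elision::never; }
};
struct PlainPromise {};  // no must_elide at all

static coro_elision runtime_choice = coro_elision::always;
struct RuntimePromise {  // static, but not constexpr
    static coro_elision must_elide() { return runtime_choice; }
};
```

In this model, the fallback result coro_elision::may stands for "nothing changes": the compiler keeps its freedom to elide or not.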
The necessity of must_elide comes from a fact we observed in practice: “There is no way that the programmer could control coroutine elision in a well-defined way.” Unlike with other compiler optimizations, programmers feel the effect of dynamic allocation strongly. So a mechanism that lets programmers control coroutine elision is missing.
There is a tip to make a coroutine easier to elide: make the coroutine type contain a coroutine_handle only. But there are some drawbacks:
- It is a hack. There is no reason to do so from the perspective of the standard, and I think it is hard for programmers other than compiler writers to understand.
- The compiler makes no promise to elide a coroutine when some observable conditions are met. It only means that we tested that a coroutine can be elided under some conditions in one version of the compiler; nobody knows whether it would be elided in other versions.
- There are coroutine types that cannot contain a coroutine_handle only, yet the programmers know that eliding them is OK.
- Even if the coroutine type contains a coroutine_handle only, its coroutine instance may still not get elided.
- There are coroutine types whose initial_suspend does not always return std::suspend_always, yet the programmers know that eliding them is OK.
- There is a need to disable coroutine elision in certain circumstances, such as the embedded systems mentioned above.
The core idea is that we need a language-level mechanism to specify explicitly that we want one coroutine to be elided and another not to be. A fact we observed is that there are many coroutine frames that the compiler does not elide although they are in fact safe to elide. I think this somewhat violates the C++ design principle of “Pay for what you use”: programmers using coroutines have to pay for what they could have avoided.
The key risk here is that it is not always safe to elide a coroutine frame.
Here is an example:
Task<int> example2(int task_count) {
  std::vector<ElidedTask<int>> tasks;
  for (int i = 0; i < task_count; i++)
    tasks.emplace_back(coro_task());
    
  // The semantics of whenAll is that it would
  // complete after all the tasks completed.
  co_return co_await whenAll(std::move(tasks));
}
If we make coro_task() always elide, the code would look like this:
Task<int> example2(int task_count) {
  std::vector<coroutine_status_ty *> tasks;
  for (int i = 0; i < task_count; i++) {
    coroutine_status_ty coroutine_status;
    tasks.emplace_back(&coroutine_status);
  }
  
  // Now all the pointers in tasks are dangling
  // pointers!
  co_return co_await whenAll(std::move(tasks));
}
It should be clear that tasks contains only dangling pointers after the for-loop. There are other examples like this.
My answer for these cases is that we should not make such tasks always elide.
The core design ideas include:
- Offer a mechanism with which the programmer can control coroutine elision, so that they can avoid the cost of dynamic allocation or an unwanted elision.
- It is not designed as a panacea. There are cases it cannot handle, and it can be a risk if the user uses it arbitrarily. But with this proposal we could solve some problems that we could not solve before.
- It is not a forced option. You can avoid using it, so it would not break any existing code. It is designed for programmers who understand it and are willing to accept the risk. If you think you do not understand it, or you do not want to take that risk, you can simply avoid using it.
- We should support forcing a coroutine to elide, forcing it not to elide, and leaving the decision to the compiler. In other words, the mechanism should return a tristate.
People may complain that it is too coarse-grained to define must_elide at the promise_type level and that it would be better to annotate the call site. I agree. But the key problem is that there is no mechanism in C++ to annotate an expression! Although the C++ standard has attributes, they apply at the statement level and declaration level only, and supporting expression attributes would be a much harder problem. So from my point of view it would take a long time to support expression attributes (or we might never get there).
And here are two reasons to define must_elide in promise_type:
- It suits the current coroutine design better. The behavior of a coroutine is already defined in promise_type and the corresponding awaiter, so it is natural to define must_elide in promise_type.
- It is possible to control elision at the call-site level in the current design.
The second point matters more. For example:
struct ShouldElideTagT;
struct NoElideTagT;
struct MayElideTagT;
template<typename T>
concept ElideTag = (std::same_as<T, ShouldElideTagT> ||
                    std::same_as<T, NoElideTagT> ||
                    std::same_as<T, MayElideTagT>);
template <ElideTag Tag>
struct TaskPromiseAlternative : public TaskPromiseBase {
    static constexpr coro_elision must_elide() {
        if constexpr (std::is_same_v<Tag, ShouldElideTagT>)
            return coro_elision::always;
        else if constexpr (std::is_same_v<Tag, NoElideTagT>)
            return coro_elision::never;
        else
            return coro_elision::may;
    }
};
template <ElideTag Tag = MayElideTagT>
using AlternativeTask = Task<TaskPromiseAlternative<Tag>>;
using MayElideTask = AlternativeTask<MayElideTagT>; // Task itself is already the class template
template <ElideTag Tag = MayElideTagT>
AlternativeTask<Tag> alternative_task () { /* ... */ } // This is a coroutine
MayElideTask NaturalTask() { /* ... */ } // This is a coroutine.
int foo() {
  auto t1 = alternative_task<ShouldElideTagT>();
  // Task::elided would call coroutine_handle::elided
  assert (t1.elided());
  auto t2 = alternative_task<NoElideTagT>();
  assert (!t2.elided());
  // The default case, which would be general for most cases
  alternative_task();
  // The author of the function doesn't want us to control its elision,
  // or the author thinks it doesn't matter whether it is elided or not.
  // Either way, the compiler makes the decision.
  NaturalTask();
}
In this way, we can control the elision at the call site! The only cost is a little more typing in the definition. Users who do not want to pay attention can use the default behavior, so it is not a big cost for users.
In the previous version of this proposal, the three states of promise_type::must_elide() were constexpr-time true, constexpr-time false and constexpr-time unknown. I felt it was natural and clean. But Gašper reminded me that it is problematic with derived classes. Gašper raised an example like:
class AlwaysElidePromiseBase {
public:
     constexpr bool must_elide() { return true; }
};
bool undetermism;
class MayElidePromiseDerived1 : public AlwaysElidePromiseBase {
public:
    constexpr bool must_elide() { return undetermism; }
};
class Task {
public:
    using promise_type = MayElidePromiseDerived1;
};
In this example, the base class is declared as always eliding, and the derived class wants to override that behavior by returning a constexpr-time unknown. However, the code above is not right: the function can never be evaluated at constexpr time, which violates the constexpr function rule [dcl.constexpr]p6. So in the original design, it is impossible for a derived promise type to declare that the coroutine is may-elide. Therefore I chose to make must_elide return an explicit tristate.
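With an explicit tristate, the derived promise type can express may-elide and still be a valid constexpr function. A minimal sketch, assuming static member functions as the proposal requires; the class names follow the example above, and the local definition of coro_elision (including its enumerator order) is an assumption:

```cpp
#include <cassert>

enum class coro_elision { may, always, never };  // assumed definition

class AlwaysElidePromiseBase {
public:
    static constexpr coro_elision must_elide() { return coro_elision::always; }
};

// The derived class hides the base declaration and asks for "may" explicitly;
// no runtime value is involved, so the function stays constexpr-evaluable.
class MayElidePromiseDerived1 : public AlwaysElidePromiseBase {
public:
    static constexpr coro_elision must_elide() { return coro_elision::may; }
};

static_assert(MayElidePromiseDerived1::must_elide() == coro_elision::may,
              "the derived promise replaces the request of its base");
```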
Add a non-static bool member function elided to std::coroutine_handle<> and std::coroutine_handle<PromiseType>.
namespace std {
  template<>
  struct coroutine_handle<void>
  { 
    // ...
    // [coroutine.handle.observers], observers
    constexpr explicit operator bool() const noexcept;
    bool done() const;
+   bool elided() const noexcept;
    // ...
  };
  template<class Promise>
  struct coroutine_handle
  {
    // ...
    // [coroutine.handle.observers], observers
    constexpr explicit operator bool() const noexcept;
    bool done() const;
+   bool elided() const noexcept;
    // ...
  };
}
The semantics should be clear: if the corresponding coroutine is elided, elided returns true; if it is not elided, elided returns false.
Here are two reasons I want to introduce elided:
elided is provided.
But introducing elided would introduce overhead, since it would require the coroutine status to store extra information.
And here are three directions:
- Support both coroutine_handle<>::elided() and coroutine_handle<PromiseType>::elided().
- Support coroutine_handle<PromiseType>::elided() only.
- Do not support elided().
If we want to support both coroutine_handle<>::elided() and coroutine_handle<PromiseType>::elided(), we must refactor the coroutine ABI into the following form:
struct CoroutineFrame {
  void (*resume_func)(CoroutineFrame*);
  void (*destroy_func)(CoroutineFrame*);
  bool Elided;
  promise_type;
  // any other information needed
};
[Note: The current coroutine ABI is:
struct CoroutineFrame {
  void (*resume_func)(CoroutineFrame*);
  void (*destroy_func)(CoroutineFrame*);
  promise_type;
  // any other information needed
};]
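Why the flag has to sit before the promise for coroutine_handle<>::elided() can be sketched with plain structs. This is a model only: FrameHeader, Frame, ErasedHandle and the two promise types are hypothetical names, and a real coroutine frame is laid out by the compiler, not written by hand.

```cpp
#include <cassert>

// Common prefix shared by every frame in the proposed layout.
struct FrameHeader {
    void (*resume_func)(FrameHeader*);
    void (*destroy_func)(FrameHeader*);
    bool elided;
};

// Model of a frame: header first, then a promise of arbitrary type.
template <class Promise>
struct Frame {
    FrameHeader header;
    Promise promise;
};

// Model of coroutine_handle<>: it knows nothing about the promise type.
struct ErasedHandle {
    void* address;
    bool elided() const {
        // Valid for any Promise because the header is the first member.
        return static_cast<const FrameHeader*>(address)->elided;
    }
};

struct SmallPromise { char c; };
struct LargePromise { char buffer[64]; };
```

Because the flag lives at a fixed offset regardless of the promise type, the type-erased handle can read it without knowing which coroutine it refers to.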
There are several drawbacks:
For the ABI problem, I am not sure if it matters so much. There are some reasons:
As for the second and third drawbacks, they are the price we must pay to enable this feature.
The implementation demo implements this strategy.
If we decide not to support coroutine_handle<>::elided(), we could move the Elided bit behind promise_type:
struct CoroutineFrame {
  void (*resume_func)(CoroutineFrame*);
  void (*destroy_func)(CoroutineFrame*);
  promise_type;
  bool Elided;
  // any other information needed
};
So we could avoid the ABI problems. But it feels odd to give up supporting coroutine_handle<>::elided, since coroutine_handle<> is a generic abstraction for coroutines and elided should be a generic feature for all coroutines.
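The consequence of this layout can be sketched with plain structs: when the flag follows the promise, its offset depends on the promise type, so only a typed handle can locate it. TypedFrame, TypedHandle and the promise types are hypothetical names used for illustration only.

```cpp
#include <cassert>
#include <cstddef>

template <class Promise>
struct TypedFrame {
    void (*resume_func)(void*);
    void (*destroy_func)(void*);
    Promise promise;
    bool elided;  // placed after the promise, as in the layout above
};

// Model of coroutine_handle<Promise>: the promise type fixes the offset.
template <class Promise>
struct TypedHandle {
    TypedFrame<Promise>* frame;
    bool elided() const { return frame->elided; }
};

struct SmallPromise { char c; };
struct LargePromise { char buffer[64]; };
```

A type-erased handle has no Promise to compute the offset from, which is why this layout cannot serve coroutine_handle<>::elided().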
Not supporting elided() at all is simple. We would not need to pay anything in implementation or language wording.
I prefer to support both and pay the price of the extra space and the extra store instruction. But I am not so sure, since the use cases for elided that I have now are in tests only. For code actually running in production, I cannot imagine use cases for elided. In other words, people who use coroutines in production would pay for something they do not use, which violates the C++ principle of “Pay for what you use”. But as I said, it is really odd that people could control coroutine elision without being able to observe it.
So I am not sure about this. Any opinion is welcome.
A set of examples can be found at Examples. Example-1~6 mimic the use of a loop, the select operator, a goto statement, a function default argument, the initializer of an if-block, and the initializer of a class data member. All of them are simple and easy to understand. This section covers only the basic usage of must_elide.
(Note: This contains the examples from R1 only. The only difference is that the return type of promise_type::must_elide is still constexpr bool here.)
We can enable or disable coroutine elision for a kind of coroutine like this:
struct TaskPromiseAlwaysElide : public TaskPromiseBase {
    static constexpr bool must_elide() {
        return true;
    }
};
struct TaskPromiseNeverElide : public TaskPromiseBase {
    static constexpr bool must_elide() {
        return false;
    }
};
Then for coroutines like:
template<class PromiseType>
class Task : public TaskBase{
public:
    using promise_type = PromiseType;
    using HandleType = std::experimental::coroutine_handle<promise_type>;
    // ...
}; // end of Task
using AlwaysElideTask = Task<TaskPromiseAlwaysElide>;
using NeverElideTask = Task<TaskPromiseNeverElide>;
AlwaysElideTask always_elide_task () {
    // ... contain coroutine keywords
}
NeverElideTask never_elide_task () {
    // ... contain coroutine keywords
}
Then every coroutine frame generated from always_elide_task() would be elided, and every coroutine frame generated from never_elide_task() would not be elided.
If the compiler cannot infer the result of promise_type::must_elide at compile time, the compiler is free to elide or not. The behavior is the same as if must_elide did not exist in promise_type.
bool undertermism;
struct TaskPromiseMeaningless : public TaskPromiseBase {
    static constexpr bool must_elide() {
        return undertermism;
    }
};
Then TaskPromiseMeaningless would behave the same as TaskPromiseBase in every case.
It is possible to control coroutine elision for a specific call. Here is my demo:
using NormalTask = Task<TaskPromiseBase>;
using AlwaysElideTask = Task<TaskPromiseAlwaysElide>;
using NeverElideTask = Task<TaskPromiseNeverElide>;
struct ShouldElideTagT;
struct NoElideTagT;
struct MayElideTagT;
template<typename T>
concept ElideTag = (std::same_as<T, ShouldElideTagT> ||
                    std::same_as<T, NoElideTagT> ||
                    std::same_as<T, MayElideTagT>);
bool undetermism;
template <ElideTag Tag>
struct TaskPromiseAlternative : public TaskPromiseBase {
    static constexpr bool must_elide() {
        if constexpr (std::is_same_v<Tag, ShouldElideTagT>)
            return true;
        else if constexpr (std::is_same_v<Tag, NoElideTagT>)
            return false;
        else
            return undetermism;
    }
};
template <ElideTag Tag = MayElideTagT>
using AlternativeTask = Task<TaskPromiseAlternative<Tag>>;
template <ElideTag Tag = MayElideTagT>
AlternativeTask<Tag> alternative_task () { /* ... */ } // This is a coroutine
int foo() {
  auto t1 = alternative_task<ShouldElideTagT>();
  // Task::elided would call coroutine_handle::elided
  assert (t1.elided());
  auto t2 = alternative_task<NoElideTagT>();
  assert (!t2.elided());
  // The default case, which would be general for most cases
  alternative_task();
}
From the example in foo, we can see that we finally get the ability to control elision at the call site! It may look tedious to add template <ElideTag Tag> along the way, but I believe there will not be many cases that we need to control. We only need to control some of the coroutines in a project.
This section describes the limitations of this proposal. It covers two situations in which we cannot elide a coroutine even if promise_type::must_elide evaluates to true at compile time, and one situation that is beyond the current elision design.
An indirect call is a call for which the compiler cannot find the corresponding callee at compile time. The most common indirect calls are calls to virtual functions. For example:
class A {
public:
    virtual CoroType CoroFunc() { /*...*/ }
};
class B : public A {
public:
    CoroType CoroFunc() override { /*...*/ }
};
CoroType foo(A* a) {
    co_await a->CoroFunc(); // The compiler cannot know which function CoroFunc refers to.
}
Another example is function pointers:
// coro_func is a global variable
std::function<Task()> coro_func;
// ... many operations to fulfill coro_func
// in another function
co_await coro_func();
Although we can see from the signature that the function coro_func refers to should always be elided, the compiler cannot actually elide it, since it usually cannot know which coroutine coro_func refers to. The compiler cannot even know whether coro_func refers to a coroutine at all. And the compiler cannot do much even if we use every possible hack.
The underlying reason is that the compiler needs to know how much space to allocate for the coroutine status if we want to elide a coroutine, and the compiler cannot infer that if it cannot see the body of the coroutine.
Another issue is that we cannot elide a coroutine into a function in another translation unit. For example:
// CoroType.h
AlwaysElideTask may_be_a_coroutine();      // We couldn't see if this is a coroutine
                                           // from the signature.
// A.cpp
AlwaysElideTask may_be_a_coroutine() {     // It is a coroutine indeed.
  co_await std::suspend_always{};
  // ... Other works needed.
}
// B.cpp
AlwaysElideTask CoroA() {
  co_return co_await may_be_a_coroutine(); // Is it a coroutine?
}
The reason the compiler cannot elide it is similar to the previous case: if the compiler cannot see the callee, it cannot perform the optimization.
From the perspective of a compiler, this issue is potentially solvable by enabling LTO (Link Time Optimization). With LTO, the compiler can see every translation unit. But since we are talking about the language standard, I do not think we could add the concept of LTO to the standard or change the compilation model.
The example above illustrates the problem:
Task<int> example2(int task_count) {
  std::vector<ElidedTask<int>> tasks;
  for (int i = 0; i < task_count; i++)
    tasks.emplace_back(coro_task());
    
  // the semantics of whenAll is that it would
  // complete after all the tasks completed.
  co_return co_await whenAll(std::move(tasks));
}
We should not make coro_task() elide, since it would be problematic when we use tasks. The reason is that the storage lifetime of an elided coroutine ends at the enclosing braces, but the coroutine may be used after its lifetime ends!
This is unsolvable in the current design, since the compiler needs to know the stack frame size of each function statically at compile time. But the compiler cannot know the value of task_count at compile time, so it cannot know the space needed for the function.
The first two limitations exist because the compiler needs to know the callee at compile time. The other limitation exists because the compiler needs to know the frame size statically. I think none of the three can be solved.
For the last limitation, I think it is OK to leave it to the future to resolve (if possible). For the other limitations, I think there are two options:
- Make the program ill-formed once the compiler finds such a situation.
- Define explicitly that must_elide does not work for virtual functions and functions in another translation unit.
I think the latter is better. Although it is constrained, the semantics are well-defined.
An implementation could be found here: CoroMustElide.
An issue in the current implementation is that it works only if optimization is turned on. If we define this in the specification, however, it should work correctly under O0.
The reason is an internal pass ordering problem in the compiler. I firmly believe it could be solved if we reach consensus on this proposal.
Mathias Stearn points out that GCC currently has no coroutine elision optimization. So it would be harder to implement this proposal in GCC, since it would require GCC to implement coroutine elision first.
This section includes the questions that have been asked and the corresponding answers and discussion.
This answer covers the case of Clang only, since the question was raised because Clang handles coroutines in two parts, the frontend and the middle end. I think this would not be a problem for GCC, since GCC handles coroutines in the frontend.
Simply put, the frontend passes specific information to the middle end after parsing the source code. So when the frontend sees that promise_type::must_elide evaluates to true or false, it passes a special marker to the middle end to indicate whether the coroutine should be elided.
Coroutine elision currently depends on inlining. So when the middle end sees an always-elide marker, it marks the ramp function as always_inline to ensure it is inlined as expected.
No, although it looks like it could be a constexpr function. Its result is not even a compile-time constant. This may seem strange: since elision is done at compile time, it would be natural for whether a coroutine is elided to be known at compile time. But the point is that we cannot associate a coroutine_handle with a coroutine at compile time. For example:
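A sketch of such a situation, using a model instead of real coroutine handles: FrameModel, HandleModel, pick and is_elided are hypothetical names, and the flag stands for whatever the compiler would store in the frame. The handle that reaches is_elided is chosen at run time, so the answer cannot be a compile-time constant.

```cpp
#include <cassert>

struct FrameModel {
    bool elided;  // stands for the information stored in the coroutine frame
};

// Model of coroutine_handle<>: only an address, no static link to a coroutine.
struct HandleModel {
    const FrameModel* frame;
    bool elided() const { return frame->elided; }
};

// Which frame the handle refers to depends on a run-time value.
inline HandleModel pick(bool first, const FrameModel& a, const FrameModel& b) {
    return HandleModel{first ? &a : &b};
}

// The callee sees only a handle, so it has to read the flag from memory.
inline bool is_elided(HandleModel h) { return h.elided(); }
```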
The compiler cannot clearly connect the handle with a coroutine, so it must read the information from memory somewhere. So the result of elided() should be a runtime constant. (The term runtime constant seems to carry little meaning in C++.)
Here is an example:
template<typename... Args>
ElidedTask<void> foo_impl(Args... args) { co_await whatever(); ... }
template<typename... Args>
ElidedTask<void> foo(std::tuple<Args...> t) {
  co_await std::apply([](Args&... args) { return foo_impl(args...); }, t);
}
Due to guaranteed copy elision, the return value of foo_impl should be constructed directly at the lambda in std::apply. Then the question is: what about the coroutine frame? Where does its lifetime end? Is the code above good? Does it violate the rules of guaranteed copy elision?
First, in the current language specification, coroutine elision is unrelated to copy elision: coroutine elision concerns whether to allocate a coroutine status, and copy elision concerns where an object is constructed.
The key point is that an object is never a coroutine status and a coroutine status is never an object. An object can hold pointers that point to the coroutine status, and the coroutine status is a special instance created by the compiler in the middle end. So copy elision does not affect coroutine elision. Now let's go back to the above example and see what happens.
When foo_impl() gets elided into the lambda in foo(), the coroutine status of foo_impl() becomes a local variable in the lambda. So when the lambda exits, the coroutine status is released. When foo() then tries to resume the coroutine returned by std::apply, it accesses the coroutine status through what is now a dangling pointer. So it would generally crash (it is undefined behavior).
People may argue that this does not match intuition. But I think the overall behavior is well-defined, except that the final access is undefined behavior. By the way, the behavior would not change if foo_impl() were inlined into the lambda.
So the above example is not good. We should not mark the return type of foo_impl() as ElidedTask<void>.
must_elide() must be a static constexpr member function?
First, must_elide() must be a static member function since it determines how the coroutine status is allocated. Note that the lifetime of the promise is as long as that of the coroutine status. So we cannot write coro_promise.must_elide(), since the promise has not been constructed yet!
Second, must_elide could in fact be a non-constexpr function. How? If must_elide were a non-constexpr function, the compiler would generate code like the following for a coroutine:
coroutine_status_ty coroutine_status;
coroutine_handle<promise_type> handle_;
if (promise_type::must_elide())
    handle_ = &coroutine_status;
else
    handle_ = promise_type::operator new(sizeof(coroutine_status_ty));
// use of handle_
Note that in this case we could pass the arguments of the coroutine to promise_type::must_elide() if we wanted.
So why didn't I propose this? Here are my reasons:
- Cost.
- Lack of imaginable use cases.
First, regarding cost: if must_elide exists and the compiler cannot evaluate it at compile time, there would be an extra stack variable and an extra conditional statement for every coroutine instance. Honestly, the cost does not seem very expensive to me, especially since the implementation of must_elide() should not be too complex, so the compiler may be able to eliminate the redundancy.
The key reason I did not propose this is that I could not find a usage that cannot be handled by the constexpr form. The non-constexpr form looks a little cooler, but I cannot figure out what users could implement in it that could not be implemented in the constexpr form. So I gave up on it.
This proposal adds two components to coroutines. One, must_elide, allows the user to control coroutine elision, which was impossible before. The other, elided, allows the user to observe coroutine elision. must_elide gives users more power but also more risk. It is optional, though: users who do not want to take the risk can avoid using it, and nothing would change for them. There are still cases (indirect calls, cross-TU calls and out-of-block use) that must_elide cannot handle; for those we can only define the conditions clearly in the standard. Even so, the situation would be much better: people could even control the elision at the call site. elided would require another bit in the coroutine frame and an additional store instruction at runtime, and the use cases we have found so far are in tests only. We need more discussion to make a decision.
Many thanks to Lewis Baker, Mathias Stearn and Gašper Ažman for reviewing this paper.