This document proposes a minimal and complete C++ API for starting, stopping and querying the status of threads, with the following synopsis:
    namespace thread {

    class resource_error: public std::exception;
    class handle; // a handle to a thread
    template<class F> handle create( F f ); // create a thread executing f()
    template<class Runnable> handle create( shared_ptr<Runnable> p ); // create a thread executing p->run()
    handle current(); // create a handle to the current thread
    
    void join( handle th ); // wait for the thread identified by th to end
    bool try_join( handle th ); // query whether the thread identified by th has ended
    bool timed_join( handle th, timespec const & abstime ); // wait with timeout
    void cancel( handle th ); // attempt to cancel the thread identified by th
    enum cancel_state
    {
        cancel_state_disabled, // = PTHREAD_CANCEL_DISABLE
        cancel_state_enabled   // = PTHREAD_CANCEL_ENABLE
    };
    cancel_state set_cancel_state( cancel_state cs ); // set the cancel state of the current thread
    void test_cancel(); // explicit cancelation point
    } // namespace thread
The central component of the proposed design is the class thread::handle. It is DefaultConstructible (with a singular value that identifies no thread), CopyConstructible, Assignable, EqualityComparable, LessThanComparable, Hashable and OutputStreamable (for diagnostic purposes). A handle uniquely identifies its thread. All copies of a given handle are equivalent, and all handles to the same thread are equivalent. In particular, the handle returned by thread::create and the handle returned by thread::current from within the newly created thread are equivalent and fully interchangeable. A handle provides basic thread safety.
A new thread is created by a call to thread::create. Its argument can be an arbitrary nullary function object f that is called in the new thread. If f() throws an exception other than the implementation-defined cancelation exception, this results in std::terminate being called; in other words, thread::create does not place a catch clause around the call. The return value of f(), if any, is ignored.
A convenience shorthand is provided in the form of an overload of thread::create that accepts a shared_ptr to an arbitrary class with a run() member function; the behavior of this overload is as if bind( &Runnable::run, p ) has been passed to the first form of thread::create. The overload is provided based on experience with boost::thread; programmers, especially those with Java or other C++ threading library background, often want to create a thread from an object, and see no apparent way to accomplish the task. The shared_ptr argument ensures that the object will be kept alive for the duration of the thread, and offers the client a way to keep another shared_ptr to the object and communicate with it, if necessary.
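As an illustration only, the forwarding described above can be sketched with standard C++11 facilities. The namespace sketch, the worker class and the run_worker function are hypothetical names introduced for this example, and std::thread stands in for the proposed copyable handle; this is not the prototype implementation.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <thread>

namespace sketch {

// Stand-in for the first form of thread::create: runs f() in a new thread.
// It returns std::thread rather than the proposed copyable handle.
template<class F> std::thread create(F f)
{
    return std::thread(f);
}

// Second form: behaves as if bind(&Runnable::run, p) had been passed to the
// first form. The bound function object stores a copy of p, so *p stays
// alive for as long as the thread needs it.
template<class Runnable> std::thread create(std::shared_ptr<Runnable> p)
{
    return create(std::bind(&Runnable::run, p));
}

} // namespace sketch

// Hypothetical Runnable used only for this illustration.
struct worker
{
    int result = 0;
    void run() { result = 42; }
};

int run_worker()
{
    std::shared_ptr<worker> p = std::make_shared<worker>();
    std::thread t = sketch::create(p); // selects the shared_ptr overload
    t.join();                          // wait for run() to finish
    return p->result;                  // the caller's copy of p sees the result
}
```

The caller keeps its own shared_ptr and reads the result through it after the thread has ended, which is the communication channel the paragraph above refers to.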
A thread can obtain a handle to itself by calling thread::current. If the thread calling thread::current has not been created by thread::create, it is implementation defined whether the function will succeed; in practice, the majority of platforms will not fail the call.
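One plausible way to support thread::current, including for threads not started by thread::create, is a per-thread slot that is filled lazily. The sketch below assumes C++11 thread_local storage and uses std::shared_ptr as a stand-in for the handle; the names sketch::state, tls_current and handles_agree are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <thread>

namespace sketch {

struct state {};                       // placeholder for the real thread state
typedef std::shared_ptr<state> handle; // copyable stand-in for thread::handle

thread_local handle tls_current;       // per-thread slot, initially empty

// Returns the handle of the calling thread. A thread that was not started
// through create has an empty slot and is "adopted" on first use.
handle current()
{
    if (!tls_current) tls_current = std::make_shared<state>();
    return tls_current;
}

bool handles_agree()
{
    handle created = std::make_shared<state>(); // what create would return
    handle seen;
    std::thread t([created, &seen] {
        tls_current = created; // what create would do before calling f()
        seen = current();      // handle as observed inside the new thread
    });
    t.join();

    handle mine = current();   // the calling thread is adopted lazily
    return seen == created     // both handles identify the new thread
        && mine != created     // a different thread has a different handle
        && mine == current();  // repeated calls agree with each other
}

} // namespace sketch
```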
It is possible to wait for a thread to end, given its handle, by calling thread::join. thread::join is idempotent and sequentially consistent ("strong thread safety"); that is, it can be called multiple times from one or more threads, even in parallel. On every occasion, the behavior of thread::join is simply to block until the thread identified by its argument has ended. If the thread has already ended, the function returns immediately. thread::join is a cancelation point.
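The join semantics described above can be obtained with a shared "ended" flag protected by a mutex and a condition variable. The following is a minimal sketch under that assumption, written with C++11 library types; sketch::state, sketch::create and join_from_two_threads are illustrative names, and the cancelation-point aspect of thread::join is not modeled.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

namespace sketch {

// Hypothetical shared thread state: a flag, a mutex and a condition variable.
struct state
{
    std::mutex mx;
    std::condition_variable cn;
    bool ended = false;
};

typedef std::shared_ptr<state> handle; // copyable stand-in for thread::handle

// Runs f() in a detached thread and records its completion in the state.
template<class F> handle create(F f)
{
    handle st = std::make_shared<state>();
    std::thread([st, f]() mutable {
        f();
        {
            std::lock_guard<std::mutex> lock(st->mx);
            st->ended = true;
        }
        st->cn.notify_all(); // wake every thread blocked in join
    }).detach();
    return st;
}

// Blocks until the thread has ended; returns at once if it already has.
// Nothing is consumed, so any number of callers may join, even concurrently.
void join(handle const & th)
{
    std::unique_lock<std::mutex> lock(th->mx);
    th->cn.wait(lock, [&] { return th->ended; });
}

} // namespace sketch

int join_from_two_threads()
{
    int value = 0;
    sketch::handle th = sketch::create([&value] { value = 7; });
    sketch::handle copy = th;                           // equivalent handle
    std::thread other([copy] { sketch::join(copy); });  // concurrent join
    sketch::join(th);
    sketch::join(th); // the thread has already ended: returns immediately
    other.join();
    return value;
}
```

Because join only observes the flag and never resets it, calling it repeatedly or from several threads is harmless, in contrast to pthread_join.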
A nonblocking variant of thread::join is provided, thread::try_join. It returns true when the thread has ended, false otherwise. It is not a cancelation point.
thread::timed_join is a variant of thread::join that accepts a timeout. It waits for a bounded amount of time for the completion of the thread identified by th. Its return value is consistent with thread::try_join. thread::timed_join is blocking and, hence, a cancelation point.
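A timed wait of this kind maps naturally onto a condition variable wait with an absolute deadline. The sketch below assumes the same flag-plus-condition-variable state as a plain join would use, and substitutes a std::chrono time point for the timespec argument of the proposed signature; sketch::state and sketch::mark_ended are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

namespace sketch {

// Hypothetical thread state, reduced to what a timed wait needs.
struct state
{
    std::mutex mx;
    std::condition_variable cn;
    bool ended = false;
};

// Called by the thread itself when its function returns.
void mark_ended(state & st)
{
    {
        std::lock_guard<std::mutex> lock(st.mx);
        st.ended = true;
    }
    st.cn.notify_all();
}

// Waits until the thread has ended or abstime is reached, whichever comes
// first. Returns true when the thread has ended and false on timeout,
// matching the return value convention of try_join.
bool timed_join(state & st, std::chrono::steady_clock::time_point abstime)
{
    std::unique_lock<std::mutex> lock(st.mx);
    return st.cn.wait_until(lock, abstime, [&] { return st.ended; });
}

} // namespace sketch
```

Passing a deadline that has already expired turns the call into a non-blocking query of the ended flag.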
thread::cancel delivers a cancelation request to the thread identified by its argument. When a thread with a cancel state set to cancel_state_enabled has a cancelation request pending and encounters a cancelation point, it throws an implementation-defined exception, called a cancelation exception. For cancelation to be useful, at minimum the blocking wait on a C++ condition variable must be a cancelation point.
On POSIX platforms, the only practical way for C++ cancelation to be implemented is to use the underlying POSIX cancelation mechanism. For this to have a chance to work, POSIX cancelation needs to be implemented as (the equivalent of) throwing a C++ exception. On platforms where this is not the case, this author's opinion is that we (the C++ committee) can effectively do nothing to fix cancelation from the C++ side, and regrettably, programmers on these platforms will not be able to take advantage of the feature. Our best bet is simply to provide a mechanism to invoke pthread_cancel and leave it at that.
On Windows, the OS provides no built-in cancelation support, so cancelation will be implemented by the C++ API. The thread layer provides a Windows event handle that can be used by cancelation points in a WaitForMultipleObjects call to watch for cancelation requests.
The thread API provides an explicit cancelation point thread::test_cancel. A thread that spins in a tight loop containing no blocking calls can periodically invoke thread::test_cancel if it wishes to handle cancelation requests.
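To make the interaction between thread::cancel and thread::test_cancel concrete, here is a minimal cooperative sketch that does not rely on any OS cancelation facility. It assumes a per-thread atomic flag and passes the state explicitly, whereas the proposed test_cancel takes no argument and refers to the current thread; sketch::state, sketch::cancelation_exception and spin_until_canceled are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

namespace sketch {

// Stand-in for the implementation-defined cancelation exception.
struct cancelation_exception {};

// Hypothetical per-thread state holding the pending-request flag.
struct state
{
    std::atomic<bool> cancel_requested;
    state(): cancel_requested(false) {}
};

// Delivers a cancelation request; the target thread is not interrupted.
void cancel(state & st)
{
    st.cancel_requested.store(true);
}

// Explicit cancelation point: throws if a request is pending.
void test_cancel(state & st)
{
    if (st.cancel_requested.load()) throw cancelation_exception();
}

} // namespace sketch

bool spin_until_canceled()
{
    sketch::state st;
    bool unwound = false;
    std::thread t([&st, &unwound] {
        try
        {
            for (;;) // tight loop with no blocking calls
            {
                sketch::test_cancel(st);
                std::this_thread::yield();
            }
        }
        catch (sketch::cancelation_exception const &)
        {
            unwound = true; // the stack was unwound by the exception
        }
    });
    sketch::cancel(st);
    t.join();
    return unwound;
}
```

The example catches the exception itself only to report the outcome; in the proposed design the cancelation exception is allowed to leave f() and end the thread.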
Finally, thread::set_cancel_state provides a mechanism for a thread to ignore cancelation requests for a period of time, in order to provide the nothrow guarantee. thread::set_cancel_state returns the old value of the cancel state so that it can be restored at the end of the nothrow region. The initial cancel state of a thread is cancel_state_enabled.
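Because thread::set_cancel_state returns the previous state, a nothrow region can be wrapped in a small scope guard. The sketch below is one possible shape for such a guard, assuming C++11 thread_local bookkeeping; disable_cancelation, cancel_pending and nothrow_region_demo are hypothetical names that are not part of the proposed API.

```cpp
#include <cassert>

namespace sketch {

enum cancel_state { cancel_state_disabled, cancel_state_enabled };

struct cancelation_exception {}; // stand-in for the real exception

// Per-thread bookkeeping; the initial state is enabled, as proposed.
thread_local cancel_state current_state = cancel_state_enabled;
thread_local bool cancel_pending = false;

cancel_state set_cancel_state(cancel_state cs)
{
    cancel_state old = current_state;
    current_state = cs;
    return old; // returned so that the caller can restore it later
}

void test_cancel()
{
    if (current_state == cancel_state_enabled && cancel_pending)
        throw cancelation_exception();
}

// Hypothetical guard: disables cancelation for the lifetime of the object
// and restores whatever state was in effect before.
class disable_cancelation
{
    cancel_state old_;
public:
    disable_cancelation(): old_(set_cancel_state(cancel_state_disabled)) {}
    ~disable_cancelation() { set_cancel_state(old_); }
    disable_cancelation(disable_cancelation const &) = delete;
    disable_cancelation & operator=(disable_cancelation const &) = delete;
};

} // namespace sketch

// Returns true when a pending request is ignored inside the guarded region
// and honored at the first cancelation point after it.
bool nothrow_region_demo()
{
    sketch::cancel_pending = true; // simulate a request from another thread
    bool thrown_inside = false;
    bool thrown_after = false;
    try
    {
        sketch::disable_cancelation guard;
        sketch::test_cancel(); // does not throw: cancelation is disabled
    }
    catch (sketch::cancelation_exception const &) { thrown_inside = true; }
    try
    {
        sketch::test_cancel(); // state restored: the request is acted upon
    }
    catch (sketch::cancelation_exception const &) { thrown_after = true; }
    sketch::cancel_pending = false;
    return !thrown_inside && thrown_after;
}
```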
The proposed design differs from the widely known boost::thread, which will be used in this section as a reference point, by hiding the thread state from the user and making it the responsibility of the implementation to manage its lifetime and to be able to produce references (handles) to it on demand.
Experience with boost::thread has shown that users who need to refer to the thread state from two different points in the code are forced to use shared_ptr<boost::thread> or boost::thread* to reimplement a handle-based layer on top of it. Unfortunately, this doesn't provide them with the convenience of being able to retrieve a reference to the thread state of the current thread without somehow receiving one as an argument. In addition, using a raw pointer is inherently prone to dangling pointer errors and memory leaks, and when the current thread is a "foreign" thread, it may have no thread state at all.
In this author's opinion, users should not be forced to reimplement a (more limited and less useful) handle-based API; we, the library implementers, need to provide one for them.
The proposed handle class has full reference semantics and is usable in standard containers out of the box. It is copyable, with all copies being equivalent, interchangeable and identifying the same thread. thread::handle does not represent a unique access point to the thread state, as a noncopyable but movable class would. Since a thread can create a handle to itself at any time by calling thread::current, it naturally follows that uniqueness cannot be guaranteed.
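The container-friendliness claimed above can be illustrated with any reference-semantic pointer type. The sketch below uses std::shared_ptr as a stand-in for the handle, since it already provides equality, ordering and hashing; sketch::state and count_distinct_threads are hypothetical names, and no real threads are involved.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <unordered_set>

namespace sketch {

struct state {};                       // placeholder for the real thread state
typedef std::shared_ptr<state> handle; // copyable stand-in for thread::handle

} // namespace sketch

std::size_t count_distinct_threads()
{
    sketch::handle a = std::make_shared<sketch::state>();
    sketch::handle b = std::make_shared<sketch::state>();
    sketch::handle a2 = a; // a copy identifies the same thread as a

    assert(a == a2); // copies compare equal
    assert(a != b);  // handles to different threads do not

    std::set<sketch::handle> ordered;          // relies on operator<
    ordered.insert(a);
    ordered.insert(b);
    ordered.insert(a2);                        // duplicate, not inserted

    std::unordered_set<sketch::handle> hashed; // relies on std::hash
    hashed.insert(a);
    hashed.insert(b);
    hashed.insert(a2);                         // duplicate, not inserted

    // Both containers hold one entry per distinct thread.
    return ordered.size() == hashed.size() ? ordered.size() : 0;
}
```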
The thread layer does not transport return values or exceptions from one thread to another. It has been shown (by this author and others) that this functionality can be implemented in a separate, general purpose component that has no dependence on the specific API used for issuing an asynchronous function call, in a thread or otherwise (one alternative is a remote procedure call; another is just doing a synchronous execution in the same thread).
A companion paper, Transporting Values and Exceptions between Threads, presents one possible design for such a component, called a future.
thread::handle itself provides basic thread safety and is not atomic; it is as thread safe as a raw pointer or a shared_ptr, the recommended default level of thread safety for all C++ components.
Operations on a thread identified by (copies of) the same handle, however, provide strong thread safety or sequential consistency. That is, multiple concurrent calls to the threading API are allowed and well defined, even when they refer to the same thread; their behavior is as if they have been issued sequentially in an unspecified order. This is the thread safety level that is intuitively expected by the majority of programmers from a threading API.
boost::thread::join has semantics similar to pthread_join, in that it is allowed to be called at most once. This has been a source of complaints from the users, who perceive it as an arbitrary and unnecessary requirement. This author agrees with the users and as a consequence, thread::join and its cousins can be called multiple times, even in parallel.
A further refinement of this proposal can include extensions to thread::create that supply attributes for the new thread, a non-portable accessor that retrieves the OS thread handle, and a mechanism to obtain a scalar and atomic thread identifier (useful for recording the current owner of a synchronization primitive, among other things). The extensions are omitted for brevity; it is more important to agree on the general direction first.
Two prototype implementations of the proposed API are available, one Windows-based:
http://www.pdimov.com/cpp/thread2_w32.cpp
and one POSIX-based:
http://www.pdimov.com/cpp/thread2_pt.cpp
The prototypes use intrusive reference counting to manage the lifetime of the thread state, represented by the thread::state class. thread::handle is defined as a typedef for boost::intrusive_ptr<thread::state>. Both prototypes are able to "adopt" a foreign thread from within thread::current and produce a fully-featured handle for it (although the Windows implementation will not be able to do so on some versions of Windows CE due to lack of DuplicateHandle).
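For readers unfamiliar with intrusive reference counting, the idea is that the count lives inside the managed object and the handle is a single pointer. The following is a minimal hand-written sketch of that technique, not the prototype code and not boost::intrusive_ptr itself; sketch::state, sketch::handle, live_states and refs_after_copy are hypothetical names, and std::atomic is assumed for the counter.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

namespace sketch {

std::atomic<int> live_states(0); // number of state objects currently alive

// The reference count is a member of the state itself ("intrusive").
struct state
{
    std::atomic<long> refs_;
    state(): refs_(0) { ++live_states; }
    ~state() { --live_states; }
};

// A handle is one pointer wide; copying it adjusts the embedded count.
class handle
{
    state * p_;
public:
    handle(): p_(nullptr) {} // singular value: identifies no thread
    explicit handle(state * p): p_(p) { if (p_) ++p_->refs_; }
    handle(handle const & r): p_(r.p_) { if (p_) ++p_->refs_; }
    handle & operator=(handle r) { std::swap(p_, r.p_); return *this; }
    ~handle() { if (p_ && --p_->refs_ == 0) delete p_; } // last one out
    long use_count() const { return p_ ? p_->refs_.load() : 0; }
    friend bool operator==(handle const & a, handle const & b)
    {
        return a.p_ == b.p_; // equivalent when they share the same state
    }
};

} // namespace sketch

long refs_after_copy()
{
    sketch::handle a(new sketch::state);
    sketch::handle b = a; // second reference to the same state
    assert(a == b);
    return a.use_count(); // both handles are still alive at this point
}
```

Since the count is stored in the state, a handle can be recreated from a raw state pointer at any time, which is what makes producing a handle on demand from thread::current practical.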
The Windows implementation provides a non-portable accessor thread::get_cancel_event that can be used from cancelation points such as condition::wait to retrieve the cancelation event handle for the current thread.
The POSIX implementation of thread::state contains the following members, given as a measure of the weight of the proposed API:
    long refs_;
    pthread_mutex_t mx_;
    pthread_cond_t cn_;
    bool ended_;
    pthread_t handle_;
On platforms where pthread_t has a singular value, the ended_ boolean flag can be eliminated.
Since a thread is a relatively heavy object - a typical default stack size for a new thread is 1 MB - this author believes that the overhead imposed by the proposed API is justified by its functionality and usability.
The Windows implementation does not need to contain a mutex and a condition variable because the semantics of thread::join can be implemented directly on top of the native API:
    long refs_;
    long cancel_state_;
    HANDLE cancel_event_;
    HANDLE handle_;
although, of course, it contains the cancelation support that is hidden behind the pthread_t in the POSIX case.
A final note on the Windows implementation: the prototype does not implement the infrastructure that is necessary for thread-specific data variables to have destructors (a functionality offered by pthread_key_create, but having no equivalent on Windows). The Boost implementation of boost::thread_specific_ptr does contain such infrastructure with two alternate implementations, one based on DLL process/thread attach/detach notifications, the other based on the Portable Executable (PE) format ability to execute a function on thread termination. The relevant files can be viewed at:
http://boost.cvs.sourceforge.net/*checkout*/boost/boost/libs/thread/src/tss_hooks.cpp
http://boost.cvs.sourceforge.net/*checkout*/boost/boost/libs/thread/src/tss_pe.cpp
http://boost.cvs.sourceforge.net/*checkout*/boost/boost/libs/thread/src/tss_dll.cpp
Such TSD destructor support has not been included in the Windows prototype in order to keep it manageable and understandable. A production-quality implementation, of course, will have to provide such support as it cannot afford to leak the thread states. The POSIX implementation takes advantage of the POSIX built-in TSD destructor support and contains no missing pieces.
It is my sincere hope that C++0x will provide a mechanism for us to just say
    __thread X s_tx;
and have the destructor of X automatically executed on thread termination (and the constructor of X executed on thread creation) so that we can dispense with the clumsy workarounds.
A future revision of this document will provide a proposed text for addition to the working paper or a technical report, if the proposal gathers sufficient interest from the working groups.
--end