1. Motivation
Performance sensitive code is impacted by the cost of initializing and
manipulating strings. When writing data into a 
- 
     Pay for extra initialization — resize 
- 
     Pay for extra copies — Populate a temporary buffer, copy it to the string. 
- 
     Pay for extra "bookkeeping" — reserve 
The effects of these unhappy choices can all be measured at scale, yet C++'s hallmark is to leave no room between the language (or in this case the library) and another language.
LEWGI polled on this at the [SAN] Meeting:
We should promise more committee time to [P1072R1], knowing that our time is scarce and this will leave less time for other work?
Unanimous consent
LEWG at the [SAN] Meeting:
We want to solve this problem
SF F N A SA 17 5 0 0 0 Willing to solve this for string without solving it for vector
SF F N A SA 6 9 2 1 0 
2. Proposal
This proposal addresses the problem by adding 
template<class Operation> constexpr void resize_and_overwrite(size_type n, Operation op);
Effects: Alters the value of
as follows:* this 
- — If
, erases the lastn <= size () elements.size () - n - — If
, appendsn > size () default-initialized elements.n - size () - — Invokes
.erase ( begin () + op ( data (), n ), end ()) 
Remarks:
If
throws the behavior is undefined.op Let
before the call too = size () .resize_and_overwrite Let
.m = op ( data (), n ) 
otherwise the behavior is undefined.m <= n If
,m > o shall replace ([expr.ass]) the values stored in the character arrayop . Until replaced, the values may be indeterminate [basic.indet] [Note:[ data () + o , data () + m ) may not be* ( data () + o ) . - end note]charT () 
may write toop . Any value written will be replaced withdata () + n + 1 aftercharT () returns. [Note: This facilitiates interoperation with functions that write a trailing null. - end note ]op When
is calledop is in an unspecified state.* this shall not bind or in any way accessop .* this 
shall not allow its first argument to be accessible after it returns.op 
In order to enable 
3. Implementation
4. Examples
4.1. Stamping a Pattern
Consider writing a pattern several times into a string:
std :: string GeneratePattern ( const std :: string & pattern , size_t count ) { std :: string ret ; ret . reserve ( pattern . size () * count ); for ( size_t i = 0 ; i < count ; i ++ ) { // SUB-OPTIMAL: // * Writes 'count' nulls // * Updates size and checks for potential resize 'count' times ret . append ( pattern ); } return ret ; } 
Alternatively, we could adjust the output string’s size to its final size,
avoiding the bookkeeping in 
std :: string GeneratePattern ( const std :: string & pattern , size_t count ) { std :: string ret ; const auto step = pattern . size (); // SUB-OPTIMAL: We memset step*count bytes only to overwrite them. ret . resize ( step * count ); for ( size_t i = 0 ; i < count ; i ++ ) { // GOOD: No bookkeeping memcpy ( ret . data () + i * step , pattern . data (), step ); } return ret ; } 
With this proposal:
std :: string GeneratePattern ( const std :: string & pattern , size_t count ) { std :: string ret ; const auto step = pattern . size (); // GOOD: No initialization ret . resize_and_overwrite ( step * count , [ & ]( char * buf , size_t n ) { for ( size_t i = 0 ; i < count ; i ++ ) { // GOOD: No bookkeeping memcpy ( buf + i * step , pattern . data (), step ); } return step * count ; }); return ret ; } 
4.2. Interacting with C
Consider wrapping a C API while working in terms of C++'s 
extern "C" { int compress ( void * out , size_t * out_size , const void * in , size_t in_size ); size_t compressBound ( size_t bound ); } std :: string CompressWrapper ( std :: string_view input ) { std :: string compressed ; // Compute upper limit of compressed input. size_t size = compressBound ( input . size ()); // SUB-OPTIMAL: Extra initialization compressed . resize ( size ); int ret = compress ( &* compressed . begin (), & size , input . data (), input . size ()); if ( ret != OK ) { throw ... some error ... } // Set actual size. compressed . erase ( size ); return compressed ; } 
With this proposal:
extern "C" { int compress ( void * out , size_t * out_size , const void * in , size_t in_size ); size_t compressBound ( size_t bound ); } std :: string CompressWrapper ( std :: string_view input ) { std :: string compressed ; // Compute upper limit of compressed input. size_t bound = compressBound ( input . size ()); int ret ; // GOOD: No initialization compressed . resize_and_overwrite ( bound , [ & ]( char * buf , size_t n ) { size_t compressed_size = n ; ret = compress ( buf , & out_size , input . data (), input . size ()); return compressed_size ; }); if ( ret != OK ) { throw ... some error ... } return compressed ; } 
5. Design Considerations
5.1. Method vs. Alternatives
During the [SAN] Meeting LEWG expressed a preference for
implementing this functionality as a new method on 
Method on string vs storage_buffer type:
Strong Method Method Neutral Type Strong Type 9 2 3 2 2 
During the [KOA] Meeting, LEWG expressed a preference for not weakening 
- 
     A default-initialized buffer is only accessible to op 
- 
     Under penalty of undefined behavior, op [ data () + m , data () + n ) 
- 
     This proposal follows the "lambda" option discussed in [SAN]. 
Several other alternatives were discussed at [SAN], [KOA], and on the reflector. Please see § 6 Alternatives Considered below for more details.
5.2. Allocator Interaction
Unlocking the desired optimizations requires some change to 
This restriction should be acceptable because 
This Clause describes components for manipulating sequences of any non-array trivial standard-layout (6.7) type. Such types are called char-like types, and objects of char-like types are called char-like objects or simply characters.
Removing calls to false,
which should be the case in practice.
Along the way, this proposal clarifies ambiguous language in [string.require] and [container.requirements.general] by:
- 
     Stating explicitly that basic_string 
- 
     Introducing the term "allocator-aware" earlier in [container.requirements.general]. 
- 
     Not attempting to mention other "components" in [container.requirements.general]. The other allocator-aware containers (just basic_string regex :: match_results 
libstdc++ and msvc allow strings of non-trivial type. That might force
those libraries to continue to support 
5.3. Undefined Behavior
- 
     Uninitialized buffers are not uncommon in C++. See for example new T [ 20 ] make_unique_for_overwrite 
- 
     Dynamic analyzers like Memory Sanitizer [MSan] and Valgrind [Valgrind] can catch uninitialized reads. 
The version of this proposal seen by LEWG in [Cologne] throws 
The authors further recommend UB if 
5.4. Bikeshedding
What do we want to call this method?
- 
     resize_and_overwrite make_unique_for_overwrite make_unique op 
- 
     resize_default_init make_unique_default_init make_unique_for_overwrite 
- 
     resize_uninitialized uninitialized uninitialized_copy 
6. Alternatives Considered
6.1. Tag Type
At the [SAN] Meeting, LEWG showed support for a tag argument type:
Approval (vote for as many as you find acceptable)
13 Go back to resize_uninitialized 15 Do tag type (default_initialize) for c’tor / resize() 12 Continue with storage_buffer (as per R2 of this paper) 7 Crack open with a lambda 7 RAII separate type 
For example:
std :: string GeneratePattern ( const std :: string & pattern , size_t count ) { const auto step = pattern . size (); // GOOD: No initialization std :: string ret ( step * count , std :: string :: default_init ); for ( size_t i = 0 ; i < count ; i ++ ) { // GOOD: No bookkeeping memcpy ( ret . data () + i * step , pattern . data (), step ); } return ret ; } 
Benefits:
- 
     A constructor that takes a tag type simplifies some use cases, like the example above. 
- 
     Matches existing practice in Boost. See § 7.7 Boost. 
Drawbacks:
- 
     Feature creep / complexity — A tag type invites generalizing for all allocator-aware container types. We agree that this is desirable and even has implementation experience in Boost. However default initialization is not enough. There is also "implicit construction" (see [P1010R1], [P0593R2]) and "relocation" (see [P1144R0], [P1029R1]). Neither of these features are yet in the language. It is too early to generalize. Note that the second poll quoted in § 1 Motivation shows support for solving this problem for [ basic_ ] string vector 
- 
     In reflector discussion of make_unique_default_init copy_backward copy count_if count 
Conclusion:
LEWG should explore tags for allocator-aware containers, but that work should not block near-term enablement of efficient [std::]string builders.
6.2. Non-Lambda Approach
In [P1072R0] and [P1072R3] of this proposal, the authors considered a
method 
For illustration:
extern "C" { int compress ( void * out , size_t * out_size , const void * in , size_t in_size ); size_t compressBound ( size_t bound ); } std :: string CompressWrapper ( std :: string_view input ) { std :: string compressed ; // Compute upper limit of compressed input. size_t size = compressBound ( input . size ()); // GOOD: No initialization compressed . resize_default_init ( size ); int ret = compress ( &* compressed . begin (), & size , input . data (), input . size ()); if ( ret != OK ) { throw ... some error ... } // Suppose size is the value of size before the call to compress and size' // is the value of size after the call to compress. // // If size' < size, then: // std::cout << compressed[size' + 1] // ...triggers a read of uninitialized data. // Set actual size. compressed . erase ( size ); return compressed ; } 
6.3. Standalone Type: storage_buffer 
   In [P1072R1], we considered 
At the [SAN] Meeting, this approach received support from LEWGI in light of the [post-Rapperswil] email review indicating support for a distinct type. This approach was rejected by the larger LEWG room in San Diego Meeting Diego.
The proposed type would be move-only.
std :: string GeneratePattern ( const std :: string & pattern , size_t count ) { std :: storage_buffer < char > tmp ; const auto step = pattern . size (); // GOOD: No initialization tmp . prepare ( step * count + 1 ); for ( size_t i = 0 ; i < count ; i ++ ) { // GOOD: No bookkeeping memcpy ( tmp . data () + i * step , pattern . data (), step ); } tmp . commit ( step * count ); return std :: string ( std :: move ( tmp )); } 
For purposes of the container API, 
std :: storage_buffer < char > buf ; buf . prepare ( 100 ); * fill in data * buf . commit ( 50 ); std :: string a ( buf . begin (), buf . end ()); std :: string b ( std :: move ( buf )); assert ( a == b ); 
Benefits:
- 
     By being move-only, we would not have the risk of copying types with trap representations (thereby triggering UB). 
- 
     Uninitialized data is only accessible from the storage_buffer basic_string storage_buffer basic_string commit basic_string 
Drawbacks:
- 
     storage_buffer string vector 
- 
     basic_string storage_buffer 
6.4. Externally-Allocated Buffer Injection
In [P1072R1], we considered that 
- 
     Not critical 
- 
     Not constrained in the future 
- 
     Overly constraining to implementers. Allowing users to provide their own buffers runs into the "offset problem". Consider an implementation that stores its size capacity sizeof ( container ) == sizeof ( void * ) class container { struct Rep { size_t size ; size_t capacity ; }; Rep * rep_ ; }; If using a Rep 
- 
     Introducing new pitfalls. It would be easy to mix new [] allocator :: allocate 
7. Related Work
7.1. Google
Google has a local extension to 
- 
     [Abseil] uses this to avoid bookkeeping overheads in StrAppend StrCat 
- 
     [Snappy] - 
       In decompression, the final size of the output buffer is known before the contents are ready. 
- 
       During compression, an upperbound on the final compressed size is known, allowing data to be efficiently added to the output buffer (eliding append 
 
- 
       
- 
     [Protobuf] avoids extraneous copies or initialization when the size is known before data is available (especially during parsing or serialization). 
7.2. MongoDB
MongoDB has a string builder that could have been implemented in terms of 
E.g.: https://github.com/mongodb/mongo/blob/master/src/mongo/db/fts/unicode/string.h
/** * Strips diacritics and case-folds the utf8 input string, as needed to support * options. * * The options field specifies what operations to *skip*, so kCaseSensitive * means to skip case folding and kDiacriticSensitive means to skip diacritic * striping. If both flags are specified, the input utf8 StringData is returned * directly without any processing or copying. * * If processing is performed, the returned StringData will be placed in * buffer. buffer’s contents (if any) will be replaced. Since we may return * the input unmodified the returned StringData’s lifetime is the shorter of * the input utf8 and the next modification to buffer. The input utf8 must not * point into buffer. */ static StringData caseFoldAndStripDiacritics ( StackBufBuilder * buffer , StringData utf8 , SubstrMatchOptions options , CaseFoldMode mode ); 
(Comments re-wrapped.)
7.3. VMware
VMware has an internal string builder implementation that avoids 
7.4. Discussion on std-proposals
This topic was discussed in 2013 on std-proposals in a thread titled "Add
basic_string::resize_uninitialized (or a similar mechanism)":
  https://groups.google.com/a/isocpp.org/forum/#!topic/std-proposals/XIO4KbBTxl0
7.5. DynamicBuffer
The [N4734] (the Networking TS) has dynamic buffer types.
7.6. P1020R1
See also [P1020R1] "Smart pointer creation functions for default initialization". Adopted in San Diego.
7.7. Boost
Boost provides a related optimization for vector-like containers, introduced in [SVN r85964] by Ion Gaztañaga.
E.g.: boost/container/vector.hpp:
//! <b>Effects</b>: Constructs a vector that will use a copy of allocator a //! and inserts n default initialized values. //! //! <b>Throws</b>: If allocator_type’s allocation //! throws or T’s default initialization throws. //! //! <b>Complexity</b>: Linear to n. //! //! <b>Note</b>: Non-standard extension vector ( size_type n , default_init_t ); vector ( size_type n , default_init_t , const allocator_type & a ) ... void resize ( size_type new_size , default_init_t ); ... 
These optimizations are also supported in Boost Container’s 
7.8. Thrust
The Thrust library has "a RAII-type 
E.g. thrust/thrust/detail/temporary_array.inl:
template < typename T , typename TemporaryArray , typename Size > __host__ __device__ typename thrust :: detail :: disable_if < avoid_initialization < T >:: value >:: type construct_values ( TemporaryArray & a , Size n ) { a . default_construct_n ( a . begin (), n ); } // end construct_values() 
8. Wording
Wording is relative to [N4892].
Motivation for some of these edits can be found in § 5.2 Allocator Interaction.
8.1. [version.syn]
In [version.syn], add:
#define __cpp_lib_string_resize_and_overwrite YYYYMML // also in <string>
Adjust the placeholder value as needed so as to denote this proposal’s date of adoption.
8.2. [basic.string.general]
In [basic.string.general], in the synopsis, add 
...
    // 21.3.3.5, capacity
    constexpr size_type size() const noexcept;
    constexpr size_type length() const noexcept;
    constexpr size_type max_size() const noexcept;
    constexpr void resize(size_type n, charT c);
    constexpr void resize(size_type n);
    template<class Operation>
    constexpr void resize_and_overwrite(size_type n, Operation op);
    constexpr size_type capacity() const noexcept;
    constexpr void reserve(size_type res_arg);
    constexpr void shrink_to_fit();
    constexpr void clear() noexcept;
    [[nodiscard]] constexpr bool empty() const noexcept;
   
   8.3. [string.require]
Add a note to [string.require]:
In every specialization
, the typebasic_string < charT , traits , Allocator > shall name the same type asallocator_traits < Allocator >:: value_type . Every object of typecharT uses an object of typebasic_string < charT , traits , Allocator > to allocate and free storage for the containedAllocator objects as needed. ThecharT object used is obtained as described in 22.2.1. In every specializationAllocator , the typebasic_string < charT , traits , Allocator > shall meet the character traits requirements (21.2).traits [Note 1: Every specialization
is an allocator-aware container, but does not use the allocator’sbasic_string < charT , traits , Allocator > andconstruct member functions. ([container.requirements.general]). — end note]destroy [*Note
12 :* The program is ill-formed ifis not the same type as charT. — end note]traits :: char_type - References, pointers, and iterators referring to the elements of a
sequence may be invalidated by the following uses of thatbasic_string object:basic_string 
8.4. [string.capacity]
Note that [string.require] p4 has blanket wording for mutating operations, and [LWG2414] introduces blanket wording for reentrancy.
In [string.capacity]:
constexpr void resize(size_type n, charT c);
- Effects: Alters the value of
as follows:* this 
- — If
, erases the lastn <= size () elements.size () - n - — If
, appendsn > size () copies ofn - size () .c constexpr void resize(size_type n);
- Effects: Equivalent to
.resize ( n , charT ()) template<class Operation> constexpr void resize_and_overwrite(size_type n, Operation op);
- Let
- —
before the call too = size () .resize_and_overwrite - —
bek .min ( o , n ) - —
be ap , such that the rangecharT * is valid and[ p , p + n ] isthis -> compare ( 0 , k , p , k ) == 0 truebefore the call. The values in the rangemay be indeterminate [basic.indet].[ p + k , p + n ] - —
be the expresionOP .std :: move ( op )( p , n ) - —
=r .OP - Mandates:
has an integer-like type ([iterator.concept.winc]).OP - Preconditions:
- —
does not throw an exception or modifyOP orp .n - —
≥r .0 - —
≤r .n - — After evaluating
there are no indeterminate values in the rangeOP .[ p , p + r ) - Effects: Evaluates
, replaces the contents ofOP with* this , and invalidates all pointers and references to the range[ p , p + r ) .[ p , p + n ] - Recommended practice: Implementations should avoid unnecessary copies and allocations by, for example, making
a pointer into internal storage and by restoringp to* ( p + r ) after evaluatingcharT () .OP constexpr size_type capacity() const noexcept;...
8.5. [container.requirements.general]
In [container.requirements.general] p3 provide an explicit
exception for 
For the components affected by this subclause that declare anAllocator-aware containers (Table 76) other than, objects stored in these components shall be constructedallocator_type construct elements using the functionbasic_string andallocator_traits < allocator_type >:: rebind_traits < U >:: construct destroyeddestroy elements using the function[allocator.traits.members] (20.10.8.3), whereallocator_traits < allocator_type >:: rebind_traits < U >:: destroy is eitherU or an internal type used by the container. These functions are called only for the container’s element type, not for internal types used by the container.allocator_type :: value_type 
We provide a similar exception for 
Given an allocator type
and given a container typeA having aX identical tovalue_type and anT identical toallocator_type and given an lvalueallocator_traits < A >:: rebind_alloc < T > of typem , a pointerA of typep , an expressionT * of type (possiblyv )const , and an rvalueT of typerv , the following terms are defined. IfT is not allocator-aware or is a specialization ofX , the terms below are defined as ifbasic_string wereA — no allocator object needs to be created and user specializations ofallocator < T > are not instantiated:allocator < T > 
9. History
9.1. R9 → R10
- 
     In [string.capacity] - 
       Wrapped op std :: move OP 
- 
       Add Precondition that OP p n 
- 
       Removed requirements for allocation and capacity (added in R9). 
 
- 
       
9.2. R8 → R9
- 
     Rebased on [N4892]. 
- 
     In [string.capacity] - 
       Added a pointer to [LWG2414]. 
- 
       Reworded in terms of Let, Mandates, Preconditions, Effects, and Recommended practice. 
- 
       Added requirements for allocation and capacity. 
 
- 
       
- 
     In [container.requirements.general] - 
       Use simple (wording) exceptions for basic_string. 
- 
       Use "allocator-aware". 
- 
       Use "element" instead of "object". 
 
- 
       
- 
     Add only Note in [string.require], removing normative changes. 
- 
     Wording typos: typename -> class. 
- 
     Fix markdown glitch in R8 history. 
9.3. R7 → R8
LEWG reviewed R7 via [April2021Telecon]
POLL: Modify P1072R7 () by adding a feature test macro, and send revised paper to electronic polling to be forwarded to LWG with ship vehicle C++23, and priority B2.basic_string :: resize_and_overwrite 
SF F N A SA 9 11 1 0 0 Attendance: 31
Number of Authors: 2
Author Position: Strongly Favor
Outcome: Strong consensus in favor of sending to electronic polling.
LEWG did electronic polling via [D2384R0]
Poll 7: Modify P1072R7 () by adding a feature test macro, and then send the revised paper to Library Working Group for C++23, classified as an improvement of an existing feature ([P0592R4] bucket 2 item).basic_string :: resize_and_overwrite 
SF F N A SA 17 8 0 1 1 Outcome: Consensus in favor.
- 
     Added LEWG poll from [April2021Telecon] 
- 
     Added feature test macro. 
- 
     Fixed ambiguous indentation in string.capacity Effects. 
- 
     Further fixes for CompressWrapper example. 
- 
     Rebased on [N4885]. 
- 
     Added LEWG electronic polling results from [D2384R0]. 
9.4. R6 → R7
- 
     Rebased on [N4878]. 
- 
     Fixed CompressWrapper example. 
- 
     Allowed writing one past the end. 
- 
     Reorganized § 5.3 Undefined Behavior. 
9.5. R5 → R6
LEWG reviewed R5 via [LEWGNovTelecon].
POLL:should be able to work in constexpr context.resize_default_init No objection to unanimous consent. Attendance: 23
POLL: We prefer UB over throwing an exception if the operation reports it wrote more bytes than it has access to.No objection to unanimous consent. Attendance: 21
POLL: We want to renamefor consistency, in a mailing list review.resize_default_init 
SF F N A SA 3 12 0 1 0 
Modifications since R5:
- 
     Indicated poll outcomes for UB when op 
- 
     Renamed to resize_and_overwrite std :: make_unique_default_init std :: make_unique_for_overwrite 
- 
     Applied constexpr 
9.6. R4 → R5
A draft of this revision was presented to LEWG at the [Cologne] meeting.
Forward to LWG for C++23
SF F N A SA 10 5 1 0 0 
- 
     Rebased on [N4830]. 
- 
     Proposed UB instead of throwing std :: range_error 
9.7. R3 → R4
- 
     Applied feedback from the [KOA] meeting. 
- 
     Moved to using a callback routine, rather than leaving invariants weakened from the time resize_default_init 
9.8. R2 → R3
- 
     Applied Jonathan Wakely’s editorial comments on § 8 Wording. 
- 
     Rebased on [N4791]. 
- 
     Editorial changes to § 1 Motivation and § 6.1 Tag Type to support our design choice. 
- 
     Added the reference to [LIBC++] in § 3 Implementation. 
9.9. R1 → R2
Applied feedback from [SAN] Meeting reviews.
- 
     Reverted design to "option A" proposed in [P1072R0]. 
- 
     Switched from resize_uninitialized resize_default_init 
- 
     Added discussion of alternatives considered. 
- 
     Specified allocator interaction. 
- 
     Added wording. 
9.10. R0 → R1
Applied feedback from LEWG [post-Rapperswil] Email Review:
- 
     Shifted to a new vocabulary types: storage_buffer storage_node - 
       Added presentation of storage_buffer 
- 
       Added presentation of storage_node 
 
- 
       
- 
     Added discussion of design and implementation considerations. 
- 
     Most of [P1010R1] Was merged into this paper. 
10. Acknowledgments
Thanks go to Arthur O’Dwyer for help with wording and proof
reading, to Jonathan Wakely for hunting down the language that
makes