"Привет, κόσμος!" 
 ― anonymous user 
1. Introduction
A new I/O-agnostic text formatting library was introduced in C++20 ([FORMAT]). This paper proposes integrating it with standard I/O facilities via a simple and intuitive API achieving the following goals:
- Usability
- Unicode support
- Good performance
- Small binary footprint
2. Revision history
Changes since R8:
- Added new SG16 poll results.
- Improved wording for [print.syn].6 (previously [print.syn].31) to remove ambiguities per SG16 feedback.
- Fixed paragraph numbering in the wording.
- Clarified the difference from Python’s print.
Changes since R7:
- Added a reference to LLVM’s raw_ostream
Changes since R6:
- Added new SG16 poll results.
- Rebased the wording onto the latest draft, most importantly adding compile-time checks introduced by [P2216].
- Added "If out vprint_unicode
- Replaced "invalid code points are substituted with U+FFFD � REPLACEMENT CHARACTER" with "implementations should substitute invalid code units with U+FFFD � REPLACEMENT CHARACTER per The Unicode® Standard Version 13.0 – Core Specification, Chapter 3.9" in § 16 Wording per SG16 feedback.
- Added "The Unicode® Standard Version 13.0 – Core Specification" to Normative references in § 16 Wording.
- Clarified the behavior when mixing encodings in § 11 Unicode.
Changes since R5:
- Added new LEWG poll results.
- Added new SG16 poll results.
- Replaced <io> with <print>
- Clarified the choice of U+FFFD � REPLACEMENT CHARACTER for transcoding errors in § 11 Unicode.
- Clarified the choice of literal encoding in § 11 Unicode.
- Clarified that ANSI escape codes for specifying coding systems are not considered a native system API that supports Unicode in § 11 Unicode.
- Added a reference to Rust’s standard output facility, which implements the same mojibake prevention mechanism, to § 15 Implementation.
Changes since R4:
- Added SG16 Unicode poll results.
- Added a list of candidate headers that formatted output functions can be added to.
- Moved the non-ostream
Changes since R3:
- Replaced _isatty(_fileno(stream)) with GetConsoleMode(_get_osfhandle(_fileno(stream)), ...)
Changes since R2:
- Added better compatibility with other formatted I/O facilities as another advantage of using stdout
- Clarified that [P1885] can be used for literal encoding detection per SG16 feedback.
- Added comparison of Unicode handling in various languages in Appendix A: Unicode tests and a summary in § 11 Unicode per SG16 request.
- Removed incorrect "exposition-only" in § 11 Unicode.
- Replaced "both source and literal encodings are UTF-8, which is enabled by the /utf-8 compiler flag" with "the literal (execution) encoding is UTF-8, which is enabled by the /execution-charset:utf-8 compiler flag" since the source encoding is irrelevant there.
- Rephrased Effects of vprint_unicode
Changes since R1:
- Added missing println overloads for FILE* and ostream&.
- Moved the print functions that take ostream& <ostream> <format> <ostream>
- Clarified why it is useful to provide vprint*
- Rebased the wording onto the latest working draft, N4861, in particular updating the Throws clauses to match existing wording.
- Replaced std::system_error with system_error
- Added paragraph numbers to the wording.
Changes since R0:
- Clarified that adding wchar_t
3. SG16 polls (R8)
Poll: Use of UTF-8 as the literal encoding is sufficient for 
| SF | F | N | A | SA | 
|---|---|---|---|---|
| 2 | 3 | 2 | 1 | 0 | 
Consensus: Consensus in favour.
Poll: Correct the P2093R8 wording for [print.syn].31 to remove ambiguities, and forward P2093 as revised to LEWG with a recommended ship vehicle of C++23.
| SF | F | N | A | SA | 
|---|---|---|---|---|
| 1 | 4 | 2 | 0 | 0 | 
Consensus: Consensus in favour.
4. SG16 polls (R6)
Poll: When 
| SF | F | N | A | SA | 
|---|---|---|---|---|
| 3 | 3 | 1 | 2 | 0 | 
Outcome: Consensus for the position
Poll: When 
| SF | F | N | A | SA | 
|---|---|---|---|---|
| 1 | 0 | 2 | 2 | 3 | 
Outcome: Consensus against the direction.
Poll: 
No objection to unanimous consent.
Poll: formatters should not be sensitive to whether they are being used with a 
No objection to unanimous consent.
Poll: Regardless of format string encoding assumptions, 
| SF | F | N | A | SA | 
|---|---|---|---|---|
| 5 | 1 | 1 | 0 | 0 | 
Consensus: Strong consensus in favor.
Poll: Regardless of format string encoding assumptions, 
| SF | F | N | A | SA | 
|---|---|---|---|---|
| 2 | 1 | 3 | 1 | 0 | 
Consensus: Weak consensus in favor.
Poll: 
| SF | F | N | A | SA | 
|---|---|---|---|---|
| 2 | 4 | 0 | 0 | 1 | 
Consensus: Strong consensus in favor.
Poll: 
| SF | F | N | A | SA | 
|---|---|---|---|---|
| 6 | 0 | 1 | 0 | 0 | 
Consensus: Stronger consensus in favor relative to previous poll.
Poll: 
| SF | F | N | A | SA | 
|---|---|---|---|---|
| 4 | 0 | 2 | 1 | 0 | 
Consensus: Consensus in favor.
Poll: 
| SF | F | N | A | SA | 
|---|---|---|---|---|
| 0 | 1 | 1 | 5 | 1 | 
Consensus: Strong consensus against.
Poll: 
| SF | F | N | A | SA | 
|---|---|---|---|---|
| 2 | 5 | 0 | 0 | 1 | 
Consensus: Consensus in favor.
Poll: 
| SF | F | N | A | SA | 
|---|---|---|---|---|
| 0 | 0 | 3 | 3 | 3 | 
Poll: Use of UTF-8 as the literal encoding is sufficient for 
| SF | F | N | A | SA | 
|---|---|---|---|---|
| 3 | 1 | 3 | 2 | 0 | 
Consensus: Very weak consensus.
5. LEWG polls (R5)
Poll: Block P2093 until we have a proposal for a lower level facility that can query tty/console and perform direct output to the console
| SF | F | N | A | SA | 
|---|---|---|---|---|
| 2 | 1 | 5 | 5 | 6 | 
Outcome: Consensus Against
Poll: We want 
| Header | Approve | Disapprove | 
|---|---|---|
|  | 3 | 7 | 
|  | 6 | 5 | 
|  | 10 | 3 | 
|  | 12 | 1 | 
|  | 1 | 13 | 
Outcome: both 
6. SG16 polls (R3)
Poll: Forward P2093R3 to LEWG.
| SF | F | N | A | SA | 
|---|---|---|---|---|
| 4 | 2 | 2 | 0 | 1 | 
Consensus? Yes
7. LEWG polls (R2)
Poll: We want P2093R2 to revert moving the 
| SF | F | N | A | SA | 
|---|---|---|---|---|
| 0 | 0 | 6 | 6 | 1 | 
Outcome: Keep the paper as is
8. LEWG polls (R1)
Poll: We prefer 
| SF | F | N | A | SA | 
|---|---|---|---|---|
| 3 | 2 | 6 | 6 | 5 | 
Poll: Add a member function on 
| SF | F | N | A | SA | 
|---|---|---|---|---|
| 0 | 2 | 5 | 17 | 3 | 
Consensus against.
Poll: Remove 
| SF | F | N | A | SA | 
|---|---|---|---|---|
| 1 | 10 | 7 | 4 | 4 | 
No consensus for change.
Poll: We are happy with the design with regards to UTF-8 output.
Unanimous consent.
Attendance: 35
9. Motivating examples
Consider a common task of printing formatted text to 
| C++20 | Proposed | 
|---|---|
|  |  | 
The proposed 
Existing alternatives in C++20:
| Code | Comments | 
|---|---|
|  | Requires even more formatted I/O function calls; message is interleaved with parameters; can result in interleaved output. | 
|  | Only works if is a null-terminated character string. | 
|  | Constructs a temporary string; requires a call to and a separate I/O function call, although potentially cheaper than. | 
Another problem is formatting of Unicode text:
    std::cout << "Привет, κόσμος!";
If the source and execution encoding is UTF-8, this will produce the expected output on most GNU/Linux and macOS systems. Unfortunately, on Windows it is almost guaranteed to produce mojibake despite the fact that the system is fully capable of printing Unicode, for example
    ╨ƒ╤Ç╨╕╨▓╨╡╤é, ╬║╧î╧â╬╝╬┐╧é!
even when compiled with /utf-8 using Visual C++ ([MSVC-UTF8]). This happens because the terminal assumes code page 437 in this case independently of the execution encoding.
With the proposed paper
    std::print("Привет, κόσμος!");
will print "Привет, κόσμος!" as expected. Python’s print gives the same result in a terminal:
    >>> print("Привет, κόσμος!")
    Привет, κόσμος!
This problem is independent of formatting 
10. API and naming
Many programming languages provide functions for printing text to standard output, often combined with formatting:
| Language | Function(s) | 
|---|---|
| C | [N2176] | 
| C#/.NET | [DOTNET-WRITE] | 
| COBOL | statement [N0147] | 
| Fortran | andstatements [N2162] | 
| Go | [GO-FMT] | 
| Java | ,,[JAVA-PRINT] | 
| JavaScript | [WHATWG-CONSOLE] | 
| Perl | [PERL-PRINTF] | 
| PHP | [PHP-PRINTF] | 
| Python | statement or function [PY-FUNC] | 
| R | [R-PRINT] | 
| Ruby | and[RUBY-PRINT] | 
| Rust | [RUST-PRINT] | 
| Swift | [SWIFT-PRINT] | 
Variations of 
We propose adding a free function called 
- stdout
- Better compatibility with other formatted I/O facilities compared to std::cout std::streambuf
- print ostream
In some languages like Python 
Since 
Another option is to make
    std::cout.print("Hello, {}!", name);
A free function can also be overloaded to take FILE* printf
There are multiple approaches to appending a trailing newline:
- Don’t append a newline automatically: printf
- Append a newline but don’t format arguments: puts fputs
- Have two formatting functions/macros, one that appends newline and another that doesn’t: print println print! println! Printf Println Write WriteLine
- Let the user choose a terminating string defaulting to "\n" print
We propose not appending a newline automatically for consistency with 
    std::print("Hello, {}!", name);   // doesn’t print a newline
    std::print("Hello, {}!\n", name); // prints a newline
Additionally we can provide a function that appends a newline:
    std::println("Hello, {}!", name); // prints a newline
Although 
Another question is which header non-
- <io>
- <print>
- <format>
- <ostream>
- <utility>
Earlier versions of the paper proposed 
    % echo '#include <ostream>' | clang++ -E -x c++ - | wc -l
    42491
It also pulls in a lot of unrelated symbols such as 
11. Unicode
We can prevent mojibake in the Unicode example by detecting if the string literal encoding is UTF-8 and dispatching to a different function that correctly handles Unicode, for example:
    constexpr bool is_utf8() {
      const unsigned char micro[] = "\u00B5";
      return sizeof(micro) == 3 && micro[0] == 0xC2 && micro[1] == 0xB5;
    }
    template <typename... Args>
    void print(string_view fmt, const Args&... args) {
      if (is_utf8())
        vprint_unicode(fmt, make_format_args(args...));
      else
        vprint_nonunicode(fmt, make_format_args(args...));
    }
where the vprint_unicode vprint_nonunicode print
    #include <stdio.h>
    int main() {
      puts("\xc3\x28"); // Invalid 2 Octet Sequence
    }
prints �( in iTerm2 and ?(
In Visual C++ is_utf8() returns true if the literal (execution) encoding is UTF-8, which is enabled by the /execution-charset:utf-8 compiler flag or other means, and false otherwise. Literal encoding detection can be implemented in a more elegant way using [P1885].
Note that ANSI escape codes for specifying coding systems ([ISO2022]) are not considered a native system API that supports Unicode for the purposes of this proposal.
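The substitution of invalid code units with U+FFFD can be illustrated with a short sketch. The function name utf8_to_utf16_lossy is hypothetical and not part of this proposal, and UTF-16 is assumed as the target only because native console APIs on Windows accept UTF-16. The sketch replaces each maximal ill-formed subsequence with a single U+FFFD, which is one reading of the practice recommended by the Unicode Standard, not necessarily what a given implementation does:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: decodes UTF-8 into UTF-16 and replaces every maximal
// ill-formed subsequence with U+FFFD REPLACEMENT CHARACTER.
std::u16string utf8_to_utf16_lossy(std::string_view in) {
  std::u16string out;
  std::size_t i = 0;
  while (i < in.size()) {
    const unsigned char b0 = static_cast<unsigned char>(in[i]);
    ++i;  // the lead byte is always consumed
    if (b0 < 0x80) {  // ASCII
      out += static_cast<char16_t>(b0);
      continue;
    }
    // Sequence length and the valid range of the second byte; the narrowed
    // ranges exclude overlong forms, surrogates and values above U+10FFFF.
    int len = 0;
    unsigned char lo = 0x80, hi = 0xBF;
    char32_t cp = 0;
    if (b0 >= 0xC2 && b0 <= 0xDF) {
      len = 2; cp = b0 & 0x1F;
    } else if (b0 >= 0xE0 && b0 <= 0xEF) {
      len = 3; cp = b0 & 0x0F;
      if (b0 == 0xE0) lo = 0xA0;
      if (b0 == 0xED) hi = 0x9F;
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {
      len = 4; cp = b0 & 0x07;
      if (b0 == 0xF0) lo = 0x90;
      if (b0 == 0xF4) hi = 0x8F;
    }
    bool ok = len != 0;  // len == 0 means b0 cannot start a sequence
    for (int k = 1; ok && k < len; ++k) {
      if (i >= in.size()) { ok = false; break; }  // truncated input
      const unsigned char b = static_cast<unsigned char>(in[i]);
      if (b < lo || b > hi) { ok = false; break; }  // b is left for the next iteration
      cp = (cp << 6) | (b & 0x3F);
      ++i;
      lo = 0x80; hi = 0xBF;  // later trailing bytes use the full range
    }
    if (!ok) {
      out += u'\uFFFD';
      continue;
    }
    assert(cp <= 0x10FFFF);
    if (cp < 0x10000) {
      out += static_cast<char16_t>(cp);
    } else {  // encode as a surrogate pair
      cp -= 0x10000;
      out += static_cast<char16_t>(0xD800 + (cp >> 10));
      out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
    }
  }
  return out;
}
```

For the input "\xc3\x28" the sketch yields U+FFFD followed by "(", because 0xC3 starts a two-byte sequence that 0x28 cannot continue.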
We propose using the literal encoding for the following reasons:
- Consistency with the design of std::format std::format
- Consistency with the encoding used for width estimation ([P1868]). The standard wording doesn’t mention the literal encoding explicitly but the fact that the format strings are either literals or other compile-time strings ([P2216]) makes it the only conformant option.
- Safety: the result of formatted_size format_to
- Implementation and usage experience.
- In the vast majority of cases format strings are literals. For example, analyzing a sample of 100 printf _
- The active code page and the terminal encoding being unrelated on popular Windows localizations such as Russian where the former is CP1251 while the latter is CP866. Instead of assuming one encoding regardless of the string origin, which would often result in mojibake, an explicit encoding indication can be done via the standard extension API, e.g. (exposition only)
    print("Привет, {}!", locale_enc(string_in_locale_encoding));
  This is already possible to implement by providing appropriate std::formatter specializations.
This approach has been implemented in the fmt library ([FMT]), successfully tested and used on a variety of platforms.
Users can sometimes restrict the set of used characters to the common subset among multiple encodings (often ASCII) in which case encoding becomes mostly irrelevant. Such "polyglot" strings are fully supported for legacy encodings and partially supported for UTF-8 by the current proposal even though mixing encodings in such a way is a clearly bad practice from a general software engineering point of view.
Here’s an example output on Windows:
At the same time interoperability with legacy code is preserved when literal
encoding is not UTF-8. In particular, in case of EBCDIC, Shift JIS or a
non-Unicode Windows code page, 
The following table summarizes the behavior of formatted output facilities in different programming languages:
| Language | Linux terminal | Linux redirect | macOS terminal | macOS redirect | Windows terminal | Windows redirect | 
|---|---|---|---|---|---|---|
| C | Correct | UTF-8 | Correct | UTF-8 | Wrong | UTF-8 | 
| Go | Correct | UTF-8 | Correct | UTF-8 | Correct | UTF-8 | 
| Java | Correct | UTF-8* | Correct | UTF-8* | Wrong | CP1251 (lossy) | 
| JavaScript | Correct | UTF-8* | Correct | UTF-8* | Correct | UTF-8* | 
| Python | Correct | UTF-8* | Correct | UTF-8* | Correct | Error | 
| Rust | Correct | UTF-8 | Correct | UTF-8 | Correct | UTF-8 | 
* - the output is transcoded from a different UTF representation.
Correct means that the test message "Привет, κόσμος!" was fully readable in the terminal output. None of the tested language facilities were able to produce readable output when piped through the standard Windows findstr command.
The current paper proposes following C, Go, JavaScript and Rust and preserving
the original encoding (modulo UTF conversion). The only difference compared to 
- There is a silent data loss for valid Unicode code points when the output is redirected to a file.
- It is more expensive because of transcoding.
- It may give an unusable result when piped through standard Windows commands like findstr
- It transcodes into legacy encodings that are rarely used in practice nowadays. For example, usage of CP1251 dropped from 4.3% to 0.8% in the last 12+ years ([ENCODING-TRENDS]), including a 0.1% drop while the current paper was in review.
The full listings of test programs are given in Appendix A: Unicode tests.
12. Performance
All the performance benefits of 
The following benchmark compares the reference implementation of 
    #include <cstdio>
    #include <iostream>
    #include <benchmark/benchmark.h>
    #include <fmt/ostream.h>
    void printf(benchmark::State& s) {
      while (s.KeepRunning())
        std::printf("The answer is %d.\n", 42);
    }
    BENCHMARK(printf);
    void ostream(benchmark::State& s) {
      std::ios::sync_with_stdio(false);
      while (s.KeepRunning())
        std::cout << "The answer is " << 42 << ".\n";
    }
    BENCHMARK(ostream);
    void print(benchmark::State& s) {
      while (s.KeepRunning())
        fmt::print("The answer is {}.\n", 42);
    }
    BENCHMARK(print);
    void print_cout(benchmark::State& s) {
      std::ios::sync_with_stdio(false);
      while (s.KeepRunning())
        fmt::print(std::cout, "The answer is {}.\n", 42);
    }
    BENCHMARK(print_cout);
    void print_cout_sync(benchmark::State& s) {
      std::ios::sync_with_stdio(true);
      while (s.KeepRunning())
        fmt::print(std::cout, "The answer is {}.\n", 42);
    }
    BENCHMARK(print_cout_sync);
    BENCHMARK_MAIN();
The benchmark was compiled with Apple clang version 11.0.0 (clang-1100.0.33.17)
with 
    Run on (8 X 2800 MHz CPU s)
    CPU Caches:
      L1 Data 32K (x4)
      L1 Instruction 32K (x4)
      L2 Unified 262K (x4)
      L3 Unified 8388K (x1)
    Load Average: 1.83, 1.88, 1.82
    ----------------------------------------------------------
    Benchmark                Time             CPU   Iterations
    ----------------------------------------------------------
    printf                87.0 ns         86.9 ns      7834009
    ostream                255 ns          255 ns      2746434
    print                 78.4 ns         78.3 ns      9095989
    print_cout            89.4 ns         89.4 ns      7702973
    print_cout_sync       91.5 ns         91.4 ns      7903889
Both 
On Windows 10 with Visual C++ 2019 the results are similar although the
difference between 
    Run on (1 X 2808 MHz CPU )
    CPU Caches:
      L1 Data 32K (x1)
      L1 Instruction 32K (x1)
      L2 Unified 262K (x1)
      L3 Unified 8388K (x1)
    ----------------------------------------------------------
    Benchmark                Time             CPU   Iterations
    ----------------------------------------------------------
    printf                 835 ns          816 ns       746667
    ostream               2410 ns         2400 ns       280000
    print                  580 ns          572 ns      1120000
    print_cout             623 ns          614 ns      1120000
    print_cout_sync        615 ns          614 ns      1120000
13. Binary code
We propose minimizing per-call binary code size by applying the type erasure
mechanism from [P0645]. In this approach all the formatting and printing logic
is implemented in a non-variadic function 
    void vprint(string_view fmt, format_args args);
    template <class... Args>
    inline void print(string_view fmt, const Args&... args) {
      return vprint(fmt, make_format_args(args...));
    }
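The effect of this split can be sketched without relying on <format>. In the following minimal sketch erased_arg, vformat_sketch and format_sketch are hypothetical stand-ins for format_args, vformat and format; only {} placeholders and two argument types are handled, which is an assumption made to keep the example short:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical type-erased argument: a tagged value covering the two
// argument types supported by this sketch.
class erased_arg {
 public:
  erased_arg(int value) : kind_(kind::integer), int_(value) {}
  erased_arg(const char* value) : kind_(kind::text), text_(value) {}

  // Appends the textual representation of the stored value to out.
  void append_to(std::string& out) const {
    if (kind_ == kind::integer)
      out += std::to_string(int_);
    else
      out.append(text_);
  }

 private:
  enum class kind { integer, text };
  kind kind_;
  int int_ = 0;
  std::string_view text_;
};

// Non-variadic function containing all of the formatting logic; it is
// compiled once no matter how many distinct call sites exist.
std::string vformat_sketch(std::string_view fmt, const erased_arg* args,
                           std::size_t count) {
  assert(args != nullptr || count == 0);
  std::string out;
  std::size_t next = 0;
  for (std::size_t i = 0; i < fmt.size(); ++i) {
    const bool placeholder =
        fmt[i] == '{' && i + 1 < fmt.size() && fmt[i + 1] == '}';
    if (placeholder && next < count) {
      args[next++].append_to(out);
      ++i;  // skip the closing brace
    } else {
      out += fmt[i];
    }
  }
  return out;
}

// Thin variadic wrapper: it only erases the argument types and forwards.
template <class... Args>
inline std::string format_sketch(std::string_view fmt, const Args&... args) {
  const std::array<erased_arg, sizeof...(Args)> erased{erased_arg(args)...};
  return vformat_sketch(fmt, erased.data(), erased.size());
}
```

Each call site of format_sketch only builds the argument array and makes a single call, so adding call sites adds little code beyond that setup.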
We provide 
    void vlog(log_level level, string_view fmt, format_args args) {
      // Print the log level and use vprint* overloads to format and print the
      // message.
    }
    template <class... Args>
    inline void log(log_level level, string_view fmt, const Args&... args) {
      return vlog(level, fmt, make_format_args(args...));
    }
Here 
Below we compare the reference implementation of -O3 -DNDEBUG -c -std=c++17  and the resulting binaries are disassembled
with objdump -S:
    void printf_test(const char* name) {
      printf("Hello, %s!", name);
    }
__Z11printf_testPKc:
       0:       55      pushq   %rbp
       1:       48 89 e5        movq    %rsp, %rbp
       4:       48 89 fe        movq    %rdi, %rsi
       7:       48 8d 3d 08 00 00 00    leaq    8(%rip), %rdi
       e:       31 c0   xorl    %eax, %eax
      10:       5d      popq    %rbp
      11:       e9 00 00 00 00  jmp     0 <__Z11printf_testPKc+0x16>
    void ostream_test(const char* name) {
      std::cout << "Hello, " << name << "!";
    }
__Z12ostream_testPKc:
       0:       55      pushq   %rbp
       1:       48 89 e5        movq    %rsp, %rbp
       4:       41 56   pushq   %r14
       6:       53      pushq   %rbx
       7:       48 89 fb        movq    %rdi, %rbx
       a:       48 8b 3d 00 00 00 00    movq    (%rip), %rdi
      11:       48 8d 35 6c 03 00 00    leaq    876(%rip), %rsi
      18:       ba 07 00 00 00  movl    $7, %edx
      1d:       e8 00 00 00 00  callq   0 <__Z12ostream_testPKc+0x22>
      22:       49 89 c6        movq    %rax, %r14
      25:       48 89 df        movq    %rbx, %rdi
      28:       e8 00 00 00 00  callq   0 <__Z12ostream_testPKc+0x2d>
      2d:       4c 89 f7        movq    %r14, %rdi
      30:       48 89 de        movq    %rbx, %rsi
      33:       48 89 c2        movq    %rax, %rdx
      36:       e8 00 00 00 00  callq   0 <__Z12ostream_testPKc+0x3b>
      3b:       48 8d 35 4a 03 00 00    leaq    842(%rip), %rsi
      42:       ba 01 00 00 00  movl    $1, %edx
      47:       48 89 c7        movq    %rax, %rdi
      4a:       5b      popq    %rbx
      4b:       41 5e   popq    %r14
      4d:       5d      popq    %rbp
      4e:       e9 00 00 00 00  jmp     0 <__Z12ostream_testPKc+0x53>
      53:       66 2e 0f 1f 84 00 00 00 00 00   nopw    %cs:(%rax,%rax)
      5d:       0f 1f 00        nopl    (%rax)
    void print_test(const char* name) {
      print("Hello, {}!", name);
    }
__Z10print_testPKc:
       0:	55 	pushq	%rbp
       1:	48 89 e5 	movq	%rsp, %rbp
       4:	48 83 ec 10 	subq	$16, %rsp
       8:	48 89 7d f0 	movq	%rdi, -16(%rbp)
       c:	48 8d 3d 19 00 00 00 	leaq	25(%rip), %rdi
      13:	48 8d 4d f0 	leaq	-16(%rbp), %rcx
      17:	be 0a 00 00 00 	movl	$10, %esi
      1c:	ba 0d 00 00 00 	movl	$13, %edx
      21:	e8 00 00 00 00 	callq	0 <__Z10print_testPKc+0x26>
      26:	48 83 c4 10 	addq	$16, %rsp
      2a:	5d 	popq	%rbp
      2b:	c3 	retq
   The code generated for the 
The following factors contribute to the difference in binary code size between 
- Passing format string as string_view instead of const char*
- Capturing and passing argument type information.
- Preparing the array of formatting arguments.
14. Impact on existing code
The current proposal adds new functions to the headers 
15. Implementation
The proposed 
Rust’s standard output facility uses essentially the same approach for preventing mojibake when printing to console on Windows ([RUST-STDIO]). The main difference is that invalid code units are reported as errors in Rust.
LLVM’s 
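The terminal check that such implementations rely on can be sketched as follows. The function name refers_to_terminal is hypothetical; the sketch assumes that a successful GetConsoleMode call identifies a console on Windows and that isatty does so on POSIX systems, and it does not attempt to determine whether the terminal can actually display Unicode:

```cpp
#include <cassert>
#include <cstdio>

#ifdef _WIN32
#  include <io.h>
#  include <windows.h>
#else
#  include <unistd.h>
#endif

// Hypothetical helper: returns true if stream is attached to a console or
// terminal rather than to a file or a pipe.
bool refers_to_terminal(std::FILE* stream) {
  assert(stream != nullptr);
#ifdef _WIN32
  // GetConsoleMode fails for handles that are not console handles.
  const auto handle =
      reinterpret_cast<HANDLE>(_get_osfhandle(_fileno(stream)));
  DWORD mode = 0;
  return GetConsoleMode(handle, &mode) != 0;
#else
  return isatty(fileno(stream)) != 0;
#endif
}
```

An implementation would presumably use such a check to choose between the native Unicode API and writing the bytes unmodified, so that redirected output keeps its original encoding.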
16. Wording
Add an entry for 
#define __cpp_lib_print 202005L **placeholder** // also in <format> 
Add the header 
<print>
    namespace std {
      template<class... Args>
        void print(format-string<Args...> fmt, const Args&... args);
      template<class... Args>
        void print(FILE* stream, format-string<Args...> fmt, const Args&... args);
      template<class... Args>
        void println(format-string<Args...> fmt, const Args&... args);
      template<class... Args>
        void println(FILE* stream, format-string<Args...> fmt, const Args&... args);
      void vprint_unicode(string_view fmt, format_args args);
      void vprint_unicode(FILE* stream, string_view fmt, format_args args);
      void vprint_nonunicode(string_view fmt, format_args args);
      void vprint_nonunicode(FILE* stream, string_view fmt, format_args args);
    }
Modify section "Header 
    ...
    template<class charT, class traits, class T>
      basic_ostream<charT, traits>& operator<<(basic_ostream<charT, traits>&& os, const T& x);
    template<class... Args>
      void print(ostream& os, format-string<Args...> fmt, const Args&... args);
    template<class... Args>
      void println(ostream& os, format-string<Args...> fmt, const Args&... args);
    void vprint_unicode(ostream& os, string_view fmt, format_args args);
    void vprint_nonunicode(ostream& os, string_view fmt, format_args args);
29.7.? Print functions [print.fun]
    template<class... Args>
      void print(format-string<Args...> fmt, const Args&... args);
1 Effects: Equivalent to:
        print(stdout, fmt, make_format_args(args...));
    template<class... Args>
      void print(FILE* stream, format-string<Args...> fmt, const Args&... args);
2 Effects: If string literal encoding is UTF-8, equivalent to:
        vprint_unicode(stream, fmt.str, make_format_args(args...));
Otherwise, equivalent to:
        vprint_nonunicode(stream, fmt.str, make_format_args(args...));
    template<class... Args>
      void println(format-string<Args...> fmt, const Args&... args);
3 Effects: Equivalent to:
        print("{}\n", format(fmt, args...));
    template<class... Args>
      void println(FILE* stream, format-string<Args...> fmt, const Args&... args);
4 Effects: Equivalent to:
        print(stream, "{}\n", format(fmt, args...));
    void vprint_unicode(string_view fmt, format_args args);
5 Effects: Equivalent to:
        vprint_unicode(stdout, fmt, args);
    void vprint_unicode(FILE* stream, string_view fmt, format_args args);
6 Effects: Let
out  =  vformat ( fmt ,  args ) stream out out out stream 
     [ Note: On POSIX and Windows, 
     [ Note: On Windows, the native Unicode API is 
     7 Throws: As specified in [format.err.report]
or 
    void vprint_nonunicode(string_view fmt, format_args args);
8 Effects: Equivalent to:
        vprint_nonunicode(stdout, fmt, args);
    void vprint_nonunicode(FILE* stream, string_view fmt, format_args args);
9 Effects: Writes the result of
vformat ( fmt ,  args ) stream 
     10 Throws: As specified in [format.err.report]
or 
Add subsection "Print [ostream.formatted.print]" to "Formatted output functions [ostream.formatted]":
    template<class... Args>
      void print(ostream& os, format-string<Args...> fmt, const Args&... args);
1 Effects: If string literal encoding is UTF-8, equivalent to:
        vprint_unicode(os, fmt, make_format_args(args...));
Otherwise, equivalent to:
        vprint_nonunicode(os, fmt, make_format_args(args...));
    void vprint_unicode(ostream& os, string_view fmt, format_args args);
2 Effects: Let
out  =  vformat ( os . getloc (),  fmt ,  args ) os basic_filebuf out out stream 
     Throws: As specified in [format.err.report]
or 
    void vprint_nonunicode(ostream& os, string_view fmt, format_args args);
3 Effects: Writes the result of
vformat ( os . getloc (),  fmt ,  args ) os 
     Throws: As specified in [format.err.report]
or 
Add to Normative references [intro.refs]:
– The Unicode® Standard Version 13.0 – Core Specification
Appendix A: Unicode tests
This appendix gives full listings of programs for testing Unicode handling in various formatting facilities as well as test commands and their output on different platforms. The code contains additional sanity checks to ensure that the strings are encoded in some form of UTF as opposed to a legacy encoding.
C (test.c):
    #include <stdio.h>
    #include <stdlib.h>
    int main() {
      const char* message = "Привет, κόσμος!\n";
      if ((unsigned char)message[0] != 0xD0 && (unsigned char)message[1] != 0x9F)
        abort();
      printf(message);
    }
Go (test.go):
    package main
    import "fmt"
    import "log"
    func main() {
      var message = "Привет, κόσμος!"
      if message[0] != 0xD0 && message[1] != 0x9F {
        log.Fatal("wrong encoding")
      }
      fmt.Println(message)
    }
Java (Test.java):
    class Test {
      public static void main(String[] args) {
        String message = "Привет, κόσμος!\n";
        if (message.charAt(0) != 0x41F)
          throw new RuntimeException();
        System.out.print(message);
      }
    }
JavaScript / Node.js (test.js):
    message = "Привет, κόσμος!";
    if (message.charCodeAt(0) != 0x41F)
      throw "wrong encoding";
    console.log(message);
Python (test.py):
    message = "Привет, κόσμος!"
    if ord(message[0]) != 0x41F:
        raise Exception()
    print(message)
Rust (test.rs):
    fn main() {
        if "Привет, κόσμος!".chars().nth(0).unwrap() as u32 != 0x41F {
            panic!();
        }
        println!("Привет, κόσμος!");
    }
Linux:
    $ cc test.c -o c-test
    $ ./c-test
    Привет, κόσμος!
    $ ./c-test > out-c-linux.txt
    $ go build -o go-test test.go
    $ ./go-test
    Привет, κόσμος!
    $ ./go-test > out-go-linux.txt
    $ java Test
    Привет, κόσμος!
    $ java Test > out-java-linux.txt
    $ node test.js
    Привет, κόσμος!
    $ node test.js > out-js-linux.txt
    $ python3 test.py
    Привет, κόσμος!
    $ python3 test.py > out-py-linux.txt
    $ rustc test.rs -o rust-test
    $ ./rust-test
    Привет, κόσμος!
    $ ./rust-test > out-rust-linux.txt
All output files are in UTF-8:
Linux configuration:
- Ubuntu Focal 20.04 with the ru_RU.UTF-8 locale
- cc: gcc 9.3.0
- go: go1.13.8
- java: openjdk 11.0.9.1
- node: v14.5.0
- python3: 3.7.5
- rustc: 1.47.0
macOS:
    % cc test.c -o c-test
    % ./c-test
    Привет, κόσμος!
    % ./c-test > out-c-macos.txt
    % go build -o test-go test.go
    % ./test-go
    Привет, κόσμος!
    % ./test-go > out-go-macos.txt
    % java Test
    Привет, κόσμος!
    % java Test > out-java-macos.txt
    % node test.js
    Привет, κόσμος!
    % node test.js > out-js-macos.txt
    % python3 test.py
    Привет, κόσμος!
    % python3 test.py > out-py-macos.txt
    % rustc test.rs -o rust-test
    % ./rust-test
    Привет, κόσμος!
    % ./rust-test > out-rust-macos.txt
All output files are in UTF-8:
macOS configuration:
- macOS Catalina 10.15.7 with the ru_RU.UTF-8 locale, which is the default for Russian
- cc: Apple clang version 12.0.0 (clang-1200.0.32.27)
- go: go1.15.5
- java: openjdk 14.0.1
- node: v14.5.0
- python3: 3.7.5
- rustc: 1.47.0
Windows:
    > cl /Fe:c-test.exe test.c
    ...
    > c-test
    ╨Я╤А╨╕╨▓╨╡╤В, ╬║╧М╧Г╬╝╬┐╧В!
    > c-test > out-c-windows.txt
    > c-test | findstr ,
    ╨Я╤А╨╕╨▓╨╡╤В, ╬║╧М╧Г╬╝╬┐╧В!
    > go build -o go-test.exe test.go
    > go-test
    Привет, κόσμος!
    > go-test > out-go-windows.txt
    > go-test | findstr ,
    ╨Я╤А╨╕╨▓╨╡╤В, ╬║╧М╧Г╬╝╬┐╧В!
    > java Test
    Привет, ??????!
    > java Test > out-java-windows.txt
    > java Test | findstr ,
    ╧ЁштхЄ, ??????!
    > node test.js
    Привет, κόσμος!
    > node test.js > out-js-windows.txt
    > node test.js | findstr ,
    ╨Я╤А╨╕╨▓╨╡╤В, ╬║╧М╧Г╬╝╬┐╧В!
    > python test.py
    Привет, κόσμος!
    > python test.py > out-py-windows.txt
    Traceback (most recent call last):
      File "...\test.py", line 4, in <module>
        print(message)
      File "...\Python39\lib\encodings\cp1251.py", line 19, in encode
        return codecs.charmap_encode(input, self.errors, encoding_table)[0]
    UnicodeEncodeError: 'charmap' codec can't encode characters in position 8-13: character maps to <undefined>
    > python test.py | findstr ,
    Traceback (most recent call last):
      File "...\test.py", line 4, in <module>
        print(message)
      File "...\Python39\lib\encodings\cp1251.py", line 19, in encode
        return codecs.charmap_encode(input, self.errors, encoding_table)[0]
    UnicodeEncodeError: 'charmap' codec can't encode characters in position 8-13: character maps to <undefined>
    > rustc test.rs -o rust-test.exe
    > rust-test
    Привет, κόσμος!
    > rust-test > out-rust-windows.txt
    > rust-test | findstr ,
    ╨Я╤А╨╕╨▓╨╡╤В, ╬║╧М╧Г╬╝╬┐╧В!
C, JavaScript (node.js), Rust and Go produced valid UTF-8 when the output was
redirected to files. Java produced a file in the legacy CP1251 encoding with 
Windows configuration:
- Windows 10 10.0.19041 with Russian Region and Language settings
- cl: Microsoft (R) C/C++ Optimizing Compiler Version 19.28.29335
- go: go1.15.6
- java: Java HotSpot(TM) 64-Bit Server VM (build 15.0.1+9-18)
- node: v14.15.3
- python: 3.9.1
- rustc: v14.15.3
17. Acknowledgements
Thanks to Corentin Jabot for his work on text encodings in C++ and in particular [P1885] that will simplify implementation of the current proposal.
Thanks to Roger Orr, Peter Brett, Hubert Tong, the BSI C++ panel and Tom Honermann for their feedback, support, constructive criticism and contributions to the proposal.