SG16: Unicode meeting summaries 2021-06-09 through 2021-12-15
Summaries of SG16 meetings are maintained at
https://github.com/sg16-unicode/sg16-meetings.  This paper contains a
snapshot of select meeting summaries from that repository.
  - June 9th, 2021
  - June 23rd, 2021
  - July 14th, 2021
  - July 28th, 2021
  - August 25th, 2021
  - September 8th, 2021
  - September 22nd, 2021
  - October 6th, 2021
  - October 20th, 2021
  - November 3rd, 2021
  - November 17th, 2021
  - December 1st, 2021
  - December 15th, 2021
Previously published SG16 meeting summary papers:
June 9th, 2021
Draft agenda:
  - P2093R6: Formatted output
    
      - Continue discussion and poll for consensus on answers to the
          following questions:
        
          - 1) How should invalidly encoded text be handled when transcoding
                 for the purpose of writing directly to a device interface?
- 2) Is use of UTF-8 as the literal encoding a sufficient indicator
                 that all input fed to std::format() and
                 std::print() (including the format string, programmer
                 supplied field arguments, and locale provided text) will be
                 UTF-8 encoded?
- 3) Is the literal encoding a sufficient indicator in general that
                 all input fed to std::format() and
                 std::print() (including the format string, programmer
                 supplied field arguments, and locale provided text) will be
                 provided in an encoding compatible with the literal
                 encoding?
- 4) What are the implications for future support of
                 std::print("{} {} {} {}", L"Wide text", u8"UTF-8 text", u"UTF-16 text", U"UTF-32 text")?
 
 
- LWG 3565: Handling of encodings in localized formatting of chrono types is underspecified
Attendees:
  - Charlie Barto
- Corentin Jabot
- Hubert Tong
- Jens Maurer
- Steve Downey
- Tom Honermann
- Victor Zverovich
- Zach Laine
Meeting summary:
  - P2093R6: Formatted output:
    
      - No initial discussion was held; the meeting proceeded directly to
          candidate polls previously
          communicated to the mailing list.
- Poll 1 discussion:
        
          - Zach stated that programmers will expect std::format()
              and std::print() to behave the same way.
- Victor stated that std::print() can be implemented using
              std::format(); std::print() is intended to be
              just std::format() with additional device dependent
              transcoding.
 
- Poll 1: P2093R6: <format> and <print>
          facilities should have consistent behavior with respect to encoding
          expectations for the format string.
        
          - Attendance: 8
- No objection to unanimous consent.
 
- Poll 2 discussion:
        
          - [ Editor's note: the original poll was "P2093R6:
              <format> and <print> facilities
              should have consistent behavior with respect to encoding
              expectations for the output of formatters." ]
- Victor asked for confirmation that the "formatters" term in the
              poll refers to formatter specializations.
- Tom confirmed that it does.
- Zach asked for confirmation that formatters can be user
              provided.
- Victor confirmed that they can be.
- Hubert stated that a desire to bypass encoding constraints will
              require a concept for binary formatters and a corresponding
              proposal.
- Jens expressed a belief that formatters are allowed to be
              agnostic with respect to use with std::format() vs
              std::print().
- [ Editor's note: Jens's observation prompted the addition of
              poll 2.2 to confirm matching design intent. ]
- Victor stated that there is currently no mechanism proposed for a
              formatter to be informed as to whether it is being used with
              std::format() or std::print().
- Zach expressed confusion about the poll.
- Hubert suggested this poll be deferred until after later polls
              concerned with the consequences of violating encoding
              expectations.
 
- Poll 2.1: P2093R6: <format> and
          <print> facilities should have consistent behavior
          with respect to encoding expectations for the output of
          formatters.
        
          - Per discussion; poll deferred until after later polls.
 
- Poll 2.2: P2093R6: formatters should not be sensitive to whether
          they are being used with a <format> or
          <print> facility.
        
          - Attendance: 8
- No objection to unanimous consent.
 
- Poll 3 discussion:
        
          - [ Editor's note: the original poll was "P2093R6: Regardless
              of format string encoding assumptions, <format>
              facilities (but not <print> facilities) may be
              used to format binary data." ]
- Victor stated that support for binary data is a nice capability
              to have and is needed to match existing uses of
              printf().
- Steve noted that this poll is relevant for cases where
              transcoding is required.
- Tom agreed and noted that the code author may not be aware of
              implementation performed transcoding.
- Jens asked for reasons that a text facility would be used for
              binary data.
- Victor responded that printf() is often used with
              binary data and noted that the format string does not
              necessarily contain text; it might solely contain field
              specifiers.
- Tom noted that filenames may be formatted, but might not conform
              to encoding expectations.
- Steve mentioned having also seen ostreams used with binary
              data.
- Hubert noted again that additional design work would be needed
              for binary data to be transported through any implicit
              transcoding performed by std::print().
- Hubert added that control characters can be another source of
              binary data.
- Zach suggested splitting the poll to address
              <format> and <print> separately so
              as to remove the parenthetical text.
- Zach suggested that there may be a use case for standard
              formatters for binary data or for a "raw" print interface.
- Victor suggested there may be some misunderstanding; that
              std::print() may be used with binary data with the
              result that garbage is displayed on the console.
- Hubert politely disagreed due to the lack of an escape mechanism
              for binary data.
- Jens agreed that some form of a non-text in-band signalling
              mechanism would be needed.
- Victor clarified that his argument for preserving binary data is
              for the case where output is directed to a file.
- Hubert noted that poll 3 and poll 10 are related and that
              consensus for poll 10 will require facilities related to poll
              3.
 
- Poll 3.1: P2093R6: Regardless of format string encoding
          assumptions, <format> facilities may be used to format
          binary data.
        
          - Attendance: 8 (1 abstention)
- Consensus: Strong consensus in favor.
 
- Poll 3.2: P2093R6: Regardless of format string encoding
          assumptions, <print> facilities may be used to
          format binary data.
        
          - Attendance: 8 (1 abstention)
- Consensus: Weak consensus in favor.
- A: No comment
 
- Poll 4 discussion:
        
          - [ Editor's note: the original poll was "P2093R6:
              <print> facilities exhibit undefined behavior
              when a format string or formatter output does not match encoding
              expectations." ]
- Steve expressed a desire for behavior less severe than undefined
              behavior.
- Victor expressed discomfort with undefined behavior as well,
              particularly that the poll applies to all std::print()
              invocations regardless of where the output is directed.
- Hubert spoke in favor of the poll and noted that this establishes
              that an implementor or code reviewer can diagnose these cases;
              that can't happen if behavior is defined.
- Jens agreed with Hubert, noted the existence of the precondition,
              and that a violation is "library UB" and therefore less
              consequential than core language UB.
- Steve stated in chat: "OK, based on Hubert and Jens's comments,
              I'll withdraw my objections about UB.  I'd like better
              terminology but this isn't the forum."
- Jens stated that the paper would benefit from some prose that
              explains the intended model and that inconsistently encoded data
              can be stitched together.
- Jens expressed distaste for preconditions being so specific to a
              corner case and professed desire for a good programming
              model.
- Zach noted similarities with
              P1868;
              the worst case outcome is mojibake displayed on the terminal;
              the damage is limited.
- Zach stated that either UB or implementation-defined behavior
              would be fine for now, but that we may desire another failure
              mode where the behavior is more contained in the future; a
              behavior mode that reflects that something went wrong, but where
              the damage is localized.
- Victor stated that he feels this poll overreaches; that the only
              concern is with regard to writing to a file vs a terminal and
              that, in practice, all that should happen is that the data is
              passed through or that replacement characters are
              substituted.
- Hubert noted that files may correspond to special devices;
              e.g., /dev/tty.
- Hubert stated that UB is a specification tool and noted that
              implementors are in a position to distinguish between polls 4
              and 5, but that a code reviewer generally cannot.
 
- Poll 4: P2093R6: <print> facilities exhibit
          undefined behavior when an encoding expectation is present and a
          format string or formatter output does not match those
          expectations.
        
          - Attendance: 8 (1 abstention)
- Consensus: Strong consensus in favor.
- SA: I think this is too broad and the impact is larger than
              necessary.
 
- Poll 5: P2093R6: <print> facilities exhibit
          undefined behavior when an encoding expectation is present and a
          format string or formatter output does not match those expectations
          and output is directed to a device that has encoding
          expectations.
        
          - Attendance: 8 (1 abstention)
- Consensus: Stronger consensus in favor relative to poll 4.
 
- Poll 6 discussion:
        
          - [ Editor's note: the original poll was "P2093R6:
              <print> facility implementors are encouraged to
              provide a run-time means for diagnosing format strings and
              formatter output that does not match encoding expectations."
              ]
- Tom noted that this is not dependent on UB.
- Hubert agreed.
- Corentin expressed skepticism that this is implementable.
- Hubert responded that the binary case is not well supported, but
              can be done and probably with a reasonable result.
- Hubert noted that it may be difficult for an implementation of
              this extension to distinguish the escaped binary data case.
- Charlie noted that invalidly encoded data can be detected,
              but that mojibake cannot be.
- Steve expressed desire for diagnostics for when the data doesn't
              match the encoding, but not for attempts to match mixed
              encodings.
- Zach noted that heuristic warnings can result in false positives
              and false negatives.
- Hubert observed that qualitative determination of good vs bad
              output may require a human.
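
Charlie's distinction between ill-formed data and mojibake can be illustrated with a small sketch. The function name `is_well_formed_utf8` is hypothetical and is not part of P2093R6 or any implementation discussed; it checks only well-formedness according to the Unicode definition of UTF-8 byte sequences. The byte pair 0xC3 0xA9 passes whether the author meant "é" in UTF-8 or two Latin-1 characters, which is why a check of this kind cannot detect mojibake.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns true if `s` is a well-formed UTF-8 byte
// sequence (no overlong forms, no surrogates, nothing above U+10FFFF).
bool is_well_formed_utf8(std::string_view s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        if (b0 <= 0x7F) {  // single-byte (ASCII) sequence
            ++i;
            continue;
        }
        std::size_t len = 0;
        unsigned char lo = 0x80;  // allowed range for the second byte
        unsigned char hi = 0xBF;
        if (b0 >= 0xC2 && b0 <= 0xDF) {
            len = 2;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            len = 3;
            if (b0 == 0xE0) lo = 0xA0;  // rejects overlong encodings
            if (b0 == 0xED) hi = 0x9F;  // rejects surrogate code points
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            len = 4;
            if (b0 == 0xF0) lo = 0x90;  // rejects overlong encodings
            if (b0 == 0xF4) hi = 0x8F;  // rejects values above U+10FFFF
        } else {
            return false;  // stray continuation byte or invalid lead byte
        }
        if (s.size() - i < len) return false;  // truncated sequence
        const unsigned char b1 = static_cast<unsigned char>(s[i + 1]);
        if (b1 < lo || b1 > hi) return false;
        for (std::size_t k = 2; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if (b < 0x80 || b > 0xBF) return false;
        }
        i += len;
    }
    return true;
}
```

Under these assumptions, Latin-1 encoded "café" (ending in the lone byte 0xE9) is rejected, while any text whose bytes happen to form valid UTF-8 is accepted regardless of the encoding its author intended.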
 
- Poll 6: P2093R6: <print> facility implementors are
          encouraged to provide a run-time means for diagnosing format strings
          and formatter output that is not well-formed according to the
          expected encoding.
        
          - Attendance: 8 (1 abstention)
- Consensus: Consensus in favor.
- A: I don't want double validation and this falls outside the
              standard.
 
 
- Tom stated that the next meeting will be in two weeks on June 23rd and
      that we will complete polling and discuss
      LWG 3565.
June 23rd, 2021
Draft agenda:
Attendees:
  - Charlie Barto
- Corentin Jabot
- Hubert Tong
- Jens Maurer
- Peter Brett
- Steve Downey
- Tom Honermann
- Victor Zverovich
- Zach Laine
Meeting summary:
  - P2093R6: Formatted output:
    
      - PBrett reviewed the polls taken at the last telecon.
        
          - [ Editor's note: See the
              June 9th, 2021
              summary for the prior polls. ]
- Tom clarified the intent behind the "encoding expectations"
              terminology in the polls; it is intended to distinguish cases
              where there is a dependence on a particular encoding, but
              without tying that dependence to a particular mechanism for
              determining the existence of such a dependence.  As proposed,
              the paper currently imposes a UTF-8 encoding expectation when
              the literal encoding is UTF-8.
- Hubert expressed being content with poll 5 relative to poll 4
              since the determination of what constitutes a device with
              encoding expectations is left up to the implementation.
- Hubert noted that it is ambiguous whether a file may constitute
              a device with encoding expectations and provided
              /dev/tty as an example.
 
- Poll 2.1 discussion:
        
          - Victor stated that std::format() does not have an
              encoding expectation by itself but that string formatters must be
              encoding aware to honor field width specifiers.
- Victor added that std::print() is special due to
              transcoding requirements.
- Hubert noted that these polls address the abstract design
              intent.
- Jens stated that, as currently specified, there is no implied
              encoding expectation, but there may be an expectation for the
              combined formatter outputs to be consistent.
- Jens added that the format string might not contribute text to
              the final result; it might consist solely of field
              specifiers.
- Jens concluded that concatenation of the output of two formatters
              that produce differently encoded text might produce text that is
              not consistently encoded and that nothing is provided to
              reconcile them.
- Tom agreed and opined that diagnostics would be useful, but that
              it is not clear how to reconcile that with desired support for
              binary formatting.
- Victor replied that he doesn't see any problems with combining
              binary and text and reiterated that the ability to do so
              addresses real use cases.
- PBrett opined that the <format> and
              <print> facilities do not need to be consistent;
              the only time an encoding expectation should be present is when
              the output is directed to a device with an encoding
              expectation.
- Jens asked if that implies that formatters must communicate the
              encoding of their output.
- Victor replied that use of formatters to combine binary and text
              data is not dissimilar to existing uses of
              std::ostream or printf(); it is up to the
              programmer to ensure that use of formatters matches the
              intent.
- Jens asked how a programmer determines what encoding is
              produced.
- Victor replied that it is determined by the literal encoding.
- PBrett replied that nothing in the standard states that though;
              not for std::format().
- Charlie stated that the Microsoft implementation assumes Unicode
              characters for the purposes of field width estimation, but that
              they could transcode to Unicode if the source encoding was known;
              but it is not known in general.
- Charlie noted that the arguments passed to formatters are not
              transcoded.
- Charlie added that format strings frequently consist of only
              invariant characters; effectively ASCII.
- Charlie cautioned that the encoding of format strings must be
              known to the implementation in order for format string parsing to
              not misinterpret trailing code units of multibyte encoded
              characters.
- Charlie noted that, for log files, it is not necessarily desirable
              to transcode to the system encoding.
- Corentin portrayed std::print() as a two step process of
              formatting followed by transcoding and stated that there is a
              precondition on the output device being able to display the text,
              but noted that such a precondition does not imply a postcondition
              on std::format().
- Corentin stated that diagnostics would be limited because
              mojibake is not always detectable.
- Hubert observed that the sentiment for the poll appears to be
              trending against it, but that we do have desire to avoid surprises
              with std::print(), or at least to say that we want some
              checking to be implemented.
- Hubert suggested that the model of std::print() as a two
              step process of calling std::format() and then printing
              the result may be too limiting and that a more integrated design
              that provides std::print() more detailed information
              about formatting outputs may unblock further progress.
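
Charlie's caution about trailing code units can be made concrete with a short sketch. Shift-JIS is used here purely as an illustrative multibyte encoding (it was not named in the discussion), and both function names are hypothetical. In Shift-JIS the katakana character "ボ" is encoded as the bytes 0x83 0x7B, and 0x7B is also the code for `{`, so a parser that scans bytes without knowing the encoding would see a replacement field that is not there.

```cpp
#include <cstddef>
#include <string_view>

// True for bytes that begin a two-byte character in Shift-JIS.
constexpr bool is_sjis_lead_byte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Encoding-unaware scan: every byte equal to '{' is counted, including
// bytes that are really the second half of a two-byte character.
std::size_t count_open_braces_bytewise(std::string_view fmt) {
    std::size_t n = 0;
    for (char c : fmt) {
        if (c == '{') ++n;
    }
    return n;
}

// Shift-JIS-aware scan: the byte after a lead byte is skipped, so it is
// never mistaken for '{'. Assumes an ASCII-compatible '{' (0x7B).
std::size_t count_open_braces_sjis(std::string_view fmt) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(fmt[i]);
        if (is_sjis_lead_byte(b)) {
            ++i;  // skip the trail byte; the loop increment skips the lead
            continue;
        }
        if (b == '{') ++n;
    }
    return n;
}
```

For the byte string 0x83 0x7B followed by `{}`, the bytewise scan reports two opening braces while the encoding-aware scan reports one.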
 
- Poll 2.1: P2093R6: <format> and
          <print> facilities should have consistent behavior
          with respect to encoding expectations for the output of
          formatters.
        
          - Attendance: 9 (1 abstention)
- Consensus: Strong consensus against.
 
- Poll 7 discussion:
        
          - Victor asked if encouragement would be stated as a note in the
              standard.
- Zach responded that LWG prefers normative encouragement of the
              form, "implementations should do X" and noted that such
              encouragement does not impose a requirement on implementors.
- Zach added that it is important to follow Unicode guidelines.
- Jens asked what the implication is to implementations that cannot
              implement the encouraged behavior.
- Zach replied that, as proposed, all implementations would be able
              to implement it since transcoding is only prescribed for one
              Unicode form to another.
- Victor noted that some implementations display a ? rather
              than a U+FFFD replacement character.
 
- Poll 7: P2093R6: <print> facility implementors are
          encouraged to substitute U+FFFD replacement characters following
          Unicode guidance when output is directed to a device and transcoding
          is necessary.
        
          - Attendance: 9 (1 abstention)
- Consensus: Consensus in favor.
- SA: The terminal will already handle this.
- Tom noted that the device cannot handle this in the case where
              transcoding is necessary in order to direct the output to the
              device; e.g., when the device requires UTF-16.
- Jens noted that specifying that the behavior is undefined but
              then encouraging a particular behavior is novel.
- Zach agreed but noted that this is a case of "library UB", so kind
              of a special case.
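
The case Tom describes, where output must be transcoded to UTF-16 before it reaches the device, can be sketched as follows. This is a minimal illustration and not the P2093R6 implementation: the function name `utf8_to_utf16_lossy` is hypothetical, and the substitution policy (one U+FFFD per maximal ill-formed subsequence) is one reading of the Unicode recommendation assumed for the example.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

namespace {
constexpr char16_t replacement = u'\uFFFD';

// Appends a Unicode scalar value to `out` as one or two UTF-16 code units.
void append_utf16(std::u16string& out, char32_t cp) {
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
}
}  // namespace

// Hypothetical helper: transcodes UTF-8 to UTF-16, substituting U+FFFD
// for each ill-formed subsequence instead of failing.
std::u16string utf8_to_utf16_lossy(std::string_view in) {
    std::u16string out;
    char32_t cp = 0;
    int needed = 0;                    // continuation bytes still expected
    unsigned lower = 0x80, upper = 0xBF;  // allowed range for the next byte
    for (std::size_t i = 0; i < in.size();) {
        const unsigned b = static_cast<unsigned char>(in[i]);
        if (needed == 0) {
            ++i;
            if (b <= 0x7F) {
                out.push_back(static_cast<char16_t>(b));
            } else if (b >= 0xC2 && b <= 0xDF) {
                needed = 1; cp = b & 0x1F;
            } else if (b >= 0xE0 && b <= 0xEF) {
                if (b == 0xE0) lower = 0xA0;  // no overlong forms
                if (b == 0xED) upper = 0x9F;  // no surrogates
                needed = 2; cp = b & 0x0F;
            } else if (b >= 0xF0 && b <= 0xF4) {
                if (b == 0xF0) lower = 0x90;  // no overlong forms
                if (b == 0xF4) upper = 0x8F;  // nothing above U+10FFFF
                needed = 3; cp = b & 0x07;
            } else {
                out.push_back(replacement);   // invalid lead byte
            }
            continue;
        }
        if (b < lower || b > upper) {
            // Truncated sequence: emit one U+FFFD and reprocess this byte
            // as the potential start of a new sequence.
            out.push_back(replacement);
            needed = 0; lower = 0x80; upper = 0xBF;
            continue;
        }
        ++i;
        lower = 0x80; upper = 0xBF;
        cp = (cp << 6) | (b & 0x3F);
        if (--needed == 0) append_utf16(out, cp);
    }
    if (needed != 0) out.push_back(replacement);  // input ended mid-sequence
    return out;
}
```

With this sketch, the stray byte 0xFF between "A" and "Z" becomes a single U+FFFD and the surrounding text is preserved, so the damage from ill-formed input stays localized.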
 
- Poll 8 discussion:
        
          - [ Editor's note: the original poll was, "P2093R6: Neither
              <format> nor <print> facilities
              require an explicit program-controlled error handling mechanism
              for violations of encoding expectations." ]
- Zach stated that the poll should be framed as a change to the
              status quo.
 
- Poll 8: P2093R6: <print> facilities must provide
          an explicit program-controlled error handling mechanism for
          violations of encoding expectations.
        
          - Attendance: 9
- Consensus: Strong consensus against.
 
- Poll 9 discussion:
        
          - [ Editor's note: The original poll was "P2093R6: Use of UTF-8
              as the literal encoding is sufficient for <format>
              and <print> facilities to assume that the format
              string and output of all formatters is UTF-8 encoded." ]
- Tom stated that the poll doesn't make sense as currently worded if
              formatters are allowed to format binary data.
- Zach stated that his position may differ for standard formatters
              vs user provided formatters.
- Zach added that the proposed heuristic already matches the
              behavior used to enable field width estimation.
- Tom disputed the claim that field width estimation depends on the
              choice of literal encoding.
- PBrett explained that field width is determined by code point
              values.
- [ Editor's note:
              [format.string.std]p11
              states:
              
              For a string in a Unicode encoding, implementations should
              estimate the width of a string as the sum of estimated widths of
              the first code points in its extended grapheme clusters.  The
              extended grapheme clusters of a string are defined by UAX #29.
              The estimated width of the following code points is 2
 ...
              The estimated width of other code points is 1.
 ]
- Charlie stated that Microsoft's implementation was designed
              around the literal encoding at least partially due to current
              technical limitations in the compiler.
- Victor stated that the literal encoding is not a perfect
              indicator, but is the best that we have available.
- PBrett agreed that we don't currently have anything better.
- PBrett noted that use of the literal encoding does affect the
              cases where uses of printf() can be simply changed to
              std::print() without potentially unintended behavioral
              changes.
- Zach compared use of the literal encoding to use of CMake; the
              least bad option.
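
PBrett's point that field width is determined by code point values can be sketched as below. This is a simplified illustration, not the wording in [format.string.std]: the wide ranges listed are a small illustrative subset of East Asian blocks chosen for the example, each code point is treated as its own grapheme cluster rather than applying UAX #29, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <string_view>

constexpr bool in_range(char32_t cp, char32_t first, char32_t last) {
    return cp >= first && cp <= last;
}

// Illustrative subset only: a few blocks commonly rendered double-width.
// This is NOT the complete list of ranges given in the standard.
constexpr int estimated_code_point_width(char32_t cp) {
    return (in_range(cp, 0x1100, 0x115F) ||   // Hangul Jamo
            in_range(cp, 0x3040, 0xA4CF) ||   // kana, CJK ideographs, Yi
            in_range(cp, 0xAC00, 0xD7A3) ||   // Hangul syllables
            in_range(cp, 0xFF00, 0xFF60))     // fullwidth forms
               ? 2
               : 1;
}

// Simplification: sums the width of every code point instead of only the
// first code point of each extended grapheme cluster.
std::size_t estimate_width(std::u32string_view text) {
    std::size_t width = 0;
    for (char32_t cp : text) {
        width += static_cast<std::size_t>(estimated_code_point_width(cp));
    }
    return width;
}
```

Because the estimate depends only on code point values, it is meaningful only once the bytes of the string have been decoded under some assumed encoding, which is where the choice of literal encoding enters the discussion.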
 
- Poll 9: P2093R6: Use of UTF-8 as the literal encoding is
          sufficient for <print> facilities to establish
          encoding expectations.
        
          - Attendance: 9
- Consensus: Very weak consensus.
- Corentin commented that LEWG sent these questions back to SG16
              for clarification and weak consensus isn't really good
              enough.
- PBrett suggested that perhaps use of an encoding tag could
              garner more consensus.
- Zach reiterated that the status quo is to use the literal
              encoding to enable width estimation.
- Jens replied that the standard does not connect literal encoding
              with width estimation.
- [ Editor's note:
              [format.string.std]p10
              states:
              
              For the purposes of width computation, a string is assumed to be
              in a locale-independent, implementation-defined encoding.
              Implementations should use a Unicode encoding on platforms
              capable of displaying Unicode text in a terminal.
               ]
- Zach responded that, regardless, implementations are relying on
              literal encoding.
- Charlie replied that his implementation should probably be
              performing width estimation for other encodings like GB18030.
 
- Poll 10 discussion:
        
          - [ Editor's note: the original poll was "P2093R6: Use of a
              literal encoding other than UTF-8 is sufficient for
              <format> and <print> facilities to
              assume a particular encoding for the format string and output of
              formatters." ]
- The weak results for poll 9 obviated the need to conduct this
              poll.
 
- Poll 11 discussion:
        
          - [ Editor's note: the original poll was "P2093R6: Support for
              implicit encoding conversions will only be possible when an
              encoding assumption is implicitly or explicitly present."
              ]
- Victor preempted the poll by volunteering to add prose regarding
              how future extensions could enable implicit transcoding
              features.
- Hubert noted that previous consensus was that
              std::format() and std::print() do not require
              the same encoding expectations.
- Hubert added that it isn't clear how an implementation might take
              that into consideration when the implementation intent appears to
              be to pass the output of a std::format() call to a
              transcoding facility.
- Corentin stated that LEWG time is more valuable than ours and,
              since we don't appear to have strong consensus, another meeting
              seems warranted.
- Victor agreed with Hubert and Corentin that more common
              understanding is required.
- Tom agreed and stated that it seems we are not yet ready to poll
              forwarding the paper.
- PBrett pondered how consensus could be improved.
- Zach suggested that those with positions on the margins could
              suggest ways in which their positions might be altered.
- Zach noted that the current proposal and discussion has been on
              particular technical details and that progress might be made by
              focusing on, for example, a "Unicode context" as opposed to the
              choice of literal encoding.
- Hubert requested a clear summary of how the implementation
              compares to the polls taken.
- Hubert added that he would not oppose moving forward with
              behavior based on the choice of literal encoding.
- Tom pondered whether Hubert's suggested escape mechanism for
              binary data would be helpful.
- Victor requested more details on that mechanism, or perhaps a
              pull request, and stated that he has not seen something that
              sounds similar implemented elsewhere.
 
 
- LWG 3565: Handling of encodings in localized formatting of chrono types is underspecified
    
      - Discussion postponed due to time constraints.
 
- P2295R4: Support for UTF-8 as a portable source file encoding
    
      - Discussion postponed due to time constraints.
 
- Tom stated that the next meeting will be in 3 weeks, on July 14th.
July 14th, 2021
Draft agenda:
Attendees:
  - Charlie Barto
- Corentin Jabot
- Hubert Tong
- Jens Maurer
- Mark Zeren
- Peter Brett
- Tom Honermann
- Victor Zverovich
- Zach Laine
Meeting summary:
  - P2295R5: Support for UTF-8 as a portable source file encoding
    
      - [ Editor's note: D2295R5 was the active paper under discussion
          at the telecon.  The agenda and links used here reference P2295R5
          since the links to the draft paper were ephemeral.  The published
          document is expected to differ from the reviewed draft revision as
          noted below. ]
- PBrett presented.
        
          - Peter's presentation slides are available
              here.
- The wording was revised based on feedback received from the SG16
              mailing list.
- Any wording changes approved today will appear in the revision
              of the paper that will be submitted for tomorrow's mailing
              deadline.
 
- Tom noted that the existing wording regarding the introduction of
          new-line characters for end-of-line indicators only applies to
          non-UTF-8 encoding schemes with the proposed changes.
- PBrett and Corentin explained that this is intentional; that
          end-of-line indicators are relevant for structured text
          (e.g., data sets), not for source files expressed as a sequence
          of code units.
- PBrett and Corentin noted that new-line character sequences will
          be revisited with
          P2348.
- [ Editor's note: A note was added to the final P2295R5 wording
          to explain that end-of-line indicators are not applicable to UTF-8
          encoded source files and that new-line characters separate lines.
          ]
- Hubert observed that some of the wording suggestions from the
          mailing list discussion had not been incorporated.
- [ Editor's note: Live editing of the proposed wording ensued,
          the discussion of which is not captured verbatim here.  Concerns
          discussed included use of "encoding scheme" vs "encoding", whether
          a plural form of "source file" should be used, methods to avoid
          use of the term "determined", and how to equate the sequence of
          UTF-8 code units with the elements of the translation character
          set. ]
- Mark asked if the proposed wording handles CR/LF new-line
          sequences.
- Hubert responded that
          P2348
          will address that concern.
- Poll: Forward D2295R5 with wording modifications as discussed to EWG for C++23.
        
          - Attendance: 9
- No objection to unanimous consent.
 
 
- P2362R0: Make obfuscating wide character literals ill-formed
    
      - PBrett presented.
        
          - Peter's presentation slides are available
              here.
 
- Tom noted that the execution wide-character set is not necessarily
          Unicode; non-encodable characters are possible even when
          wchar_t is 32-bit.
- Charlie noted that Visual C++ is technically not conformant since
          its 16-bit wchar_t is not able to store every possible
          locale dependent character in a unique wchar_t value.
- Hubert explained that ISO C++ does not permit use of a
          multi-code-unit encoding for wide character and string literals.
- Charlie asked what warning level Visual C++ requires for a warning to
          be issued for the cases proposed to become ill-formed.
- Corentin responded, W2.
- Tom asked Hubert how his implementation handles the multicharacter
          case.
- Hubert reported that xlC encodes the last character
          (like gcc and Clang).
- Wording review ensued.
- Tom requested that the use of "character literal" removed in the
          proposed wording for [lex.ccon]p2 be restored so that the note
          states, "... but does not determine the value of non-encodable
          character literals or multicharacter literals. ..."
- PBrett agreed to do so.
- Jens expressed a preference towards revising the paper title to
          remove the word "obfuscating" in order to avoid projecting
          bias.
- Tom responded that the title is the author's prerogative, but
          reported having had a similar reaction to the current title.
- Charlie asked if there is also motivation to make non-encodable
          character literals and multicharacter literals ill-formed as
          well.
- PBrett stated that there is and that writing a paper to do so is
          on his todo list, but that the motivation for ordinary literals
          is different because they are used and do not suffer some of the
          problems that the wide variety do.
- Poll: Forward P2362R0 with title and wording modifications as discussed to EWG for C++23.
        
          - Attendance: 9
- No objection to unanimous consent.
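
The notion of a character that cannot be encoded in a single wide code unit can be sketched as follows. The helper `fits_in_one_code_unit` is hypothetical, and the sketch assumes for illustration that the wide execution character set is Unicode, which, as Tom noted, is not guaranteed.

```cpp
#include <cstdint>

// Hypothetical helper: true if the code point `cp` can be stored in a
// single code unit of `code_unit_bits` bits. Assumes a Unicode wide
// execution character set, so the code unit value equals the code point.
constexpr bool fits_in_one_code_unit(char32_t cp, unsigned code_unit_bits) {
    return code_unit_bits >= 32 ||
           static_cast<std::uint32_t>(cp) < (std::uint32_t{1} << code_unit_bits);
}
```

Under that assumption, U+1F600 fits in a 32-bit `wchar_t` but not in a 16-bit one, so a wide character literal naming it would be non-encodable on an implementation with a 16-bit `wchar_t`.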
 
 
- LWG 3565: Handling of encodings in localized formatting of chrono types is underspecified
    
      - Deferred to the next telecon due to time constraints.
 
- Tom announced that the next telecon will be held 2021-07-28 and that the
      agenda will include
      LWG 3565
      and then
      P2348.
July 28th, 2021
Draft agenda:
Attendees:
  - Charlie Barto
- Corentin Jabot
- Hubert Tong
- Jens Maurer
- Mark Zeren
- Peter Brett
- Steve Downey
- Tom Honermann
- Victor Zverovich
Meeting summary:
  - LWG 3565: Handling of encodings in localized formatting of chrono types is underspecified
    
      - PBrett presented
        
          - The standard is underspecified in terms of what happens with
              localized chrono substitutions
- Proposed resolution is very narrow; limited to UTF-8
              scenarios
 
- Hubert: The direction makes sense, but the conversion to UTF-8 may
          not always be successful given the diversity of possible
          deployments.
- Hubert: There should be some form of error handling policy; which
          one
- Tom: The assumption is that there may not be characters that are in
          Unicode?
- Hubert: No, the implementation may not have a map from the source
          charset to Unicode.
- Charlie: Our implementation has MultiByteToWideChar, but it
          behaves in surprising ways for some encodings; some multibyte
          characters in some encodings may not convert correctly.
- Charlie: This doesn't permit requesting a non-UTF-8 encoding be
          used.
- Victor: If L is not specified, then the "C" locale
          is used and there is no issue.
- Victor: The proposed wording only applies when {:L} is
          used.
- PBrett: To clarify, there would be no way to preserve a non-UTF-8
          encoding through std::format().
- Victor: Correct.
- Charlie: The convention that the literal encoding affects
          std::format() behavior is currently limited; this widens
          that.
- Charlie: The other place literal encoding is used is parsing the
          format string; which makes perfect sense.
- Charlie: Widening this dependency on the literal encoding is
          concerning.
- Charlie: I expect some Windows users to write code with UTF-8
          literal encoding but to produce non-UTF-8 output.
- Charlie: This may occur when logging text, the format string may
          just consist of format specifiers.
- Victor: We also depend on the literal encoding for the "mu"
          character.
- Victor: Even if text looks like ASCII, it may not be; confusables
          may be present or line drawing characters.
- Steve: How does the library figure out what the literal encoding
          is?
- PBrett: Implementation magic; the compiler knows and can communicate
          it to the library.
- PBrett: Can we just specify that the locale text be transcoded to
          the literal encoding?
- Charlie: The UTF-8 only solution avoids the need for a large
          transcoding library.  The non-UTF-8 case may not support
          representation and therefore require/request transliterating.
- PBrett: In an implementation that supports CP1251 as locale,
          conversion to UTF-8 at least will be needed.
- PBrett: We should allow implementations the flexibility to provide
          the right result if they know how to.
- Charlie: This is mandating conversion in a specific circumstance;
          what happens when conversion is lossy?  We can't ensure
          convertibility to all code pages.
- PBrett: The proposed resolution forbids doing the right thing for
          GB18030, which is able to represent all the characters.
- Charlie: Right, the only encodings that support non-lossy conversion
          are Unicode ones.
- Charlie: It is reasonable to support EBCDIC here.
- Charlie: With regard to special characters like "mu", you can get
          mixed encodings regardless.
- Charlie: This differs from width estimation which is always best
          effort since GUI presentation is not usually known.
- Mark: This does pose a payload requirement on the implementation;
          not just implementation effort.
- Mark: The overload on locale could be limited to 1; each locale
          could be required to provide UTF-8 translations.
- Mark: The proposed resolution effectively requires a general purpose
          transcoding facility.
- Mark: This might be best left to implementation-defined.
- Hubert: There is a desire to allow conversion, but there is also a
          desire to avoid dependency on the output that locale facilities
          provide.
- Hubert: The pre-computation method could be intrusive for deployment;
          limiting localedef to character sets with mapping to Unicode
          available.
- Hubert: Perhaps guidance is to transcode when encoding information
          is known.
- Charlie stated in chat: "if you support both Russian.UTF-8
          and Russian.1251 then this is essentially saying that
          format will treat Russian.1251 as
          Russian.UTF-8 (assuming the actual content of the local
          facets is the same)"
- PBrett: This is what I was trying to suggest in email.
- PBrett: Only a burden on implementations if they support
          locale-specific encoding and if the locale specific encoding can be
          different from the literal encoding.
- PBrett: Implementations that already support many encodings are
          already burdened with the transcoding facilities.
- Victor: Agree with Peter; the "else" clause in the proposed wording
          should be relaxed; we should allow, but not require transcoding.
- Steve: For most POSIX systems, locales are an open system and may be
          extended by users (in potentially broken ways).
- Steve: Implementations don't generally own the locale systems, so
          adding requirements there may not be implementable.
- Steve: But, yes, we should allow implementations to do the best they
          can; we shouldn't mandate brokenness.
- Charlie: Not a burden if transcoding is only needed for currently
          supported locales.
- Charlie: Would be a burden if an implementation had to convert
          between two non-Unicode encodings.
- Charlie: From an overhead perspective, probably not a big deal.
- Charlie: A note may suffice.
- PBrett stated in chat: "'L' = I want to be correct, not fast"
- Corentin: Agree with Peter; avoid specifying transcoding.
- Corentin: Options are to get output in the locale-specified encoding,
          then convert to UTF-8, or to get UTF-8 directly.
- Corentin: Implementations can hack this for chrono types;
          there aren't that many strings involved.
- PBrett: Concerned about implementability since locales may be
          user-defined; implementations shouldn't have to engage in
          heroics.
- Hubert: Locale systems have allowances; users can compile their
          own.
- PBrett: Perhaps limit requirements to locales known by the
          implementation.
- Hubert: Wording to an implementation-defined set of locales may
          work here.
- Corentin: There is a limited amount of usefulness that can be
          extracted here; don't want to put too much effort here.
- Corentin: std::format() isn't a great tool for
          localization; real localization requires swapping the order of
          fields.
- Jens: Would like to ensure wording is more precise; need to
          specify which string literal encoding.
- PBrett: Summarizing:
        
          - 1. Limit the requirement to implementation provided locales.
            
              - Locales with an implementation-defined set of strings.
 
- 2. Permit implementation to "do the right thing"
- 3. Require "as if" transcoding when the literal encoding is
              UTF-8.
- 4. Permit "as if" transcoding when the ordinary literal encoding
              is not UTF-8.
 
- Hubert: That seems to reflect consensus, but falls under "as if"
          rules.
- Tom: Uncertain that we have consensus on dependency of UTF-8
          literal encoding.
- Victor: I thought we had consensus on that.
- Mark: Am mildly in favor of requiring this when the literal encoding
          is UTF-8.
- Hubert: That isn't implementable.
- PBrett: Right, only implementable for locales the implementation
          provides.
- Charlie: Implementations should be prohibited from transcoding to an
          encoding that is not Unicode (UCS-2 is not a Unicode encoding in
          this case).
- Charlie: We don't want transliteration here.
- Charlie: Should require UTF-8, permit UTF-7, UTF-EBCDIC, etc...,
          prohibit others.
- Hubert: Prior polls had consensus for UTF-8, but not for others.
          Consensus would likely be similar for other Unicode encodings.
- Tom: Concerned about that consensus.
- PBrett: Concerned about consistency here; trying to rationalize
          the UTF-8 focus.
- [ Editor's note: Some discussion of poll wording ensued ]
- Corentin: Charlie, why the prohibition to "as if" conversion to
          other encodings?
- Charlie: The goal is to avoid lossy conversions.
- Corentin: Can we just prohibit lossy conversions?
- Charlie: We could allow cases where the target encoding is not
          Unicode, but all of the characters are representable.
- Charlie: The concern is wanting to avoid transliteration.
- Corentin: I agree with that.
- Poll 1: Require implementations to make std::chrono
          substitutions with std::format as if transcoded to UTF-8
          when the literal encoding E associated with the format
          string is UTF-8, for an implementation-defined set of locales.
        
          - Attendance: 9
- Consensus: Consensus in favour.
- Poll bikeshedding; Tom wants to apply to wchar_t
              cases.
 
- Poll 2: Permit such substitutions when the encoding E is
          any Unicode encoding form.
        
          - Attendance: 9
- Consensus: Consensus in favour.
 
- Poll 3: Prohibit such substitutions otherwise.
        
          - Attendance: 9
- Consensus: No consensus.
- SA: This is an over constraint; should permit implementations
              to do best effort work.
- Hubert: This requires invention for the case where a locale is
              defined outside the implementation without a mapping to the
              target locale.
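The "as if transcoded to UTF-8" behavior from Poll 1, together with the concern about lossy conversions, can be sketched as follows. This is a hedged illustration, not the proposed resolution: cp1251_to_utf8 is a hypothetical helper that assumes the locale supplies Windows-1251 text, and it handles only ASCII and the bytes 0xC0 through 0xFF (which map to U+0410 through U+044F). Any other byte is reported as a failure rather than being substituted or transliterated.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: transcode a subset of Windows-1251 to UTF-8.
// Returns std::nullopt for bytes outside the supported subset instead of
// producing a lossy result.
std::optional<std::string> cp1251_to_utf8(std::string_view in) {
    std::string out;
    for (char ch : in) {
        const unsigned char b = static_cast<unsigned char>(ch);
        if (b < 0x80) {
            out += ch;  // ASCII is encoded identically in UTF-8
        } else if (b >= 0xC0) {
            // 0xC0..0xFF map to the contiguous range U+0410..U+044F,
            // all of which need exactly two UTF-8 code units.
            const unsigned cp = 0x0410u + (b - 0xC0u);
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            return std::nullopt;  // not handled by this sketch
        }
    }
    return out;
}
```

For example, the Windows-1251 bytes EC E0 E9 (a Russian month name) become the UTF-8 bytes D0 BC D0 B0 D0 B9. An implementation supporting more locales would need a full mapping table, which is the payload concern raised during the discussion.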
 
 
- P2348R0: Whitespaces Wording Revamp
    
  
- Tom: Next meeting in two weeks, will revisit
      LWG 3565
      if a paper is available;
      P2348R0
      otherwise.
August 25th, 2021
Draft agenda:
Attendees:
  - Charlie Barto
- Corentin Jabot
- Hubert Tong
- Mark Zeren
- Peter Brett
- Steve Downey
- Victor Zverovich
Meeting summary:
  - P2348R0: Whitespaces Wording Revamp
    
      - Corentin presented
- Steve: Is "basic source character set" a bug in comment grammar?
- Corentin: maybe
- Peter and Steve: Form feeds are used in sources
- Corentin: no change proposed
- Hubert: VT and FF don't end comments in clang or gcc.  Status quo is
          they may not be line breaks, although they may be whitespace
- Poll 1: Acknowledging that we have limited time available, we
          support the direction for P2348R0 and encourage further work.
        
          - Attendance: 7
- No objections to unanimous consent
 
- Peter: Please bring back the paper rebased on
          P2314: Character sets and encodings,
          and add implementation notes.
 
- P2419R0: Clarify handling of encodings in localized formatting of chrono types
    
      - Charlie: Does this permit new things? If so it's appropriate to
          update feature test macro
- Peter: Would have liked to include recommended practice in the
          wording
- Charlie: Current wording is 'fine' because it has enough
          implementation defined wiggle room.
- Hubert: If we are to improve the wording, it might just need to be a
          note rather than normative
- Victor: Implementation could be in terms of codecvt facet,
          so it should work
- Charlie: Concern if there's a list of locales, it might be a problem
          if users customize facets of a locale derived from a system
          locale.
- Poll 2: Forward P2419 to LEWG as the recommended resolution of
          LWG 3565 and with a recommended ship vehicle of C++23.
        
          - Attendance: 7
- Consensus: Strong consensus in favour.
 
 
- LWG 3576: Clarifying fill character in std::format
    
      - Charlie: MSVC processes codepoint, preserving the code unit sequence.
          libc++ stores a code unit. Error handling in MSVC deals with
          ill-formed sequences transcoding later.
- Hubert: Clarify as a note whether a grapheme cluster could include
          `{` or `}`
- Charlie: Implementation difficult, as finding `{}` is straightforward,
          parsing a grapheme cluster is hard.
- Peter: Doesn't like codepoint as it means combining characters are
          confusing in source.
- [ Editor's note: Contribution by Steve not recorded here ]
- Victor stated in chat: We already talk about grapheme clusters in
          width estimation
- Charlie: If we fill with a grapheme cluster, it's the first normative
          use of EGCs.  Some implementation difficulty. Varies over Unicode
          standard versions in some cases. Users have the ability to customize
          using formatters. Outside the normal range of use cases. A different
          format spec/library for multibyte fills? OK with either code unit or
          codepoint.
- Corentin: Agree with Charlie, maybe use emoji, but rendering of that
          is complicated.  Doesn't see a use case for combined characters
          either.
- Victor: Concerned about implementation experience with grapheme
          clusters as fill characters. Has had no requests for this
          functionality. Has had requests for codepoints. Code units would
          disallow box drawing characters.
- Peter: We allow EGCs now for width, why shouldn't we allow them as
          fill characters?
- Mark: We base on first character of cluster, specified as a
          heuristic. It's not a layout engine.
- Charlie: Width is 'should' not 'must' (not mandatory)
- Victor: We have to restrict the set of fill characters in any case.
          It might be theoretically better to use grapheme cluster, but has
          implementation concerns.  Way forward is to have a new facility for
          filling with grapheme clusters.
- Corentin: Question for Charlie and Victor: If we say codepoint now,
          can we change to grapheme cluster later?
- Charlie: It would probably break ABI. Heroic and disgusting hacks
          would be involved.
- Victor: It would be a break for libfmt.
- Hubert: Are we in agreement that there is an issue with the
          resolution as presented with it allowing `{}`? Do we need to discuss
          combining characters?
- Charlie: I don't think so. Not a common use case and not actually
          totally unreasonable. Could use a *universal-character-name*.
- Corentin: No value in protecting user from themselves in something
          they ask for.
- Peter: Will, "Play stupid games, win stupid prizes," make it into
          the minutes?
- Victor: Need to prevent characters disallowed by the grammar, but
          more than that is not necessary.
- Mark: Clarify poll for non-Unicode encoding?
- Charlie: MSVC doesn't treat UCS-2 properly, treats it as UTF-16. Do
          implementations have to deal with nonsense?
- Peter: This happens after all the other phases of translation
- [ Editor's note: There was some discussion of polling options. ]
- Poll 3.1: Recommend that the proposed resolution for LWG3576
          should be adopted, with the modification that the fill character
          must not contain '{' or '}' as part of the extended grapheme
          cluster.
        
          - Attendance: 7
- Consensus against.
 
- Poll 3.2: The format fill character should be defined as
          "any codepoint of the literal encoding other than '{' or '}'".
        
          - Attendance: 7
- Strong consensus in favour.
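The direction favoured in Poll 3.2 (a fill that is one code point rather than one code unit or one grapheme cluster) can be sketched as follows. This is a minimal illustration that assumes the format spec is UTF-8 encoded; parse_fill_and_align and its result type are hypothetical names, and the UTF-8 check is deliberately simplified (it verifies lead and continuation bytes but does not reject overlong forms or surrogates).

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Length in code units of a UTF-8 sequence starting with lead byte b,
// or 0 if b cannot start a sequence.
constexpr std::size_t utf8_sequence_length(unsigned char b) {
    if (b < 0x80) return 1;
    if (b >= 0xC2 && b <= 0xDF) return 2;
    if (b >= 0xE0 && b <= 0xEF) return 3;
    if (b >= 0xF0 && b <= 0xF4) return 4;
    return 0;
}

struct fill_and_align {
    std::string_view fill;  // code units of one code point; empty if no fill given
    char align;             // '<', '>' or '^'
};

// Parses an optional fill code point followed by an alignment character at
// the start of spec. Returns std::nullopt if there is no alignment, if the
// fill is '{' or '}', or if the leading sequence is not plausible UTF-8.
std::optional<fill_and_align> parse_fill_and_align(std::string_view spec) {
    auto is_align = [](char c) { return c == '<' || c == '>' || c == '^'; };
    if (spec.empty()) return std::nullopt;

    const std::size_t n = utf8_sequence_length(static_cast<unsigned char>(spec[0]));
    if (n == 0 || n > spec.size()) return std::nullopt;
    for (std::size_t i = 1; i < n; ++i)
        if ((static_cast<unsigned char>(spec[i]) & 0xC0) != 0x80) return std::nullopt;

    if (spec.size() > n && is_align(spec[n])) {
        if (spec[0] == '{' || spec[0] == '}') return std::nullopt;
        return fill_and_align{spec.substr(0, n), spec[n]};
    }
    if (is_align(spec[0])) return fill_and_align{std::string_view{}, spec[0]};
    return std::nullopt;
}
```

With this approach a box drawing character such as U+2500 (three UTF-8 code units) is accepted as a fill, which a code unit based definition would disallow, while no grapheme cluster segmentation is needed.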
 
 
September 8th, 2021
Draft agenda:
Attendees:
  - Charlie Barto
- Corentin Jabot
- Hubert Tong
- Jens Maurer
- Mark Zeren
- Peter Brett
- Steve Downey
- Tom Honermann
- Victor Zverovich
Meeting summary:
  - Tom: Thank you to Peter and Steve for filling in during my absence.
- PBrett: Consensus from the polls taken during the last telecon, held
      2021-08-25 and as posted to the mailing list, is no longer tentative;
      no new dissenting opinions were raised.
- D2348R1: Whitespaces Wording Revamp
    
      - Corentin: Introduction:
        
          - Reversed prior intention to classify vertical tab and form feed
              as new lines.
- Rebased on top of
              P2314R2: Character sets and encodings.
- Would like feedback about support for \n\r sequences;
              support can be provided under implementation-defined behavior.
- Jonathan Wakely would prefer not to use grammar terms in prose,
              but unsure how to do that; perhaps Jens can advise.
- Removed the restriction that non-space characters following a
              vertical tab and form feed in a single-line comment render the
              code ill-formed, no diagnostic required; addresses
              CWG2002: Whitespace within preprocessing directives.
 
- PBrett: The goal for now is that the wording reflect the design, it
          doesn't need to be perfect.
- Jens: In the new section [lex.whitespaces] there is a
          horizontal-whitespace that has infinite recursion.
- Corentin: The intent is to support a sequence of whitespace.
- Jens: There is a general rule that we use a separate production for
          sequences of characters.
- Tom: h-char-sequence is such an example.
- Jens: Yes, and q-char-sequence.
- Jens: The lexical specification for comment is problematic
          due to max munch; nothing prohibits */ appearing in the
          comment.  Something is needed to address the intent previously
          expressed in the removed prose.
- Jens: In the specification of d-char, line-break is not a
          single character; it may be a sequence and therefore doesn't work
          following "except".
- Jens: basic-s-char has the same issue.
- PBrett: Can we use a sequence of line-break characters?
- Jens: No; order matters.
- Jens: [lex.pptoken] hits a conflict between the requirement to
          capitalize the first word of a sentence and sentences that start with
          a grammar term; capitalizing the grammar term yields a different term,
          so the prose must be modified to avoid grammar terms at the beginning
          of a sentence.
- Jens: Perhaps we should introduce a formal definition of
          new-line to map to the grammar term.
- Jens: There is a general substitution of the line-break
          grammar term for new-line in the proposed wording.  Can we
          use new-line as the grammar term and not introduce a
          line-break production?
- Corentin: There is a desire to be able to discuss new-line
          abstractly, like in simple escape sequences.
- Jens: I'm wondering if we can avoid that in order to reduce the
          wording churn.
- Jens: P2314 intentionally did not touch new-line; it does
          update places where a single new-line character is designated; like
          for simple escape sequence.
- PBrett: Other than for churn; is there motivation to avoid replacing
          new-line with the grammar term?
- Jens: Yes, the changes remove a definition for new-line
          which we assume is needed by library, though I would be happy to be
          proven wrong.
- Corentin: Library use of new-line must refer to the single
          Unicode new-line character.
- Jens: If new-line always designates Unicode new-line, then
          we can keep new-line and use line-break for the
          grammar term.
- Steve: Time format spec supports a %n for new-line
          character.
- Jens: Could say it is equivalent to \n.
- Jens: There may be interaction with references to the C standard
          library.
- Corentin: C uses "new-line" as a grammar and library term.
- Poll 1: Prefer to use the term new-line rather than
          line-break in the whitespace grammar production.
        
          - Attendance: 10
- No consensus for a change.
 
- Hubert: With respect to EWG impact; the changes remove a diagnosable
          issue involving vertical tab and form feed in preprocessor
          directives.
- Jens: That means we're removing a restriction and that is
          evolutionary; the changes to [cpp.pre] on page 12 of the paper
          remove the restriction.
- Corentin: There is no place in the grammar to have a new-line in a
          preprocessor directive.
- PBrett: Let's have Corentin resolve this issue and come back with
          a revised paper.
 
- P2093R8: Formatted output
    
      - Victor presented slides:
        
      
- PBrett: Use of P2419 as a wedge is questionable here since its
          changes granted permission rather than mandating behavior.
- Victor: We went with more relaxed wording due to concerns over user
          provided locales; we could strengthen the behavior.
- Hubert: Yes, we had weak consensus for use of literal encoding for
          UTF-8, but that doesn't imply consensus for more general use.
- Tom: I don't buy the argument that because the format string needs
          to match literal encoding for compile time processing that that
          implies the formatted result must be in the same encoding; though
          production in a different encoding would impose overhead.
- Tom: Use of the literal encoding as required for compile-time
          parsing of the format string limits this being a precedent for
          similar use of the literal encoding elsewhere.
- PBrett: We discussed GB18030 recently and wide strings. Victor,
          are you wedded to this being UTF-8 specific?
- Victor: No.  UTF-8 is problematic in practice.  Different problems
          occur for other encodings.  Worried about increasing scope
          though.
- Poll 2: Use of UTF-8 as the literal encoding is sufficient for
          <print> facilities to establish encoding expectations.
        
          - Attendance: 9
- Consensus in favor.
- A: Against rationale: Still concerned that people are not
              going to use the facility correctly, i.e. end up with mojibake
              anyway in corner cases that they won't find until later.  Would
              prefer solution that provides a stronger way to associate an
              encoding with the output, but there isn't an extant proposal to
              do that.
 
- Charlie: I abstained for similar reasons.
- Hubert: We did not read through the minor wording changes in
          paragraph 31 and it would be good to do so quickly.
- Hubert: Looks pretty good; are we clear that the UB only applies
          after the first if?
- Hubert: The order of the if statements is not correct; there are
          subordination issues.
- PBrett: In "If this requires transcoding", it is unclear what "this"
          refers to.
- Jens: Strike "then" in favor of a comma in
          "If this requires transcoding then ..."
- Jens: Remove the trademark symbol.
- Poll 3: Correct the P2093R8 wording for [print.syn].31 to remove
          ambiguities, and forward P2093 as revised to LEWG with a recommended
          ship vehicle of C++23.
        
          - Attendance: 9
- Consensus in favor.
 
 
- P2361R2: Unevaluated string literals
    
      - Ran out of time; will discuss next time.
 
- Next telecon on 9/22 will review D2348R1 subject to a new revision,
      P1636 Formatters for library types, and
      P2361 Unevaluated strings.
September 22nd, 2021
Draft agenda:
Attendees:
  - Aaron Ballman
- Charlie Barto
- Corentin Jabot
- Hubert Tong
- Jens Maurer
- Marina Oliveira
- Mark Zeren
- Peter Bindels
- Peter Brett
- Steve Downey
- Tom Honermann
- Tomasz Kamiński
- Victor Zverovich
Meeting summary:
  - D2348R2: Whitespaces Wording Revamp
    
      - [ Editor's note: D2348R2 was the active paper under discussion
          at the telecon.  The agenda and links used here reference P2348R2
          since the links to the draft paper were ephemeral.  The published
          document may differ from the reviewed draft revision. ]
- Corentin stated that there are no design changes between the R1 and
          R2 revisions.
- Tom asked for confirmation that the only known behavioral change is
          that the VT and FF characters would be well-formed in comments
          rather than ill-formed no diagnostic required.
- Hubert responded that the proposal also expands the set of allowed
          horizontal space characters in preprocessing directives.
- Aaron asked if there is desire to recommend the proposal as a DR.
- PBrett responded that there is no need to do so since the changes are
          effectively specification improvement.
- Tom asked Hubert if all of the concerns he had raised on the mailing
          list have been addressed to his satisfaction?
- Hubert responded that they have been.
- Poll 1: Forward D2348R2 to EWG as the recommended resolution of
          CWG2002 and CWG1655 and with a recommended ship vehicle of C++23.
        
          - Attendance: 12
- Strong consensus in favor.
 
 
- P1636R2: Formatters for library types
    
      - PBrett stated that SG16 is reviewing this paper due to concerns Tomasz
          raised regarding quoting and localization in the formatting of
          std::filesystem::path.
- Victor stated that we currently lack the tools to adequately address
          these concerns now.
- Victor recommended removing support for std::filesystem::path
          from the paper for now.
- Victor noted that planned range related enhancements will enable the
          desired quoting support.
- PBrett observed that, if explicit support for
          std::filesystem::path is removed, then objects of that type
          will end up getting formatted as a comma separated list since it
          models a range.
- Victor reported plans in place elsewhere to reject use of
          std::filesystem::path as a range.
- PBrett noted that information can be lost when formatting a path as
          text.
- Victor replied that transcoding is possible and that a quoted escape
          mechanism could be used for portions of a path that would not round
          trip through a transcoder losslessly.
- Victor noted that use of the classic locale is a red herring as it
          has no effect on the output.
- Tomasz noted the existence of two papers that overlap on these
          design questions.
- Corentin expressed agreement with Victor that support should wait
          until there is an escaping mechanism available to losslessly preserve
          path contents in formatted text.
- Charlie noted that there may be cases where replacement characters
          might be preferred over use of an escaping mechanism that might
          interfere with further processing of the output.
- Charlie cautioned against including <format> in lots
          of standard library headers since doing so could result in ABI
          problems if formatter templates are separately compiled.
- Victor opined that std::format is effectively a generalized
          to_string() and that every type should be formattable.
- PBindels noted that platform specific knowledge may be required to
          format paths.
- Charlie remarked that confusion between the literal encoding and the
          system code page remains possible.
- Charlie noted that Java has the benefit of only needing to compile
          the code that implements its string type once, but that C++ must do
          so for every TU that uses it.
- Charlie added that, for Microsoft's implementation, the
          <thread> header includes <format> for
          chrono support.
- Tomasz remarked that it is strange that including
          <thread> results in portions of <format>
          being included, but noted that the standard doesn't require that
          direct inclusion and that implementations should avoid it.
- Charlie responded that <thread> including
          <format> is a quality of implementation issue, but
          noted that, for formatters, an extern template would be required.
          However, for std::format, the first argument is the format
          context and it probably can't be declared as an extern template.
- PBindels asked why a platform wouldn't know what encoding is used by
          the filesystem.
- Charlie responded that file names don't necessarily have an explicitly
          associated encoding.
- Tom added that a path may have multiple associated encodings if it
          spans filesystems.
- Charlie further added that additional problems occur with network
          filesystems that substitute characters for reserved characters like
          `:` on Windows.
- PBrett stated that, if the literal encoding is UTF-8, then the
          associated encoding of std::string is nominally UTF-8 and
          that the string() and u8string() members of
          std::filesystem::path should return the same content.
- Victor responded that, on Windows, the string() member of
          std::filesystem::path returns a string encoded according
          to the system code page.
- PBrett asked if a similar concern exists for wchar_t.
- Steve responded affirmatively; Windows paths are a sequence of 16-bit
          code units, not UTF-16.
- PBrett suggested a solution like the one adopted for locale dependent
          chrono fields; if the literal encoding is a UTF, then implementations
          can convert as best they know how.
- Victor responded that the same resolution can be used and is simpler
          because std::filesystem::path already offers the necessary
          encoding conversion functionality.
- PBrett presented a poll option that specified conversion in terms of
          [fs.path.fmt.cvt].
- Charlie strongly agreed that formatting as if by the
          u8string() member of std::filesystem::path is the
          right thing to do.
- Victor expressed a preference for a solution that preserves all
          information.
- Tom proposed considering solutions from a text vs binary perspective
          with a goal to preserve binary representation so as to avoid data
          loss; programmers can perform conversion to text with their own
          preferred substitution when desired.
- Victor agreed and noted a desire for a solution that maintains round
          tripping.
- Tomasz suggested the possibility of multiple formatting options.
- Charlie noted that use of an escape mechanism would solve the problem
          of conversions between libraries that work in narrow vs wide
          characters.
- PBrett opined that it sounds like we need an actual proposal for how
          to format paths.
- PBrett repeated the earlier advice to remove support for
          std::filesystem::path from the paper and encouraged the
          creation of a new proposal to support it before
          P2286
          is adopted.
- Tomasz stated there is no urgency so long as
          P2286
          precludes handling std::filesystem::path as a range.
- Poll 1: Recommend removing the filesystem::path formatter from
          P1636 "Formatters for library types", and specifically disabling
          filesystem::path formatting in P2286 "Formatting ranges", pending
          a proposal with specific design for how to format paths properly.
        
          - Attendance: 12
- Strong consensus in favor.
 
- PBrett asked for a volunteer to write the suggested paper.
- Victor volunteered.
- PBrett volunteered to help with wording.
- Mark asked rhetorically if solving the escaping problem also
          solves the unescaping problem.
 
- P2361R2: Unevaluated strings
    
      - Corentin presented:
        
          - Corentin's presentation slides are available
              here.
- Previously, all string literals were converted to the literal
              encoding in translation phase 5 whether they corresponded to
              lexical strings or string literal objects.
- The goal is to prohibit numeric escape sequences and conditional
              escape sequences in lexical strings, but not in string literals
              that initialize string literal objects. 
- Support for UCNs and other character escapes is retained for all
              string literals.
- There is currently implementation divergence regarding when
              encoding prefixes are or are not allowed.
 
- Jens noted that the list of unevaluated string literals is missing
          the literal operator ID case.
- Jens stated that, following
          P2314,
          conversion and addition of a null character is now performed during
          translation phase 7.
- Hubert noted that other proposals are changing nearby wording and
          that a rebase will likely be needed.
- Hubert observed that wording is missing with regard to how to compare
          strings in cases for extern "C".
- Corentin replied that he will update the wording.
- Hubert noted that the wording will need to address cases like
          extern "\u0043".
- Corentin acknowledged that the proposed wording will need some
          updates.
- Corentin added that SG22 will review the paper soon and that he
          would like to target C++23.
- Jens identified a grammar ambiguity; unevaluated-string and
          string-literal both match s-char-sequence.
- Hubert noted that a similar case occurs with
          header-name.
- Jens replied that the header-name case can be disambiguated
          by a preceding #include but that the preprocessor cannot
          disambiguate unevaluated-string and string-literal
          in, e.g., static_assert().
- Corentin replied that he'll find a way to address this without
          modifying the grammar.
- Jens suggested retaining string-literal as the lexical term
          and then handling the different cases where the uses diverge.
- Hubert stated that there are non-diagnostic concerns; for example
          with asm statements.
- Corentin replied that an implementation can do whatever it likes with
          asm strings, such as passing them to an external assembler;
          the standard doesn't have to address such cases.
- Hubert responded that the proposed change does reduce what the
          programmer can express, but that an implementation could, for example,
          do something different with an encoding prefix, issue a warning, and
          continue.
- Hubert noted that, following the introduction of char8_t,
          u8"" string literals may no longer be accepted in some contexts
          where they previously were.
- Jens remarked that, for string literals, there is a distinct place
          where encoding conversion is specified; when initializing a string
          object.  For unevaluated string literals, there is no single
          location.
- Corentin replied that he would work with Aaron to identify a wording
          solution.
- PBindels asked if the proposal should be recommended as a DR.
- Corentin stated no opinion on the matter.
- Aaron replied that consideration as a DR is questionable.
- PBindels clarified that doing so could make the life of an implementor
          easier by avoiding any need to fix conformance issues with rejection
          of encoding prefixes in earlier standard conformance modes.
- Poll 3: Acknowledging that we have limited time available, we
          support the direction for P2361R2 and encourage further work.
        
          - Attendance: 12
- Strong consensus in favor.
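          The distinction between lexical strings and string literal objects
          discussed above can be illustrated with a minimal sketch. The names
          old_api, new_api, hypothetical_c_function, _len, and bytes are
          invented for illustration and do not come from the paper; the
          numeric values checked assume an ASCII-compatible ordinary literal
          encoding.

```cpp
#include <cstddef>

// Unevaluated (lexical) strings: the compiler consumes the text itself and
// no string literal object is created.
static_assert(sizeof(int) >= 2, "int must hold at least 16 bits");
[[deprecated("use new_api() instead")]] inline int old_api() { return 1; }
extern "C" int hypothetical_c_function(int);  // declared only, never called

// The "" in a literal operator declaration is also a lexical string
// (the literal operator ID case noted in the discussion).
constexpr std::size_t operator""_len(const char*, std::size_t n) { return n; }

// An evaluated string literal: it initializes a string literal object, so
// numeric escape sequences select code unit values directly.
constexpr char bytes[] = "\x41\102";
```

          Only the last declaration produces an object whose code units a
          program can observe, which is why numeric escape sequences remain
          meaningful there and nowhere else in this sketch.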
 
 
- Tom announced that the next meeting will be on October 13th.
- [ Editor's note: The next meeting ended up getting moved to
      October 6th due to scheduling conflicts. ]
October 6th, 2021
Draft agenda:
Attendees:
  - Charlie Barto
- Corentin Jabot
- Hubert Tong
- Jens Maurer
- Mark Zeren
- Peter Brett
- Steve Downey
- Tom Honermann
- Victor Zverovich
- Zach Laine
Meeting summary:
  - D2460R0: Relax requirements on wchar_t to match existing practices
    
      - [ Editor's note: D2460R0 was the active paper under discussion
          at the telecon.  The agenda and links used here reference P2460R0
          since the links to the draft paper were ephemeral.  The published
          document may differ from the reviewed draft revision. ]
- Corentin presented:
        
          - Writing this paper was necessary to make progress on P1885.
- The standard has been out of sync with at least one major
              implementation for many years.
- The proposed wording transitions prior core language
              requirements to library preconditions.
 
- PBrett commented that maintaining preconditions in the library wording
          seems correct, but that the wording should be changed to introduce
          library UB for characters that are not encodeable in a single code
          unit.
- Corentin replied with a desire to agree on the design first and then
          address wording.
- Hubert objected to the original paper title
          ("UTF-16 is standard practice")
          since UCS-2 is also non-conforming when used as the execution
          wide-character set if the execution character set contains more
          characters as happens when UTF-8 is the execution encoding.
- Hubert agreed with the direction that PBrett suggested.
- PBrett summarized; the direction is good, some refinement is needed,
          and some prose is needed to explain why claiming UCS-2 instead of
          UTF-16 does not suffice to avoid issues.
- Jens and Hubert clarified that the prose should make it clear that
          the changes also allow use of UCS-2 when, e.g., UTF-8 is used as the
          execution encoding.
- PBrett asserted that the prose should explain how the wording change
          accomplishes the goals of the paper.
- PBrett asked if there is an existing core issue for concerns
          addressed by the paper.
- Corentin replied that he was unable to find one.
- Mark verified that there are no active CWG issues that mention
          UCS-2 or UTF-16.
- Poll 1: Add expanded motivation to D2460R0 and forward the paper
          so revised to EWG with a recommended ship vehicle of C++23.
        
          - Attendance: 10
- Strong consensus in favor.
 
- Hubert asked if a feature test macro is warranted and noted the
          existence of __STDC_MB_MIGHT_NEQ_WC__.
- PBrett suggested that SG10 (the feature test study group) review
          the need for a macro.
- Tom noted that LEWG should review the paper since it adds library
          UB where none was possible previously.
- Tom asked if anyone felt the need to review a revision of this
          paper in SG16 again.
- No such desires were raised.
- Corentin indicated that he will start a mailing list discussion
          for LEWG.
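          The mismatch that motivates the paper can be observed directly. The
          following is a minimal sketch, assuming the wide literal encoding is
          UTF-16 or UTF-32; the helper names fits_in_one_wchar and wide_units
          are hypothetical and are not taken from the paper.

```cpp
#include <cstddef>
#include <limits>

// True if the code point value fits in a single wchar_t code unit on this
// implementation (assumes a UTF-16 or UTF-32 wide literal encoding).
constexpr bool fits_in_one_wchar(char32_t cp) {
    return static_cast<unsigned long long>(cp) <=
           static_cast<unsigned long long>(std::numeric_limits<wchar_t>::max());
}

// Number of wchar_t code units in a wide string literal, excluding the
// terminating null.
template <std::size_t N>
constexpr std::size_t wide_units(const wchar_t (&)[N]) { return N - 1; }

// U+1F600 lies outside the Basic Multilingual Plane.
constexpr std::size_t units_for_emoji = wide_units(L"\U0001F600");
```

          Where wchar_t is 16 bits wide, the literal occupies two code units
          (a surrogate pair), so a single wchar_t cannot represent the
          character; where wchar_t is 32 bits wide, one code unit suffices.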
 
- D1885R8: Naming Text Encodings to Demystify Them
    
      - [ Editor's note: D1885R8 was the active paper under discussion at
          the telecon.  The agenda and links used here reference P1885R8 since
          the links to the draft paper were ephemeral.  The published document
          may differ from the reviewed draft revision. ]
- Corentin presented:
        
          - Corentin's presentation slides are available
              here.
- The paper goals are limited to tagging known encodings used for
              interchange, not every possible encoding.
- There is considerable history, some of it contradictory; mistakes
              have been made.
- There are multiple encoding kinds; fixed width vs variable width,
              single byte vs double byte.
- Wide interfaces are provided mostly for consistency with
              char-based interfaces.
- There are few wide character encodings.
 
- Hubert disputed the statement that there are few wide character
          encodings and indicated there are at least as many wide encoding
          variants as there are ISO-8859 variants.
- Corentin expressed a desire for more information.
- Hubert replied that, for every IBM documented CCSID encoding, there
          is one two byte and one four byte encoding; the narrow encoding is
          the odd one that uses a shift-state encoding.
- Hubert noted that documentation is written in terms of character sets
          that are trivially encoded; encoding schemes are therefore not
          explicitly documented.
- Tom recommended IBM's "Character Data Representation Architecture"
          documentation.
- [ Editor's note: Hubert later posted links to related IBM
          documentation to the SG16 mailing list in an email thread with
          subject, "Structure of EBCDIC MBCS and wide EBCDIC"; an archive of
          that message thread is available at
          https://lists.isocpp.org/sg16/2021/10/2719.php.
          ]
- Hubert noted that he usually consults ICU's converter explorer rather
          than IBM documentation.
- [ Editor's note: ICU's converter explorer is available at
          https://icu4c-demos.unicode.org/icu-bin/convexp.
          ]
- Hubert noted that, for iconv(), use of the UTF-16 encoding
          results in BOMs being produced and consumed.
- Jens presented:
        
          - Jens' presentation slides are available
              here.
- An octet is not the same as a byte.
- The encoding form concept is applicable to non-Unicode
              encodings.
- An encoding scheme encodes the output of an encoding form into a
              series of octets.
- The "UTF-16" identifier is ambiguous because it may refer to
              either the encoding form or the encoding scheme.
- The IANA registry specifies encoding schemes.
 
- Tom asked if the use case presented for iconv() has defined
          behavior since it involves writing to objects of type wchar_t
          using pointers to [unsigned] char.
- PBrett responded that objects of type wchar_t can be
          allocated and then passed to iconv() to read or write
          them.
- Corentin asserted that the encoding form concept is not useful for
          users.
- Tom stated that he remains unclear with regard to behavior for,
          e.g., UTF-16 in char when CHAR_BIT is 16.
- Hubert replied that we take the hand wavy approach and avoid
          BOMs.
- Zach stated that he is satisfied as long as the encoding matches the
          bits produced; there needs to be a 1:1 correspondence between
          bytes.
- Jens asserted that UTF-16LE or UTF-16BE should be returned.
- PBrett replied that programmers won't expect that.
- Tom suggested that we decide the behavior we want, and then make the
          wording match that.
- Jens noted the desire to return UTF-16, but that the definitions in
          our normative references don't permit that.
- Poll 2: The values returned by the literal() and
          wide_literal() functions must indicate the encoding scheme
          associated with the object representation of ordinary and wide string
          literals respectively; UTF-16 & UTF-32 are interpreted as having
          native endianness, and the LE and BE forms are never returned.
        
          - Attendance: 10
- Strong consensus in favor.
 
- Poll 3: Notwithstanding the specification in ISO10646, we suggest
          to return UTF-{16,32} from literal() or
          wide_literal() with the understanding that string literals
          in the compiled program may not actually begin with a BOM and that
          library facilities [e.g. iconv()] may consume a BOM if
          present.
        
          - Attendance: 10
- Strong consensus in favor.
 
- Poll 4: Forward P1885 as revised to incorporate SG-16 feedback on
          object representation interpretation to LEWG with a recommended ship
          vehicle of C++23.
        
          - Attendance: 8
- No objection to unanimous consent.
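          The difference between an encoding form and an encoding scheme that
          drove this discussion can be sketched as follows. This is an
          illustration only: Utf16Units, encode_utf16, ByteOrder, and
          to_octets are hypothetical names, and the input is assumed to be a
          valid Unicode scalar value.

```cpp
#include <array>
#include <cstdint>

// Encoding form: maps a code point to one or two 16-bit code units.
struct Utf16Units {
    std::array<char16_t, 2> units;
    int count;
};

constexpr Utf16Units encode_utf16(char32_t cp) {
    if (cp < 0x10000) {
        return {{{static_cast<char16_t>(cp), 0}}, 1};
    }
    const char32_t v = cp - 0x10000;  // 20 bits split across a surrogate pair
    return {{{static_cast<char16_t>(0xD800 + (v >> 10)),
              static_cast<char16_t>(0xDC00 + (v & 0x3FF))}}, 2};
}

enum class ByteOrder { big, little };

// Encoding scheme: serializes one code unit as two octets in a stated order
// (UTF-16BE or UTF-16LE).
constexpr std::array<std::uint8_t, 2> to_octets(char16_t unit, ByteOrder order) {
    const auto hi = static_cast<std::uint8_t>(unit >> 8);
    const auto lo = static_cast<std::uint8_t>(unit & 0xFF);
    return order == ByteOrder::big ? std::array<std::uint8_t, 2>{{hi, lo}}
                                   : std::array<std::uint8_t, 2>{{lo, hi}};
}
```

          The code units produced by encode_utf16 are the same on every
          platform; only the octet sequence differs, which is the information
          that a bare "UTF-16" result interpreted with native endianness
          leaves implicit.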
 
 
- Tom stated that the next telecon will be October 20th.
October 20th, 2021
Draft agenda:
  - D2071R1: Named universal character escapes
    
      - Add named escape sequences to universal-character-name so
          that these escape sequences can be used everywhere, not just in
          string literals.
- Use Unicode rules for matching names rather than requiring exact
          case-sensitive names.
 
- P1885R8: Naming Text Encodings to Demystify Them
    
      - Continue discussions of issues raised on the LEWG and SG16 mailing lists.
- Prohibit mapping to IANA encodings when CHAR_BIT is not 8?
- Address special cases for IANA mapping purposes:
        
          - Is UTF-16 valid for ordinary strings when CHAR_BIT
              is >= 16?
- Is UTF-16 valid for wide strings when CHAR_BIT
              is >= 16 and sizeof(wchar_t) is 1?
- Is the underlying representation of a wide string required to
              match an encoding scheme for the encoding form when
              sizeof(wchar_t) is not 1?
- Limit mapping of wide strings when sizeof(wchar_t)
              is not 1 to other, unknown, and the UCS/UTF
              variants?
 
 
Attendees:
  - Charlie Barto
- Hubert Tong
- Jens Maurer
- Mark Zeren
- Peter Brett
- Steve Downey
- Tom Honermann
- Victor Zverovich
- Zach Laine
Meeting summary:
  - D2071R1: Named universal character escapes
    
  
- P1885R8: Naming Text Encodings to Demystify Them
    
      - PBrett introduced the topics for discussion:
        
          - Whether the encoding querying functions should return
              unknown when CHAR_BIT is not 8.
- How to handle wide strings for various values of
              sizeof(wchar_t) and CHAR_BIT.
 
- Hubert suggested that decisions regarding how to handle
          CHAR_BIT when it is not 8 may have to be deferred to SG14
          for embedded implementations.
- Zach stated that sizeof(wchar_t)==1 is problematic when
          CHAR_BIT is 8.
- PBrett replied that there is a proposal to lift the restriction that
          currently requires that wchar_t be able to represent all
          characters of all implementation supported character sets;
          P2460 (Relax requirements on wchar_t to match existing practices).
- Jens noted that we have discussed encoding schemes in the context of
          wide_literal() and that BE/LE appropriate results would be
          expected in that case, but we currently have consensus for a native
          endian result with no BOM semantics.
- Jens raised a consistency concern; the paper currently erases the
          encoding endianness information for the UTF cases, but not for the
          UCS cases.
- Jens stated that there are questions about wide-EBCDIC and endianness,
          but that those encodings don't currently exist in the IANA
          registry.
- Jens noted that, at present, the only permissible IANA registered wide
          encodings when sizeof(wchar_t) is not 1 are UTF-16, UTF-32,
          UCS-2, and UCS-4.
- PBrett asked Charlie for his impression of what the impact would be of
          returning UTF-16BE on Windows assuming a big-endian platform.
- Charlie responded that Windows doesn't support any big-endian
          platforms, so it wouldn't matter right now; Windows programmers just
          assume UTF-16LE.
- PBrett expressed concern about unexpected encoding names being
          returned and compared using other APIs.
- Hubert observed that programmers may, or may not, want to see UTF-32LE
          vs UTF-32BE be returned for one Linux system vs another.
- Steve raised the concern of a program externalizing an encoding name
          as UTF-16 and then providing UTF-16LE text instead of (the expected
          default of) UTF-16BE.
- Steve mentioned in chat: "UTF-16 generally is supposed to imply BE.
          In practice it doesn't but, that's an inconsistency."
- Charlie asked in chat: "isn't that just because the network byte
          order is BE?"
- Jens replied in chat: "Steve: No. ISO 10646 encoding scheme "UTF-16"
          says "interpret BOM; if none is found, use big-endian"."
- Jens continued in chat: "Steve: iconv does "interpret BOM; if none is
          found, use host endianness"."
- Tom observed that, in the standard, the wording for string literals is
          written in terms of code units and encoding form and expressed a
          belief that programmers tend to work on code units rather than bytes;
          except for interfaces like iconv().
- Jens replied that previous polls supported an encoding scheme approach
          in order to support the iconv() use case.
- Jens stated that switching to encoding form would be a no-op for
          ordinary strings.
- Jens added that concern about object representation seems wrong since
          it is so implementation specific.
- PBrett expressed a desire to work with bytes and that object
          representation therefore matters for wide strings.
- Hubert acknowledged the present inconsistency and noted the friction
          with encoding scheme.
- Charlie stated that it is difficult to conceive of cases where the
          object representation encoding would differ from the native
          encoding.
- Jens noted that proper byte access would currently require querying
          native endianness when presented with UTF-16; if the special case for
          UTF-16 were to be dropped, then behavior would be consistent.
- Tom noted the benefit of being able to use UTF-16BE on little endian
          systems for encoding tagging purposes.
- Jens observed that friction could be reduced by dropping support for
          wide strings.
- Tom stated that we should re-poll the special case for UTF-16.
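          Jens' observation that proper byte access requires a separate
          endianness query when a bare UTF name is returned can be sketched as
          below. The function explicit_scheme is a hypothetical helper, not
          part of P1885; the host endianness is passed in as a flag, which in
          C++20 could be derived from std::endian::native in the bit header.

```cpp
#include <string_view>

// Hypothetical helper: map a generic UTF name, interpreted as having native
// endianness, to the explicit encoding scheme name for the host.
constexpr std::string_view explicit_scheme(std::string_view name,
                                           bool host_is_little_endian) {
    if (name == "UTF-16") return host_is_little_endian ? "UTF-16LE" : "UTF-16BE";
    if (name == "UTF-32") return host_is_little_endian ? "UTF-32LE" : "UTF-32BE";
    return name;  // already explicit, or not a UTF name
}
```

          A program that externalizes an encoding name alongside raw bytes
          would need a step of this kind to avoid the mismatch Steve
          described, where a consumer assumes big-endian for an untagged
          "UTF-16" stream.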
 
- Tom stated that the next telecon will be November 3rd and that we will
      plan to poll the special case for UTF-16 for P1885, and possibly look at
      updated wording for P2071.
- [ Editor's note: since LEWG will be proceeding with electronic polling
      of P1885R9 as is, SG16 will table further discussion of that proposal
      pending a new paper that argues for changes. ]
November 3rd, 2021
Draft agenda:
Attendees:
  - Hubert Tong
- Jens Maurer
- Peter Brett
- Steve Downey
- Tom Honermann
- Victor Zverovich
- Zach Laine
Meeting summary:
November 17th, 2021
Draft agenda:
Attendees:
  - Aaron Ballman
- Charlie Barto
- Corentin Jabot
- Jens Maurer
- Peter Brett
- Steve Downey
- Tom Honermann
- Victor Zverovich
- Zach Laine
Meeting summary:
  - [ Editor's note: The agenda order was revised to accommodate
      scheduling conflicts. ]
- P2361R3: Unevaluated strings
    
      - Corentin introduced the recent wording changes and noted that the
          unevaluated-string production is not matched until after
          lexing, but is referenced from the wording for the preprocessor
          line control directive and the _Pragma operator as a means
          to impose constraints on their string-literal elements.
- Corentin added that, for asm declarations, the only change
          now is to prohibit an encoding prefix.
- PBrett requested confirmation that this represents a design
          change.
- Corentin confirmed that it does.
- PBrett asked what the ramification would be if EWG rejected such a
          change.
- Corentin responded that there is no current implementation experience
          involving asm declarations that use an encoding prefix.
- Corentin added that numeric escape sequences are still allowed in
          asm declarations but that their effect is unknown.
- Aaron noted another change from the prior revision that was inspired
          by implementation experience; the paper now addresses user-defined
          literals (UDLs).
- Jens observed that the change to the grammar for the preprocessing
          line control directive introduces an allowance for use of raw string
          literals.
- Aaron stated this appears to be an oversight.
- Corentin agreed.
- Jens stated that use of string-literal should be avoided for
          the preprocessing line control directive if the grammar term doesn't
          apply.
- Aaron noted that this is a pre-existing issue and asked how it should
          be repaired.
- Jens asked how the C standard handles this.
- Aaron replied that the C standard defines string-literal with
          an optional encoding prefix.
- Corentin stated that the intent was not to enable new syntax, but
          asked if an allowance for raw strings would be problematic.
- Jens responded that raw strings can contain new lines, but
          preprocessing directives are line based.
- PBrett noted that such an allowance would introduce a new divergence
          from C.
- PBrett observed that the current wording discusses
          string-literal.
- Jens agreed that there is an existing issue in that the line control
          wording discusses string-literal where no such production is
          used.
- Jens suggested retaining the current grammar so as to avoid an
          unintended change in meaning.
- Corentin agreed to revert the use of string-literal in the
          proposed line control wording and to note the existing issue.
- Jens requested that be included as an editorial note in the wording
          to ensure CWG considers it during wording review.
- Jens requested that the proposed wording be rebased on the current
          draft so as to avoid the need for updates to [lex.phases] and
          [lex.string].
- Jens requested that "encoding prefix" be styled as a grammar term in
          [dcl.asm].
- Jens observed that the user-defined literal operator wording also
          allows use of raw string literals.
- Jens noted that, in [dcl.link], the comparison of the recognized
          language linkages includes the quotes thereby requiring that a
          declaration be written as extern "\"C\"".
- Corentin reported that Hubert also had a concern that it was not
          stated how to compare the literal contents in the wording.
- Jens noted that universal-character-names (UCNs) can appear
          in an unevaluated-string, but that it isn't clear with
          respect to the comparison in [dcl.link] when that replacement occurs;
          "\u0043" and "C" should be handled
          equivalently.
- Jens stated that it is unclear why the wording for [cpp.pragma.op]
          has been updated to strike handling of escape sequences.
- Jens admitted a need to translate UCNs for string literals, but noted
          that doesn't happen here.
- PBrett observed that doing so could change the meaning of existing
          code.
- Jens agreed and noted that restoring handling of escape sequences
          will achieve the desired result; the preprocessing of the
          destringized string will expand UCNs.
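          For evaluated string literals, the equivalence of "\u0043" and "C"
          that Jens asked for in [dcl.link] already holds and can be checked
          at compile time. The sketch below uses a hypothetical helper,
          same_text, and says nothing about how the final wording for
          unevaluated strings should be phrased.

```cpp
// Compare two null-terminated strings code unit by code unit.
constexpr bool same_text(const char* a, const char* b) {
    while (*a != '\0' && *a == *b) {
        ++a;
        ++b;
    }
    return *a == *b;
}

// In an evaluated string literal the UCN is replaced during translation, so
// both spellings yield the same code units.
static_assert(same_text("\u0043", "C"), "UCN and plain spelling should match");
```

          The same helper shows why a comparison that includes the quotes is
          a problem: the text "\"C\"" does not compare equal to "C".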
 
- P1854R2: Conversion to literal encoding should not lead to loss of meaning
    
      - [ Editor's note: D1854R2 was the active paper under discussion at
          the telecon.  The agenda and links used here reference P1854R2 since
          the links to the draft paper were ephemeral.  The published document
          may differ from the reviewed draft revision. ]
- Corentin provided an introduction.
- PBrett requested that the abstract be updated to summarize the problem
          the paper addresses, how it is solved, and what the impact is.
- PBrett suggested that the proposed wording for [lex.ccon] consistently
          state, "in the literal's associated character encoding".
- Corentin responded that there is no need to do so since multicharacter
          literals are no longer subject to use of an encoding prefix; their
          associated encoding is always the narrow literal encoding.
- Jens agreed that indirection through an association is not required,
          but observed that the correct encoding is the
          "ordinary literal encoding", not the "narrow literal encoding".
- Jens requested that "encoding prefix" be styled as a grammar
          term.
- Discussion ensued regarding the goals of the paper and concluded with
          the following clarifications:
          - The proposal does not intend to prohibit a c-char
              from contributing more than one code unit to the calculation of a
              multicharacter literal value.
- The proposal does intend to prevent a character literal
              from being unintentionally parsed as a multicharacter
              literal in visually ambiguous situations.
 
 
- [ Editor's note: Consider 'é' in a UTF-8 encoded source
          file. If the source file is in Normalization Form C
          (NFC; `é` is U+00E9 {LATIN SMALL LETTER E WITH ACUTE}), then the
          expression would be an ordinary character literal. However, if the
          source file is in Normalization Form D
          (NFD; `é` is U+0065 {LATIN SMALL LETTER E} followed by
          U+0301 {COMBINING ACUTE ACCENT}), then the expression would be a
          multicharacter literal. The proposal seeks to avoid such visual
          ambiguity by restricting the individual written characters in
          multicharacter literals to those that only contribute a single code
          unit in the ordinary literal encoding. This suffices to reject the
          code in the NFD case (U+0301 isn't encodeable as a single code unit
          in any encoding that is used as the ordinary literal encoding in
          practice). ]
- Corentin agreed to remove the restriction on UCNs from the wording
          added to the first paragraph of [lex.ccon] since use of a UCN does
          not produce visual ambiguity.
- [ Editor's note: Thus, the NFD case above can be explicitly
          written as 'e\u0301'. ]
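          The code unit counts behind the NFC and NFD cases can be checked
          with string literals, which sidesteps any question of how a given
          compiler treats the multicharacter literal itself. This is a minimal
          sketch; code_units, nfc_utf8, and nfd_utf8 are hypothetical names,
          and UTF-8 is used only because u8 literals guarantee that encoding.

```cpp
#include <cstddef>

// Number of code units in a string literal, excluding the terminating null.
template <class CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N - 1; }

// U+00E9, the precomposed (NFC) spelling.
constexpr std::size_t nfc_utf8 = code_units(u8"\u00E9");
// U+0065 followed by U+0301, the decomposed (NFD) spelling.
constexpr std::size_t nfd_utf8 = code_units(u8"e\u0301");
```

          In UTF-8 neither spelling fits in one code unit, but the NFD form
          contains two characters, the second of which (U+0301) needs more
          than one code unit; that is the property the proposed restriction
          keys on.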
- Tom announced that the next telecon will be held on 2021-12-01 and that
      the agenda will include
      LWG3639 (Handling of fill character width is underspecified in std::format)
      and further review of P2361 and P1854 pending the availability of new
      revisions.
December 1st, 2021
Draft agenda:
Attendees:
  - Barry Revzin
- Charlie Barto
- Corentin Jabot
- Hubert Tong
- Jens Maurer
- Mark Zeren
- Peter Bindels
- Peter Brett
- Steve Downey
- Tom Honermann
- Victor Zverovich
- Zach Laine
Meeting summary:
  - [ Editor's note: The agenda order was revised to accommodate
      attendee schedules. ]
- P2286R3: Formatting Ranges
    
      - Barry provided an introduction.
        
          - The goal is to add formatting support for types like tuple, pair,
              and vector.
- A sed-like delimiter syntax is proposed to allow for unambiguous
              formatting of pair and tuple elements.
- The delimiter syntax may be dropped for now in order to focus on
              fill and alignment.
- The delimiter syntax could still be added for a future
              standard.
 
- Zach mentioned that the Unicode Bidirectional Algorithm document
          defines a set of paired brackets that could potentially be used as
          matched delimiters.
- [ Editor's note: The Unicode Bidirectional Algorithm document is
          UAX #9.
          Paired brackets are defined via the UCD Bidi_Paired_Bracket
          and Bidi_Paired_Bracket_Type properties in
          BidiBrackets.txt.
          ]
- Zach provided a brief introduction to how the term "character" gets
          used. Within the C++ standard, "character" generally means an object
          of type char, a "code point" represents some part of what we
          notionally think of as a character, and an "extended grapheme cluster"
          (EGC) represents a "glyph" or what we visually perceive to be a
          character.
- Zach stated that we might be able to get away with specifying
          delimiters as "characters", but noted that such interfaces tend to
          become regarded as broken later.
- Victor stated that, if the goal is to add some support in C++23, then
          custom delimiters should be dropped for now given concerns like how
          use of a digit as a delimiter could lead to problems.
- Corentin agreed with Barry and Victor that custom delimiter support
          can be postponed in favor of a more comprehensive solution later.
- Charlie argued strongly in favor of use of code points as delimiters
          given the lack of experience using EGCs in C++20.
- Charlie noted that EGCs do not necessarily correspond to what you
          might navigate through in a word processor.
- Charlie added that combining code points can be combined with bracket
          characters.
- Charlie stated that most other languages just use code points for
          delimiters.
- PBrett expressed concern about the choice of delimiters leading to
          format strings that are indistinguishable from line noise.
- Barry noted that, without custom delimiters, the only newly required
          character is `:`.
- PBrett acknowledged, but noted that a sequence of such characters is
          needed to navigate range hierarchies.
- Barry agreed, but noted that subrange formatting wouldn't otherwise
          be possible.
- PBrett suggested that a required custom formatter may be an
          improvement.
- Barry asked for feedback on two questions.
        
          - Is everyone happy with use of `?` for the debug specifier?
- Is everyone happy with the described quoting and escaping
              mechanism for string and character data?
 
- Victor responded that `?` seems ok for the debug specifier.
- PBrett asked if there are other use cases for which `?` might be
          desirable.
- Tom noted that `?` is often used in conjunction with optional
          data.
- Tom asked why the proposed specifier is called the "debug"
          specifier.
- Barry responded that "debug" is consistent with Rust's description
          of its equivalent functionality.
- Barry noted that Python uses "repr" for its equivalent.
- Jens observed that std::quoted() already exists for use
          with iostreams.
- Barry replied that using it would require an additional specifier
          like `Q`.
- PBindels agreed that the "debug" name for the new specifier is
          confusing.
- PBrett noted that the "debug" name would not be reflected in
          written format strings.
- Charlie expressed a preference for "debug" over "repr" so that the
          latter can be preserved for compiler generated representations.
- Jens asked for a summary of the escaping proposal.
- Barry replied that the intent is to do what
          {fmt}
          does and deferred to Victor.
- Victor stated that the escaping done by {fmt} was recently described
          in an email to the SG16 mailing list.
- [ Editor's note: that email is archived at
          https://lists.isocpp.org/sg16/2021/12/2874.php.
          ]
- Victor noted that the paper should be updated to describe what {fmt}
          currently does.
- Jens mentioned that the email states that code points in the range
          0 through 0x100 are formatted as hex escapes of the form
          \xhh.
- Victor clarified that this substitution only applies to non-printable
          characters.
- Jens asked what characters are considered non-printable.
- Victor replied that Unicode specifies a non-printable property and
          that Rust has a non-printable concept.
- [ Editor's note: Unicode does not specify a printable or
          non-printable property, but does specify many properties from which
          such properties could be derived. ]
- Tom stated that there appear to be two specification questions:
        
          - What characters in the code point range 0 through 0x100 are
              considered non-printable?
- How are non-printable characters escaped?
 
- Tom expressed a preference for use of UCN notation for
          non-printable characters.
- Corentin agreed; use hex escapes for invalid code units and UCN
          notation for characters.
- Corentin suggested it might make sense to use hex escapes for
          non-Unicode encodings.
- PBrett asked if it would be a problem to specify UCN notation now,
          but then switch to
          P2290
          delimited escape sequences later.
- Jens stated that depends on other factors.
- PBrett replied that it therefore seems quite important to make the
          right decision now.
- Corentin indicated that there is no need to tie the choice of output
          format to the delimited escape sequences specified in P2290.
- Corentin stated that P2290 will appear in the next EWG electronic
          voting cycle.
- Victor expressed reluctance towards P2290 delimited escape sequences
          due to increased verbosity and inconsistency with Rust.
- Victor added that use of brace delimiters with \x is
          unusual.
- PBrett encouraged use of delimited escape sequences for readability
          benefits.
- Jens asked if it is intended that copy/paste work to produce a string
          literal that matches the formatted output.
- Barry stated that would be a worthwhile goal.
- Jens noted that it is therefore necessary to avoid potential munging
          with \x; this might require spliced strings.
- Tom noted that such munging is a concern for human consumption as
          well.
- [ Editor's note: With regard to munging, consider
          \xdeface. Is that a single hex escape, a \xde escape
          followed by face, or something in between? ]
- Jens agreed, but noted that a human might expect that only hex escapes
          with two digits will be produced.
- Jens asserted that the ability to re-parse strongly suggests use of
          delimited escapes.
- Jens pondered whether the escape mechanism might require an EBCDIC
          based implementation to transcode to Unicode in order to produce a
          UCN.
- Jens stated that care is needed so that deference to the Unicode
          Character Database (UCD) for a non-printable property does not
          result in a large dependency on the UCD.
- Jens suggested an implementation should be permitted to escape all
          non-ASCII characters.
- PBrett suggested that escape sequences could be limited to control
          characters.
- Corentin reported experience with implementing an
          isprintable() function and noted that it does not require a
          large table.
- Tom suggested that round tripping of an escaped string output should
          be possible with use of the std::scan() function proposed in
          P1729.
- Victor posted a link to an is_printable() implementation used
          in {fmt} and noted the small size of the tables used.
        
      
- Victor noted that limiting hex escapes to two digits avoids round trip
          concerns without requiring extra delimiters.
- PBrett requested that the next revision of the paper include
          discussion of these concerns.
- Corentin asked if the escape mechanism should be exposed as an
          independent facility.
- Barry suggested that the independent facility could just be
          std::format().
- PBrett observed that a standalone facility could be added later.
- PBrett asked if SG16 should review an updated revision of this paper
          again.
- Corentin replied affirmatively.
- Jens agreed and noted a need to understand the escape mechanism.
- Jens stated that the paper should also address non-Unicode
          platforms.
- Corentin noted that, for wchar_t, a hex escape with only two
          digits is insufficient.
- Tom noted that two digits is insufficient for char when
          CHAR_BIT is greater than 8.
- Mark observed that the escape facility would be useful for dealing
          with file names.
- Victor agreed.
- Poll 0: We recommend using universal character name escape
          sequences rather than numerical escape sequences for the debug
          representation of all non-printable characters.
        
          - Attendance: 12
- Consensus in favor
 
- Poll 1: We recommend using brace-delimited numerical escape
          sequences as described in P2290 "Delimited Escape Sequences" for
          'debug' formatting of invalid code units
          (including lone surrogates).
        
          - Attendance: 12
- Consensus in favor
- A: Delimited hex escape sequences do not exist in C++ yet and
              are not used elsewhere; but since they will only appear in cases
              of invalid code units, not SA.
 
- Poll 2: We recommend using brace-delimited universal character
          name escape sequences as described in P2290
          "Delimited Escape Sequences" for 'debug' formatting of strings.
        
          - Attendance: 12
- Consensus in favor
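
The direction recommended by polls 0 and 1 above can be illustrated with a minimal C++ sketch. It is an illustration only, not the wording or the {fmt} implementation: the names is_printable_sketch and debug_escape_sketch are hypothetical, the printable check is a deliberately crude stand-in that treats only C0/C1 control characters as non-printable, lowercase hex digits are an arbitrary choice, and escaping of quotes and backslashes is not handled. The two static_assert lines show the munging concern raised with regard to classic hex escapes.

```cpp
#include <cstdint>
#include <string>
#include <string_view>

// A classic hex escape consumes every hex digit that follows it, which is
// why a reader (or a compiler) cannot tell where "\x41be" was meant to end.
static_assert(sizeof(u"\x41be") / sizeof(char16_t) == 2,
              "one code unit (0x41BE) plus the terminator");
static_assert(sizeof(u"\x41" u"be") / sizeof(char16_t) == 4,
              "'A', 'b', 'e' plus the terminator (spliced literal)");

// ASSUMPTION: stand-in for a real printable check; only C0 and C1 control
// characters are treated as non-printable here.
constexpr bool is_printable_sketch(char32_t c) {
    return !(c < 0x20 || (c >= 0x7F && c <= 0x9F));
}

// True for Unicode scalar values; false for surrogates and out-of-range values.
constexpr bool is_scalar_value(char32_t c) {
    return c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF);
}

// Appends '\', kind, '{', the hex digits of v, and '}' to out.
inline void append_delimited(std::u32string& out, char32_t kind, std::uint32_t v) {
    constexpr char32_t digits[] = U"0123456789abcdef";
    std::u32string hex;
    do {
        hex.insert(hex.begin(), digits[v & 0xF]);
        v >>= 4;
    } while (v != 0);
    out += U'\\';
    out += kind;
    out += U'{';
    out += hex;
    out += U'}';
}

// Hypothetical debug escaping of UTF-32 input:
//   invalid code units (e.g. lone surrogates) -> \x{...}
//   non-printable characters                  -> \u{...}
//   everything else                           -> copied unchanged
inline std::u32string debug_escape_sketch(std::u32string_view s) {
    std::u32string out;
    for (char32_t c : s) {
        if (!is_scalar_value(c)) {
            append_delimited(out, U'x', c);
        } else if (!is_printable_sketch(c)) {
            append_delimited(out, U'u', c);
        } else {
            out += c;
        }
    }
    return out;
}
```

Because every escape is closed by a brace, the text that follows an escape can never be absorbed into it, which is the re-parsing property discussed above.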
 
 
- LWG3639: Handling of fill character width is underspecified in std::format
    
      - Tom provided an introduction.
- Victor stated that the proposed resolution is somewhat novel and
          doesn't match what has been implemented in {fmt}.
- Victor noted the absence of a known use case.
- Victor added that there is no good solution for when alignment is not
          possible.
- Victor noted that option 3 allows changing behavior later.
- Victor recommended proceeding with option 3; if the estimated width is
          not 1 then an exception may be thrown or the behavior may be
          undefined.
- Tom asked what current implementations do.
- Victor responded that {fmt} assumes an estimated width of 1.
- PBrett argued against option 3 and provided U+3000 {IDEOGRAPHIC SPACE}
          as an example of a useful fill character with width other than 1.
- PBrett suggested that an exception could be thrown if alignment
          requests cannot be met.
- Zach recommended requiring an estimated width of 1 such that
          violations are diagnosed as ill-formed at compile-time and result in
          UB at run-time.
- Zach expressed a desire to avoid paying the cost of checking the
          estimated width when it will virtually never matter.
- Corentin expressed appreciation for PBrett's use case.
- Corentin stated that the estimated width approach is known not to
          produce perfect results in general and that he is therefore not very
          concerned with how this issue is resolved.
- Hubert expressed support for PBrett's use case.
- Hubert noted the current absence of a wording mechanism to determine
          the number of fill characters to insert.
- Corentin suggested we get implementation experience before proceeding
          and emphasized that option 3 provides time to do so with the goal of
          doing better in a future standard.
- PBindels agreed with restriction to an estimated width of 1 now, but
          with violations resulting in UB so that behavior can be changed
          later.
- Victor agreed that PBrett's use case is interesting, but asserted that
          we should not hand wave a solution for it; we should properly explore
          support for it.
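
The alignment problem discussed above can be made concrete with a small C++ sketch. The helper name fill_count is hypothetical and not part of any proposed resolution; it assumes that padding is measured in columns of estimated width and that a fill character such as U+3000 has an estimated width of 2.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical helper: how many copies of a fill character with the given
// estimated width are needed to cover padding_columns columns exactly.
// Returns std::nullopt when exact alignment is not possible.
constexpr std::optional<std::size_t> fill_count(std::size_t padding_columns,
                                                std::size_t fill_width) {
    if (fill_width == 0 || padding_columns % fill_width != 0) {
        return std::nullopt;  // e.g. 3 columns cannot be filled by width-2 characters
    }
    return padding_columns / fill_width;
}
```

With a width-1 fill the count always exists, which is what {fmt} is described as assuming. With a width-2 fill and an odd number of padding columns there is no exact answer, and an implementation would have to throw, round, or leave the behavior undefined; that is the open design question.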
 
- Tom stated that the next SG16 telecon will be held on 2021-12-15 and will
      likely revisit LWG3639.
- Tom requested "+1" responses to
      Corentin's post
      to the SG16 mailing list with updates to his
      P1854 and
      P2361
      papers by anyone that feels these papers are ready to poll forwarding to
      EWG.
- [ Editor's note: such "+1" responses were provided in response to a
      new post.
      ]
December 15th, 2021
Draft agenda:
Attendees:
  - Barry Revzin
- Charlie Barto
- Corentin Jabot
- JeanHeyd Meneide
- Jens Maurer
- Peter Brett
- Steve Downey
- Tim Song
- Tom Honermann
- Zach Laine
Meeting summary:
  - P2361R4: Unevaluated strings
    
      - PBrett explained that SG16 had previously reviewed this paper and
          that all prior feedback has been addressed.
- PBrett thanked Corentin for quickly updating the paper in response
          to the prior review and for soliciting new feedback on the mailing
          list.
- PBrett asked if there were any new comments.
- Tom requested that a table be added to the prose section that
          summarizes the intended changes; though the effects can be determined
          from the wording, the impact is subtle with regard to things like
          where raw string literals are now allowed or disallowed.
- Corentin agreed to do so.
- Jens expressed a belief that there are no changes with regard to where
          raw string literals are and are not allowed.
- Corentin agreed and noted that there were such changes in a previous
          revision, but that those changes have been removed.
- Poll 0: Forward P2361R4 "Unevaluated strings" to EWG with a
          recommended ship vehicle of C++23.
        
          - Attendance: 9
- Consensus (though with a smaller quorum than is usual due to
              abstention from late arrivals).
 
 
- P1854R2: Conversion to literal encoding should not lead to loss of meaning
    
      - Corentin summarized recent changes to improve the motivation and
          wording and to correct typos.
- Corentin recalled that this paper was discussed in Belfast and in a
          recent telecon, but that the paper has not been polled since
          Belfast.
- [ Editor's note: Two polls were taken in Belfast as documented
          in the
          minutes for the discussion of P1885.
          The first was a poll to confirm the direction of the paper and the
          second was to make it dependent on
          P1885 (Naming Text Encodings to Demystify Them).
          Both polls had consensus.  P1885 was recently approved via electronic
          polling by LEWG and is expected to be voted on during the next WG21
          plenary. ]
- Corentin explained that the paper proposes two changes:
        
          - Making non-encodable character literals ill-formed.
- Adding restrictions to the characters that may syntactically
              appear in multicharacter literals.
 
- Charlie asked if the proposal will break currently used methods to
          probe the literal encoding during constant evaluation.
- PBrett replied that we now have a facility that avoids the need for
          such probing.
- Charlie acknowledged the new facility and that its existence does
          reduce concerns, but that he still wanted to be sure about what the
          expectation is.
- Corentin confirmed that such code may be broken and stated that this
          concern was discussed in Belfast and was the motivation for blocking
          this paper on adoption of P1885.
- [ Editor's note: Whether such code is broken in practice will
          depend on what implementors choose to do. The changes require a
          diagnostic to be produced, but implementors are free to implement
          that as a warning in which case compilation failure would only occur
          if warnings are elevated to errors. ]
- Tom noted that P1885 recently passed LEWG electronic polling.
- Corentin asked if the macros added to recent Microsoft Visual C++
          releases to reflect the literal encoding are defined regardless of
          which /std options are passed.
- Charlie confirmed that they are.
- [ Editor's note: As of Microsoft Visual C++ version 19.30, the
          _MSVC_EXECUTION_CHARACTER_SET macro is predefined to
          indicate the code page being used for the literal encoding.
          ]
- Corentin noted that character probing mechanisms are not
          particularly reliable.
- PBrett stated that only one implementation is expected to have to
          change behavior if this proposal is adopted and noted that the
          implementor in question is aware of the proposal and has so far not
          objected to the proposed change.
- PBrett reported that prior wording feedback has been addressed.
- Jens read the following proposed addition to [lex.ccon].
        
          - "If a multicharacter literal contains a basic-c-char
              representing a codepoint that is not encodable as a single code
              unit in the ordinary literal encoding, the program is
              ill-formed"
 
- Jens noted that the difference between basic-c-char and
          c-char is that the former excludes escape sequences and
          asked if the prohibition against escape sequences was intended to
          apply to universal-character-names (UCNs) as well.
- Corentin replied that the design is intended only to apply to
          visually ambiguous scenarios and that use of a UCN does not create
          visual ambiguity.
- Jens noted that a UCN is not an escape sequence and that the paper
          prose discusses escape sequences, but not UCNs.
- Corentin replied that he will update the prose to make it explicit
          that UCNs are not prohibited.
- Jens pondered whether the previously read wording should state
          "UCS scalar value" in place of "codepoint".
- Corentin replied that the distinction is not relevant after
          translation phase 1.
- Jens opined that neither is actually needed and suggested rephrasing
          as, "... contains a basic-c-char that is not encodable as a
          single code unit ...".
- Corentin agreed to make a change.
- Tom pondered whether the parts of the note removed from [lex.ccon]
          that continue to be applicable to multicharacter literals should be
          preserved.
- PBrett pointed out that the note is non-normative and that the
          relevant parts of it, that multicharacter literals have an
          implementation-defined value, are normatively specified
          elsewhere.
- Poll 1: Modify P1854R2 "Conversion to literal encoding should not
          lead to loss of meaning" to address wording feedback and forward the
          paper as revised to EWG with a recommended ship vehicle of C++23.
        
          - Attendance: 10
- Strong consensus in favor.
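
For readers unfamiliar with the probing techniques Charlie asked about, the following C++ sketch shows one style of probe. It is an assumed example, not code discussed at the telecon, and the function name is hypothetical. It inspects how the compiler encoded a string literal containing U+00E9.

```cpp
// Hypothetical probe: guess whether the ordinary literal encoding is UTF-8 by
// checking how U+00E9 (LATIN SMALL LETTER E WITH ACUTE) was encoded.
constexpr bool literal_encoding_looks_like_utf8() {
    constexpr char probe[] = "\u00E9";
    // In UTF-8 the character occupies two code units (0xC3 0xA9), so the
    // array holds three elements including the terminator.
    return sizeof(probe) == 3 &&
           static_cast<unsigned char>(probe[0]) == 0xC3 &&
           static_cast<unsigned char>(probe[1]) == 0xA9;
}
```

A probe like this relies on the compiler accepting the literal even when the character has no representation in the literal encoding. Whether a given probe keeps compiling under the proposal depends on the final wording and on how implementors choose to diagnose, as the editor's note above explains.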
 
 
- D2286R4: Formatting Ranges
    
      - [ Editor's note: D2286R4 was the active paper under discussion at
          the telecon.  The agenda and links used here reference P2286R4 since
          the links to the draft paper were ephemeral.  The published document
          may differ from the reviewed draft revision. ]
- Corentin reported that the LEWG chair is skeptical that there is
          sufficient time available for this proposal to be reviewed and adopted
          for C++23.
- Tom reported that both SG9 and SG16 have planned time for review and
          that, assuming that both SGs forward the paper, further scheduling
          will be up to the LEWG chair.
- PBrett reminded the group that SG16 had previously advocated for
          adding an explicitly deleted format specialization for
          std::filesystem::path to this paper and dropping the support
          proposed in
          P1636R2 (Formatters for library types)
          pending a future paper that addresses std::filesystem::path
          specifically.
- PBrett stated that he wasn't sure if a later revision of the latter
          paper actually dropped that support.
- [ Editor's note: SG16 reviewed P1636R2 during its
          2021-09-22 telecon;
          that revision remains the current revision.  The poll taken then is
          recorded in
          a comment in the related GitHub tracking issue.
          ]
- Barry introduced the changes made since the last revision.
        
          - Hex escapes are now only used for ill-formed code unit
              sequences.
- Hex escapes now use delimited escape sequence notation.
- UCNs are now used for non-printable characters.
 
- Jens asked if there is any further intention of reducing scope in
          order to maintain a target of C++23.
- Barry replied that the intended scope is what is presented in this
          revision and that there are no current plans to further reduce
          scope.
- PBrett asked if consideration was given towards dropping support for
          the debug format.
- Barry replied affirmatively.
- Jens stated that the escaping behavior needs to address the
          possibility of lone surrogates.
- Tom asked if the expectation is that lone surrogates would be encoded
          in UCN notation.
- Jens replied that UCN notation does not permit specifying surrogate
          code points.
- Jens noted that the escaping behavior is described in terms of code
          points and that this differs from how string literals are specified;
          the latter is described in terms of code unit sequences.
- Jens added that specifying escape behavior in terms of code points
          requires the ability to reconstruct code points from code unit
          sequences and noted that shift encodings may not have a clearly
          defined code point space.
- Tom replied that translation to a UCS scalar value would still be
          possible, but may face implementation challenges.
- Jens noted the dependency on Unicode properties and pondered how that
          applies to non-Unicode encodings.
- Jens stated that "an implementation-defined equivalent of Unicode
          properties" could impose a documentation burden.
- PBrett suggested that requirement could be met by documenting a
          methodology as opposed to an explicit table of equivalent Unicode
          properties for other character sets.
- Corentin wondered whether newline characters should always be
          escaped.
- Corentin noted that there are design questions regarding whether
          unassigned code points and private use area (PUA) characters should
          be escaped.
- Corentin suggested that PUA characters should probably be escaped but
          that it is less clear how unassigned code points should be
          handled.
- Corentin wondered what the performance cost would be for the
          requirement to check the Grapheme_Extend property for
          characters at the start of a string.
- Corentin suggested that it may be desirable to specify escape behavior
          in terms of conversion to Unicode to ensure consistent behavior across
          implementations.
- Tom asked how it was determined that the
          Z (Separator) and C (Other) values
          of the General_Category property suffice to define printable
          characters.
- Corentin replied that those properties exclude all control, separator,
          and unassigned characters.
- Corentin noted that there is a design decision to be made regarding
          which separators should be considered printable.
- Corentin added that there is a trade off between getting a "right"
          result and potentially requiring a possibly large table of character
          properties.
- Tom asked if the lookup for the Grapheme_Extend property is
          intended to identify combining characters for which a base character
          is not available to combine with.
- Corentin confirmed that is the intent.
- Charlie asserted a need for further elaboration of what is meant by
          "a code unit that is not a part of a valid code point".
- Zach asserted that PUA characters should not be escaped and that they
          should be usable in the same manner as any other printable
          character.
- Zach stated that Unicode specifies how sequences of invalid code units
          should be handled and that processing them should be left to QoI.
- [ Editor's note: See the "Constraints on Conversion Processes" and
          "U+FFFD Substitution of Maximal Subparts" sections of 3.9,
          "Unicode Encoding Forms", in
          chapter 3 of Unicode 14.0
          for Unicode recommendations regarding handling of ill-formed code unit
          sequences. ]
- Tom stated that his understanding is that the intent is to preserve
          the values of all bytes that contribute to an invalid code unit
          sequence.
- Charlie mentioned that the Unicode standard refers to the
          WhatWG encoding standard
          for handling of ill-formed code unit sequences.
- [ Editor's note: It does so in the
          "U+FFFD Substitution of Maximal Subparts" section mentioned in the
          previous note. ]
- Charlie noted a design question; how are invalid code unit sequences
          delimited?
- Charlie suggested that it might be ok to discontinue consuming text
          after an invalid code unit sequence.
- Charlie asserted a requirement for wording to prohibit considering
          code units following an invalid code unit sequence as themselves being
          part of the invalid code unit sequence if they could signify the start
          of a potentially valid code unit sequence.
- [ Editor's note: This is consistent with guidance in the
          "Constraints on Conversion Processes" section mentioned in a previous
          note. ]
- Corentin asserted that replacement characters are not particularly
          helpful when trying to diagnose unexpected output; the actual byte or
          code unit values are needed.
- Corentin stated that further discussion regarding handling of
          ill-formed code unit sequences is needed.
- PBrett indicated that consensus for how to handle invalid code unit
          sequences is not yet clear and that there exists a design question of
          whether to emit replacement characters or preserve code unit values
          via hex escapes.
- PBrett suggested it may be worth stating in
          SD-8
          that debug formatting is not stable.
- Corentin noted that, because Unicode character properties are not
          stable, we can't commit to stability anyway.
- PBrett requested that Barry submit the draft revision as a P
          paper.
- Barry agreed to do so, but reported that he had already edited it in
          response to the discussion.
- Corentin asked if the group has concerns regarding handling of
          non-Unicode encodings.
- PBrett replied that he would like to see wording, but that we are
          short on time.
- Poll 2: Modify D2286R4 to address design feedback, and forward the
          published paper as revised to LEWG with a recommended ship vehicle of
          C++23.
        
          - Attendance: 10
- Consensus.
- N: Lack of wording.
- SA: Lack of wording; concerned that there will be subtle issues
              that won't become apparent until wording is available.
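
One of the options discussed above, preserving the values of ill-formed code units via delimited hex escapes, can be sketched in C++ as follows. This is an illustration under stated assumptions, not the paper's wording: the function names are hypothetical, the input is assumed to be intended as UTF-8, and each offending byte is escaped on its own, which is only one of several possible ways to delimit an invalid sequence. Decoding resumes at the very next byte, so a byte that could start a valid sequence is never swallowed by a preceding error.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Returns the length (1 to 4) of the well-formed UTF-8 sequence that starts
// at s[i], or 0 if no well-formed sequence starts there.
inline std::size_t valid_utf8_length(std::string_view s, std::size_t i) {
    auto byte = [&](std::size_t k) { return static_cast<unsigned char>(s[k]); };
    auto cont = [&](std::size_t k) {
        return k < s.size() && (byte(k) & 0xC0) == 0x80;
    };
    const unsigned char b0 = byte(i);
    if (b0 < 0x80) return 1;
    if (b0 >= 0xC2 && b0 <= 0xDF) return cont(i + 1) ? 2 : 0;
    if (b0 >= 0xE0 && b0 <= 0xEF) {
        if (!cont(i + 1) || !cont(i + 2)) return 0;
        const unsigned char b1 = byte(i + 1);
        if (b0 == 0xE0 && b1 < 0xA0) return 0;  // overlong encoding
        if (b0 == 0xED && b1 > 0x9F) return 0;  // encoded surrogate
        return 3;
    }
    if (b0 >= 0xF0 && b0 <= 0xF4) {
        if (!cont(i + 1) || !cont(i + 2) || !cont(i + 3)) return 0;
        const unsigned char b1 = byte(i + 1);
        if (b0 == 0xF0 && b1 < 0x90) return 0;  // overlong encoding
        if (b0 == 0xF4 && b1 > 0x8F) return 0;  // beyond U+10FFFF
        return 4;
    }
    return 0;  // stray continuation byte, 0xC0, 0xC1, or 0xF5..0xFF
}

// Hypothetical escaping: well-formed sequences are copied unchanged and every
// byte that is not part of one is written as \x{hh}.
inline std::string escape_ill_formed_sketch(std::string_view s) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < s.size();) {
        const std::size_t n = valid_utf8_length(s, i);
        if (n != 0) {
            out.append(s.substr(i, n));
            i += n;
        } else {
            const unsigned char b = static_cast<unsigned char>(s[i]);
            out += "\\x{";
            out += digits[b >> 4];
            out += digits[b & 0xF];
            out += '}';
            ++i;  // resume at the next byte; it may start a valid sequence
        }
    }
    return out;
}
```

The alternative raised in the discussion, emitting U+FFFD replacement characters, would lose the byte values that this sketch keeps visible.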
 
 
- Tom announced that the next telecon will be held 2022-01-12 and that the
      agenda is expected to include review of an updated revision of
      P2286 (Formatting Ranges),
      review of an updated proposed resolution for
      LWG3639 (Handling of fill character width is underspecified in std::format)
      and
      LWG3576 (Clarifying fill character in std::format),
      and/or initial review of
      P2491R0 (Text encodings follow-up)
      and
      P2498R0 (Forward compatibility of text_encoding with additional encoding registries).