CWG3094 Rework phases for string literal concatenation and token formation

eisenwave · tkoeppe · commit 94055b39a902 · 2025-12-13T17:13:26.000Z
Fixes NB US 6-020 (C++26 CD).
Fixes NB US 7-019 (C++26 CD).
diff --git a/source/lex.tex b/source/lex.tex
@@ -165,14 +165,14 @@
 as specified in \ref{lex.string}.
 Each such \grammarterm{string-literal} preprocessing token is then considered to have
 that common \grammarterm{encoding-prefix}.
-
-\item
 \indextext{concatenation!string}%
-Adjacent \grammarterm{string-literal} preprocessing tokens are concatenated\iref{lex.string}.
+Then, adjacent \grammarterm{string-literal} preprocessing tokens are concatenated\iref{lex.string}.
 
 \item
 Each preprocessing token is converted into a token\iref{lex.token}.
-The resulting tokens constitute a \defn{translation unit} and
+
+\item
+The tokens constitute a \defn{translation unit} and
 are syntactically and
 semantically analyzed as a \grammarterm{translation-unit}\iref{basic.link} and
 translated.
@@ -547,7 +547,7 @@
 
 \pnum
 A preprocessing token is the minimal lexical element of the language in translation
-phases 3 through 6.
+phases 3 through 5.
 In this document,
 glyphs are used to identify
 elements of the basic character set\iref{lex.charset}.
@@ -807,7 +807,7 @@
 \end{bnf}
 
 Each \grammarterm{operator-or-punctuator} is converted to a single token
-in translation phase 7\iref{lex.phases}.%
+in translation phase 6\iref{lex.phases}.%
 \indextext{punctuator|)}%
 \indextext{operator|)}
 
@@ -1971,7 +1971,7 @@
 \end{note}
 
 \pnum
-In translation phase 6\iref{lex.phases},
+In translation phase 5\iref{lex.phases},
 adjacent \grammarterm{string-literal}s are concatenated.
 The lexical structure and grouping of
 the contents of the individual \grammarterm{string-literal}s is retained.
@@ -2320,11 +2320,11 @@
 \end{example}
 
 \pnum
-In translation phase 6\iref{lex.phases}, adjacent \grammarterm{string-literal}s are concatenated and
+In translation phase 5\iref{lex.phases}, adjacent \grammarterm{string-literal}s are concatenated and
 \grammarterm{user-defined-string-literal}{s} are considered \grammarterm{string-literal}s for that
 purpose. During concatenation, \grammarterm{ud-suffix}{es} are removed and ignored and
 the concatenation process occurs as described in~\ref{lex.string}. At the end of phase
-6, if a \grammarterm{string-literal} is the result of a concatenation involving at least one
+5, if a \grammarterm{string-literal} is the result of a concatenation involving at least one
 \grammarterm{user-defined-string-literal}, all the participating
 \grammarterm{user-defined-string-literal}{s} shall have the same \grammarterm{ud-suffix}
 and that suffix is applied to the result of the concatenation.
diff --git a/source/meta.tex b/source/meta.tex
@@ -6552,22 +6552,19 @@
   \item
     \tcode{holds_alternative<u8string>(options.name->\exposid{contents})} is \tcode{true}
     and \tcode{get<u8string>(\brk{}options.name->\exposid{contents})}
-    contains a valid identifier\iref{lex.name}
-    that is not a keyword\iref{lex.key}
+    contains the spelling of a valid \grammarterm{token}
+    that is an \grammarterm{identifier}\iref{lex.name}
     when interpreted with UTF-8, or
   \item
     \tcode{holds_alternative<string>(options.name->\exposid{contents})} is \tcode{true}
     and \tcode{get<string>(opt\-ions.name->\exposid{contents})}
-    contains a valid identifier\iref{lex.name}
-    that is not a keyword\iref{lex.key}
+    contains the spelling of a valid \grammarterm{token}
+    that is an \grammarterm{identifier}\iref{lex.name}
     when interpreted with the ordinary literal encoding;
   \end{itemize}
   \begin{note}
-  The name corresponds to the spelling of an identifier~token
-  after phase~6 of translation\iref{lex.phases}.
   Lexical constructs like
-  \grammarterm{universal-character-name}s\iref{lex.universal.char} are not processed
-  and will cause evaluation to fail.
+  \grammarterm{universal-character-name}s\iref{lex.universal.char} are not processed.
   For example, \tcode{R"(\textbackslash u03B1)"} is an invalid identifier
   and is not interpreted as \tcode{"$\alpha$"}.
   \end{note}