gh-135676: Simplify docs on lexing names #140464

encukou · 2025-10-22T15:53:31Z

This simplifies the Lexical Analysis section on Names (but keeps it technically correct) by putting all the info about non-ASCII characters in a separate (and very technical) section.

It uses a mental model where the parser doesn't handle Unicode complexity “immediately”, but:

- parses any non-ASCII character (outside strings/comments) as part of a name, since these can't (yet) be e.g. operators
- normalizes the name
- validates the name, using the id_start/id_continue sets (referred to in previous sections as “letter-like” and “number-like” characters, with a link to the details)

This also means we don't need xid_start/xid_continue to define the behaviour :)
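As a rough illustration of this mental model, here is a minimal sketch of the normalize-then-validate steps. The helper name `lex_name` is hypothetical and not part of CPython; it assumes NFKC as the normalization form (as the language reference specifies for identifiers) and uses `str.isidentifier()` as a stand-in for the character-set check, so it approximates what the tokenizer does rather than reproducing it.

```python
import unicodedata


def lex_name(raw: str) -> str:
    """Hypothetical helper: normalize a candidate name, then validate it."""
    # Step 1 (collecting the characters of the candidate name) is assumed
    # to have happened already; `raw` is the collected text.

    # Step 2: normalize the name (NFKC).
    normalized = unicodedata.normalize("NFKC", raw)

    # Step 3: validate the normalized name. str.isidentifier() stands in
    # for the start/continue character-set check described above.
    if not normalized.isidentifier():
        raise SyntaxError(f"invalid name: {raw!r}")
    return normalized


print(lex_name("nᵘₘᵇₑʳ"))  # number
```

A string such as `"€uro"` is still collected as a candidate name in this model, but fails validation in step 3 and raises `SyntaxError`.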

Issue: Reword the Lexical Analysis chapter of the docs #135676

📚 Documentation preview 📚: https://cpython-previews--140464.org.readthedocs.build/

Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
Co-authored-by: Blaise Pabon <blaise@gmail.com>
Co-authored-by: Micha Albert <info@micha.zone>
Co-authored-by: KeithTheEE <kmurrayis@gmail.com>

willingc

Outstanding document @encukou. I had one small suggestion to be a bit more explicit on the normalization example with number.

willingc · 2025-10-22T18:24:46Z

Doc/reference/lexical_analysis.rst

+This means that, for example, some typographic variants of characters are
+converted to their "basic" form, for example::
+
+   >>> nᵘₘᵇₑʳ = 3


It would be helpful to add an explicit comment that the normalized form of nᵘₘᵇₑʳ is number.

Does this look good?
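For illustration only (this is not taken from the PR diff), the effect discussed in this thread can be observed directly. The sketch below assumes nothing beyond the standard `exec()` builtin; `namespace` is just an illustrative dict name. Because the name is normalized when the source is parsed, the variable is stored under the normalized spelling, which is also why string keys used with `globals()` or `locals()` need to be normalized by hand.

```python
# Run a small piece of source that assigns to the typographic variant.
namespace = {}
exec("nᵘₘᵇₑʳ = 3", namespace)

# The variable is stored under the normalized name.
print("number" in namespace)   # True
print(namespace["number"])     # 3

# The un-normalized spelling is not a key in the namespace.
print("nᵘₘᵇₑʳ" in namespace)   # False
```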

encukou · 2025-11-05T10:47:49Z

There was an insightful conversation in #140269. I'll update this PR to make things even clearer.

willingc

Thanks @encukou

willingc

Nice work @encukou!

encukou · 2025-11-20T15:22:42Z

Thank you for the review!

@malemburg, do you also want to take a look?

miss-islington-app · 2025-11-26T17:19:30Z

Thanks @encukou for the PR 🌮🎉. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖

miss-islington-app · 2025-11-26T17:19:32Z

Sorry, @encukou, I could not cleanly backport this to 3.14 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 2ff8608b4da33f667960e5099a1a442197acaea4 3.14

bedevere-app · 2025-11-27T12:31:04Z

GH-142015 is a backport of this pull request to the 3.14 branch.

This simplifies the Lexical Analysis section on Names (but keeps it technically correct) by putting all the info about non-ASCII characters in a separate (and very technical) section.

It uses a mental model where the parser doesn't handle Unicode complexity “immediately”, but:

- parses any non-ASCII character (outside strings/comments) as part of a name, since these can't (yet) be e.g. operators
- normalizes the name
- validates the name, using the xid_start/xid_continue sets

(cherry picked from commit 2ff8608)

Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
Co-authored-by: Blaise Pabon <blaise@gmail.com>
Co-authored-by: Micha Albert <info@micha.zone>
Co-authored-by: KeithTheEE <kmurrayis@gmail.com>

encukou and others added 4 commits October 8, 2025 17:58

Simplify Names section

4606120

Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
Co-authored-by: Blaise Pabon <blaise@gmail.com>
Co-authored-by: Micha Albert <info@micha.zone>
Co-authored-by: KeithTheEE <kmurrayis@gmail.com>

Casing; 3 dots for character ranges

6163c24

Clean-ups

de6d1af

Mention Unicode's *ID_Start* and *ID_Continue*

152e7aa

encukou requested review from AA-Turner and willingc as code owners October 22, 2025 15:53

bedevere-app bot added the docs (Documentation in the Doc dir) and skip news labels Oct 22, 2025

github-project-automation bot added this to Docs PRs Oct 22, 2025

github-project-automation bot moved this to Todo in Docs PRs Oct 22, 2025

bedevere-app bot mentioned this pull request Oct 22, 2025

Reword the Lexical Analysis chapter of the docs #135676

Open

bedevere-app bot added the awaiting core review label Oct 22, 2025

StanFromIreland linked an issue Oct 22, 2025 that may be closed by this pull request

Docs: note requirement to normalise unicode identifiers passed to globals() and locals() #86846

Closed

willingc approved these changes Oct 22, 2025

bedevere-app bot added awaiting merge and removed awaiting core review labels Oct 22, 2025

Make it clear that nᵘₘᵇₑʳ normalizes to number

fce5e98

encukou mentioned this pull request Nov 4, 2025

gh-129117: Expose _PyUnicode_IsXidContinue/Start in unicodedata #140269

Merged

encukou marked this pull request as draft November 5, 2025 10:45

bedevere-app bot removed the awaiting merge label Nov 5, 2025

willingc approved these changes Nov 10, 2025

bedevere-app bot added the awaiting merge label Nov 10, 2025

encukou added 3 commits November 12, 2025 18:06

WIP

b9fdcf0

Merge in the main branch

2e7f7c0

Reword to use XID_Start and XID_Continue

43f6091

encukou marked this pull request as ready for review November 19, 2025 16:08

bedevere-app bot added awaiting core review and removed awaiting merge labels Nov 19, 2025

willingc approved these changes Nov 19, 2025

bedevere-app bot added awaiting merge and removed awaiting core review labels Nov 19, 2025

encukou merged commit 2ff8608 into python:main Nov 26, 2025
36 checks passed

encukou deleted the lex-analysis-names-simpler branch November 26, 2025 15:10

bedevere-app bot removed the awaiting merge label Nov 26, 2025

github-project-automation bot moved this from Todo to Done in Docs PRs Nov 26, 2025

encukou added the needs backport to 3.14 (bugs and security fixes) label Nov 26, 2025

miss-islington-app bot assigned encukou Nov 26, 2025

bedevere-app bot removed the needs backport to 3.14 (bugs and security fixes) label Nov 27, 2025
