Add digit separator support for binary, octal, and hexadecimal numbers #3134

jjtolton · 2025-10-24T04:19:44Z

Summary

Implements digit separator support for binary (0b), octal (0o), and hexadecimal (0x) number literals, completing the digit separator feature that was previously only available for decimal numbers.

This addresses the backlog item mentioned in #3132 (comment)

Specification

Implements: https://www.complang.tuwien.ac.at/ulrich/iso-prolog/digit_separators

Changes

Lexer enhancements (src/parser/lexer.rs):
- Added skip_underscore_in_hexadecimal() helper function
- Added skip_underscore_in_octal() helper function
- Added skip_underscore_in_binary() helper function
- Updated hexadecimal_constant() to use digit separators
- Updated octal_constant() to use digit separators
- Updated binary_constant() to use digit separators

Supported Syntax

All numeric bases now support the following digit separator patterns:

Single underscore: 0xDE_AD, 0o7_6, 0b10_11
Multiple underscores: 0x1_2_3_4, 0o1_2_3, 0b1_0_1_0
Underscore with whitespace: 0xFF_ 00, 0o77_ 00, 0b1111_ 0000
Underscore with comments: 0xDE_ /* test */ AD, 0o1_ /* octal */ 23, 0b10_ /* binary */ 11

Tests

Comprehensive test coverage at three levels:

Rust unit tests (src/tests/parse_tokens.rs): 16 tests
- Test token-level parsing for all numeric bases
- Verify all separator variations (basic, multiple, whitespace, comments)
Prolog integration tests (src/tests/digit_separators.pl): 18 tests
- Test actual Prolog execution with digit separators
- Cover edge cases (large numbers, case insensitivity)
CLI tests (tests/scryer/cli/src_tests/digit_separators.toml): 1 config
- End-to-end command-line testing

All tests pass successfully.

Examples

?- X = 0xDE_AD, X =:= 57005.
   true.

?- X = 0b1111_0000, X =:= 240.
   true.

?- X = 0o77_00, X =:= 4032.
   true.

?- X = 0xFF_ /* comment */ 00, X =:= 65280.
   true.

Checklist

Implementation follows existing code patterns
Comprehensive test coverage (Rust unit, Prolog integration, CLI)
All tests pass
Based on current upstream/master
Follows ISO Prolog digit separator specification

Implements the digit separator specification from: https://www.complang.tuwien.ac.at/ulrich/iso-prolog/digit_separators Changes: - Add digit separator support (underscore with optional whitespace/comments) for hexadecimal (0x), octal (0o), and binary (0b) number literals - Decimal numbers already had digit separator support - Add helper functions: skip_underscore_in_hexadecimal/octal/binary - Update hexadecimal_constant, octal_constant, and binary_constant functions Tests: - Add 16 Rust unit tests in src/tests/parse_tokens.rs - Add 18 Prolog integration tests in src/tests/digit_separators.pl - Add CLI test configuration in tests/scryer/cli/src_tests/digit_separators.toml - All tests verify: basic underscores, multiple underscores, whitespace after underscores, and comments between digits Examples: 0xDE_AD, 0xFF_ 00, 0xDE_ /* comment */ AD 0o7_6, 0o77_ 00, 0o1_ /* octal */ 23 0b10_11, 0b1111_ 0000, 0b10_ /* binary */ 11 Reference: mthom#3132 (comment)

triska · 2025-10-24T05:08:05Z

Please compare this with #3125!

jjtolton · 2025-10-24T05:46:13Z

Please compare this with #3125!

oh thats embarrassing 😰

triska · 2025-10-24T06:08:44Z

Not at all, this may still be useful for comparisons. For instance, to count the number of strings with length N that form a valid integer, and compare the two different implementations.

UWN · 2025-10-24T06:44:19Z

Follows ISO Prolog digit separator specification

You mean WG17.

Did you update the decimal part as well? An inconsistency (in the specification) was only discovered this week. To be clear, it was there since the 2010-11-19 proposal of O'Keefe and nobody noted it in 15 years. As a consequence, also Scryer has the following inconsistency: It supports option 9 but neither 10 nor 11. And probably it is best to reject all three. The specifications so far supported 9 and 11, but not 10 which does not make much sense, since 10 is, if at all, the only one with many digits.

(I am not sure how to proceed, and which approach is preferable. What is indeed important is the willingness to actively maintain the improvements. Like in Trealla.)

(In Scryer, I did not test number_chars for a couple of years because partial strings at that time consumed global resources which made testing impractical. Only now with their heap representation can I continue testing)

UWN · 2025-10-24T16:39:05Z

?- number_chars(N,"1_0.0").
   N = 10.0, unexpected.

Did you update the decimal part as well?

Not yet.

UWN · 2025-10-24T19:11:26Z

g1:/opt/gupu/jj-scryer-prolog> target/release/scryer-prolog -v
9a840154
g1:/opt/gupu/jj-scryer-prolog> target/release/scryer-prolog -f
?- number_chars(N,"0xa").
   error(syntax_error(lexical_error),number_chars/2), unexpected.

The previous implementation broke number_chars/2 with hex/octal/binary notation because skip_underscore_* helper functions propagated EOF errors instead of treating EOF as a valid end-of-number signal. Changes: - Inline digit separator checking directly in parsing loops - Handle EOF gracefully with explicit error checking - Remove unused skip_underscore_in_* helper functions - Maintain support for underscores with optional whitespace/comments Fixes issue reported by UWN where number_chars(N, "0xa") failed with lexical_error instead of correctly parsing to 10. Tests: - number_chars(N, "0xa") now works correctly - number_chars(N, "0xDE_AD") parses with digit separators - All existing Prolog integration tests still pass - Direct literals with comments still work: 0xDE_ /* comment */ AD

Adds comprehensive tests for number_chars/2 with: - Hexadecimal: "0xa", "0xDE_AD" - Octal: "0o76", "0o7_6" - Binary: "0b1011", "0b10_11" - Decimal with separator: "1_000" These tests ensure that number_chars/2 correctly parses all numeric formats with and without digit separators, addressing the issue reported by UWN where number_chars(N, "0xa") was broken. Note: Float digit separators (option 9: "1_0.0") currently work but are not tested here as they are undecided in the specification and may need to be rejected in the future per UWN's feedback.

Per WG17 decision, options 9-11 (float digit separators) are rejected for consistency. This commit implements rejection of option 9 which allowed digit separators before the decimal point (e.g., "1_0.0"). Changes: - Modified skip_underscore_in_number() to return (char, bool) tuple tracking whether an underscore was encountered - Updated number_token() to track had_separator flag throughout parsing - Added check before decimal point: reject if separators were used - Error type: ParseBigInt (cannot parse as float with separators) Tests: - Added 3 tests verifying option 9 rejection via number_chars and atoms - All tests now pass with proper error handling - Integer separators (1_000) still work correctly - Floats without separators (10.0) still work correctly Behavior: - BEFORE: number_chars(N, "1_0.0") → N = 10.0 (unexpected) - AFTER: number_chars(N, "1_0.0") → error(syntax_error(...)) This addresses UWN's feedback that option 9 should be rejected for consistency since options 10 and 11 (separators after decimal point and in exponents) are also not supported.

jjtolton · 2025-10-24T23:42:35Z

g1:/opt/gupu/jj-scryer-prolog> target/release/scryer-prolog -v
9a840154
g1:/opt/gupu/jj-scryer-prolog> target/release/scryer-prolog -f
?- number_chars(N,"0xa").
   error(syntax_error(lexical_error),number_chars/2), unexpected.

cdd8f91 👍

UWN · 2025-10-25T06:26:48Z

?- number_chars(N,"0_").
   N = 0, unexpected.
   syntax_error(...).

UWN · 2025-10-25T06:28:34Z

(it would be much better, if there would be an issue-link.)

UWN · 2025-10-25T07:20:07Z

?- number_chars(N,"0_ ").
   N = 0, unexpected.
   syntax_error(...).

For two reasons: It is not a valid number and it is followed by a layout at the end.

UWN · 2025-10-26T09:01:59Z

@jjtolton, do you no longer continue this attempt? Note that for the moment there is no way to directly compare Scryer to the other implementations, that is Trealla and SWI.

Per UWN feedback on PR mthom#3134, `number_chars(N, "0_ ")` should throw a syntax_error because: 1. The underscore is not between digits (trailing position) 2. There is layout (whitespace) after the underscore with no following digit This fix ensures that after consuming an underscore and any following layout in decimal numbers, we explicitly check for EOF or other errors and return ParseBigInt error instead of propagating EOF errors. Added test case to verify the rejection behavior. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Per UWN feedback on PR mthom#3134, `number_chars(N, "0_ ")` should throw a syntax_error because: 1. The underscore is not between digits (trailing position) 2. There is layout (whitespace) after the underscore with no following digit This fix ensures that after consuming an underscore and any following layout in decimal numbers, we explicitly check for EOF or other errors and return ParseBigInt error instead of propagating EOF errors. Added test case to verify the rejection behavior.

jjtolton · 2025-10-26T15:03:52Z

@UWN cdd8f91

jjtolton · 2025-10-26T15:04:54Z

I will check for some other cases that may not have been covered.

jjtolton · 2025-10-26T15:05:55Z

Ultimately we should probably go with @bakaq's approach since it is more compact and intentional, this approach is nearly entirely procedurally generated. But I am happy to provide it as a reference.

Extended the fix for trailing underscores to handle both cases: 1. Trailing underscore with no layout: `123_` (EOF immediately after underscore) 2. Trailing underscore with layout: `0_ ` (EOF after underscore + whitespace) The issue was that scan_for_layout() would propagate EOF errors, which try_nt! macro would interpret as a valid partial token. Now we explicitly catch EOF from scan_for_layout() and return ParseBigInt error. Added comprehensive test coverage: - Trailing underscores for decimal, hex, octal, and binary - Double underscores (option 2 - undecided in spec, rejected by implementation) All 34 tests passing.

Per feedback on commit cdd8f91: - Use specific error types instead of wildcards (e.g., cannot_parse_big_int instead of syntax_error(_)) - Fix test that was catching type_error instead of syntax_error by using character list directly instead of passing an atom - Each test now validates the exact error that should occur for that specific syntax issue This makes tests more robust and precisely documents expected behavior.

jjtolton · 2025-10-26T15:18:30Z

Additional test cases added: cdd8f91#r168798457

triska · 2025-10-27T17:53:06Z

@jjtolton, @bakaq: We unexpectedly went from a what in the administration is called a positive competence conflict (two parties independently seeing themselves as responsible) to a negative competence conflict (no party feels responsible)?

Now both #3125 and #3134 are closed even though they attempted to solve the same problem?

jjtolton · 2025-10-27T18:42:44Z

This was an unusual situation but for the sake of consistency I will reopen this PR because that is what @UWN ended up testing against. The tests for the combined branch are pretty good now and leave open the possibility for an improved
implementation when time allows.

Moved float rejection check after determining if '.' is part of a float. Now "1_0." correctly parses as integer 10 followed by end token.

jjtolton · 2025-11-20T22:28:17Z

Fixed #3159: Digit separators now parse correctly when followed by period. Commit: 36d989b —J.J.'s robot.

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

jjtolton closed this Oct 24, 2025

jjtolton added 3 commits October 24, 2025 18:52

bakaq mentioned this pull request Oct 25, 2025

Digit separators for octal, binary and hex #3125

Closed

jjtolton added 2 commits October 26, 2025 11:07

jjtolton reopened this Oct 27, 2025

Fix digit separators not parsing when followed by period

36d989b

Moved float rejection check after determining if '.' is part of a float. Now "1_0." correctly parses as integer 10 followed by end token.

Add missing stdout file for digit_separators CLI test

ff652ba

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Add digit separator support for binary, octal, and hexadecimal numbers #3134

Are you sure you want to change the base?

Add digit separator support for binary, octal, and hexadecimal numbers #3134

Uh oh!

Conversation

jjtolton commented Oct 24, 2025

Summary

Specification

Changes

Supported Syntax

Tests

Examples

Checklist

Uh oh!

triska commented Oct 24, 2025

Uh oh!

jjtolton commented Oct 24, 2025

Uh oh!

triska commented Oct 24, 2025

Uh oh!

UWN commented Oct 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

UWN commented Oct 24, 2025

Uh oh!

UWN commented Oct 24, 2025

Uh oh!

jjtolton commented Oct 24, 2025

Uh oh!

UWN commented Oct 25, 2025

Uh oh!

UWN commented Oct 25, 2025

Uh oh!

UWN commented Oct 25, 2025

Uh oh!

UWN commented Oct 26, 2025

Uh oh!

jjtolton commented Oct 26, 2025

Uh oh!

jjtolton commented Oct 26, 2025

Uh oh!

jjtolton commented Oct 26, 2025

Uh oh!

jjtolton commented Oct 26, 2025

Uh oh!

triska commented Oct 27, 2025

Uh oh!

jjtolton commented Oct 27, 2025

Uh oh!

jjtolton commented Nov 20, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

UWN commented Oct 24, 2025 •

edited

Loading