-
Notifications
You must be signed in to change notification settings - Fork 157
Add digit separator support for binary, octal, and hexadecimal numbers #3134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Implements the digit separator specification from: https://www.complang.tuwien.ac.at/ulrich/iso-prolog/digit_separators Changes: - Add digit separator support (underscore with optional whitespace/comments) for hexadecimal (0x), octal (0o), and binary (0b) number literals - Decimal numbers already had digit separator support - Add helper functions: skip_underscore_in_hexadecimal/octal/binary - Update hexadecimal_constant, octal_constant, and binary_constant functions Tests: - Add 16 Rust unit tests in src/tests/parse_tokens.rs - Add 18 Prolog integration tests in src/tests/digit_separators.pl - Add CLI test configuration in tests/scryer/cli/src_tests/digit_separators.toml - All tests verify: basic underscores, multiple underscores, whitespace after underscores, and comments between digits Examples: 0xDE_AD, 0xFF_ 00, 0xDE_ /* comment */ AD 0o7_6, 0o77_ 00, 0o1_ /* octal */ 23 0b10_11, 0b1111_ 0000, 0b10_ /* binary */ 11 Reference: mthom#3132 (comment)
|
Please compare this with #3125! |
oh thats embarrassing 😰 |
|
Not at all, this may still be useful for comparisons. For instance, to count the number of strings with length |
You mean WG17. Did you update the decimal part as well? An inconsistency (in the specification) was only discovered this week. To be clear, it was there since the 2010-11-19 proposal of O'Keefe and nobody noted it in 15 years. As a consequence, also Scryer has the following inconsistency: It supports option 9 but neither 10 nor 11. And probably it is best to reject all three. The specifications so far supported 9 and 11, but not 10 which does not make much sense, since 10 is, if at all, the only one with many digits. (I am not sure how to proceed, and which approach is preferable. What is indeed important is the willingness to actively maintain the improvements. Like in Trealla.) (In Scryer, I did not test number_chars for a couple of years because partial strings at that time consumed global resources which made testing impractical. Only now with their heap representation can I continue testing) |
Not yet. |
|
The previous implementation broke number_chars/2 with hex/octal/binary notation because skip_underscore_* helper functions propagated EOF errors instead of treating EOF as a valid end-of-number signal. Changes: - Inline digit separator checking directly in parsing loops - Handle EOF gracefully with explicit error checking - Remove unused skip_underscore_in_* helper functions - Maintain support for underscores with optional whitespace/comments Fixes issue reported by UWN where number_chars(N, "0xa") failed with lexical_error instead of correctly parsing to 10. Tests: - number_chars(N, "0xa") now works correctly - number_chars(N, "0xDE_AD") parses with digit separators - All existing Prolog integration tests still pass - Direct literals with comments still work: 0xDE_ /* comment */ AD
Adds comprehensive tests for number_chars/2 with: - Hexadecimal: "0xa", "0xDE_AD" - Octal: "0o76", "0o7_6" - Binary: "0b1011", "0b10_11" - Decimal with separator: "1_000" These tests ensure that number_chars/2 correctly parses all numeric formats with and without digit separators, addressing the issue reported by UWN where number_chars(N, "0xa") was broken. Note: Float digit separators (option 9: "1_0.0") currently work but are not tested here as they are undecided in the specification and may need to be rejected in the future per UWN's feedback.
Per WG17 decision, options 9-11 (float digit separators) are rejected for consistency. This commit implements rejection of option 9 which allowed digit separators before the decimal point (e.g., "1_0.0"). Changes: - Modified skip_underscore_in_number() to return (char, bool) tuple tracking whether an underscore was encountered - Updated number_token() to track had_separator flag throughout parsing - Added check before decimal point: reject if separators were used - Error type: ParseBigInt (cannot parse as float with separators) Tests: - Added 3 tests verifying option 9 rejection via number_chars and atoms - All tests now pass with proper error handling - Integer separators (1_000) still work correctly - Floats without separators (10.0) still work correctly Behavior: - BEFORE: number_chars(N, "1_0.0") → N = 10.0 (unexpected) - AFTER: number_chars(N, "1_0.0") → error(syntax_error(...)) This addresses UWN's feedback that option 9 should be rejected for consistency since options 10 and 11 (separators after decimal point and in exponents) are also not supported.
cdd8f91 👍 |
|
|
(it would be much better, if there would be an issue-link.) |
For two reasons: It is not a valid number and it is followed by a layout at the end. |
|
@jjtolton, do you no longer continue this attempt? Note that for the moment there is no way to directly compare Scryer to the other implementations, that is Trealla and SWI. |
Per UWN feedback on PR mthom#3134, `number_chars(N, "0_ ")` should throw a syntax_error because: 1. The underscore is not between digits (trailing position) 2. There is layout (whitespace) after the underscore with no following digit This fix ensures that after consuming an underscore and any following layout in decimal numbers, we explicitly check for EOF or other errors and return ParseBigInt error instead of propagating EOF errors. Added test case to verify the rejection behavior. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Per UWN feedback on PR mthom#3134, `number_chars(N, "0_ ")` should throw a syntax_error because: 1. The underscore is not between digits (trailing position) 2. There is layout (whitespace) after the underscore with no following digit This fix ensures that after consuming an underscore and any following layout in decimal numbers, we explicitly check for EOF or other errors and return ParseBigInt error instead of propagating EOF errors. Added test case to verify the rejection behavior.
|
I will check for some other cases that may not have been covered. |
|
Ultimately we should probably go with @bakaq's approach since it is more compact and intentional, this approach is nearly entirely procedurally generated. But I am happy to provide it as a reference. |
Extended the fix for trailing underscores to handle both cases: 1. Trailing underscore with no layout: `123_` (EOF immediately after underscore) 2. Trailing underscore with layout: `0_ ` (EOF after underscore + whitespace) The issue was that scan_for_layout() would propagate EOF errors, which try_nt! macro would interpret as a valid partial token. Now we explicitly catch EOF from scan_for_layout() and return ParseBigInt error. Added comprehensive test coverage: - Trailing underscores for decimal, hex, octal, and binary - Double underscores (option 2 - undecided in spec, rejected by implementation) All 34 tests passing.
Per feedback on commit cdd8f91: - Use specific error types instead of wildcards (e.g., cannot_parse_big_int instead of syntax_error(_)) - Fix test that was catching type_error instead of syntax_error by using character list directly instead of passing an atom - Each test now validates the exact error that should occur for that specific syntax issue This makes tests more robust and precisely documents expected behavior.
|
Additional test cases added: cdd8f91#r168798457 |
|
@jjtolton, @bakaq: We unexpectedly went from a what in the administration is called a positive competence conflict (two parties independently seeing themselves as responsible) to a negative competence conflict (no party feels responsible)? Now both #3125 and #3134 are closed even though they attempted to solve the same problem? |
|
This was an unusual situation but for the sake of consistency I will reopen this PR because that is what @UWN ended up testing against. The tests for the combined branch are pretty good now and leave open the possibility for an improved |
Moved float rejection check after determining if '.' is part of a float. Now "1_0." correctly parses as integer 10 followed by end token.
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Summary
Implements digit separator support for binary (
0b), octal (0o), and hexadecimal (0x) number literals, completing the digit separator feature that was previously only available for decimal numbers.This addresses the backlog item mentioned in #3132 (comment)
Specification
Implements: https://www.complang.tuwien.ac.at/ulrich/iso-prolog/digit_separators
Changes
src/parser/lexer.rs):skip_underscore_in_hexadecimal()helper functionskip_underscore_in_octal()helper functionskip_underscore_in_binary()helper functionhexadecimal_constant()to use digit separatorsoctal_constant()to use digit separatorsbinary_constant()to use digit separatorsSupported Syntax
All numeric bases now support the following digit separator patterns:
0xDE_AD,0o7_6,0b10_110x1_2_3_4,0o1_2_3,0b1_0_1_00xFF_ 00,0o77_ 00,0b1111_ 00000xDE_ /* test */ AD,0o1_ /* octal */ 23,0b10_ /* binary */ 11Tests
Comprehensive test coverage at three levels:
Rust unit tests (
src/tests/parse_tokens.rs): 16 testsProlog integration tests (
src/tests/digit_separators.pl): 18 testsCLI tests (
tests/scryer/cli/src_tests/digit_separators.toml): 1 configAll tests pass successfully.
Examples
Checklist