gh-139109: A new tracing JIT compiler frontend for CPython #140310
The results after trusting the interpreter with specialization are slower: https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251112-3.15.0a1%2B-f8a764a-JIT/bm-20251112-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-f8a764a-vs-base.svg This likely indicates bugs in the specializing interpreter. For example, deltablue is suffering from the CALL_LIST_APPEND bug. I say we just merge the "slower" version, though, and fix the bugs in the interpreter separately.
I'm not worried about the deltablue slowdown. It subclasses a list for no reason, which I don't think we care about.
Is the iOS failure anything to do with this PR or is it failing elsewhere?
GOTO_TIER_ONE(target);
}

tier2 op(_GUARD_IP__PUSH_FRAME, (ip/4 --)) {
I thought the plan was to generate these exits, otherwise they get out of sync with the source instruction.
What is OFFSET_OF(x)? The offset of what?
If generating the whole uop is too cumbersome because of uop ids and such, then generate the body, leaving the declaration looking like this:
tier2 op(_GUARD_IP__PUSH_FRAME, (ip/4 --)) {
IP_GUARD(_PUSH_FRAME);
}
It is automatically generating it from the offset of the IP.
I've changed the name to IP_OFFSET_OF to make it clearer.
DISPATCH();
}

label(record_previous_inst) {
NOTE: To support the switch/case interpreter this will need to be an instruction.
No need to do so in this PR.
It's been failing intermittently for a few weeks: #141358 (comment).
markshannon left a comment
Final set of requests.
You seem to be using code_curr_size in various places for checking things other than just the uop count, such as whether tracing is active.
Can you add comments and/or names for the constants for all comparisons involving code_curr_size?
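One way to act on this request is sketched below: give each threshold a name and wrap each comparison in a small helper, so that a reader can tell a "is tracing active?" check from a "have we made progress?" check. This is only an illustration. `code_curr_size` and `CODE_SIZE_NO_PROGRESS` come from the discussion, but `CODE_SIZE_EMPTY`, the `tracer_state` struct, both helper functions, and all numeric values are hypothetical and are not taken from the PR.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical thresholds for the trace buffer's uop count.
 * The names (other than CODE_SIZE_NO_PROGRESS) and all values are
 * illustrative assumptions, not the ones used in CPython. */
enum {
    CODE_SIZE_EMPTY = 0,       /* no uops recorded: tracing is not active */
    CODE_SIZE_NO_PROGRESS = 5  /* at or below this, nothing useful is traced yet */
};

/* Simplified, hypothetical stand-in for the tracer state in the review. */
typedef struct {
    int code_curr_size;  /* number of uops currently in the trace buffer */
} tracer_state;

/* True while a trace is being recorded (the buffer is non-empty). */
static bool
tracer_is_active(const tracer_state *st)
{
    assert(st->code_curr_size >= CODE_SIZE_EMPTY);
    return st->code_curr_size > CODE_SIZE_EMPTY;
}

/* True when the trace is still too short to count as forward progress. */
static bool
tracer_made_no_progress(const tracer_state *st)
{
    return st->code_curr_size <= CODE_SIZE_NO_PROGRESS;
}
```

With helpers like these, a call site reads as `if (progress_needed && tracer_made_no_progress(st))`, and the choice between `<` and `<=` lives in exactly one place next to the constant it depends on.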
}
if (jump_target != current_jump_target || current_exit_op != exit_op) {
make_exit(&buffer[next_spare], exit_op, jump_target);
bool is_control_flow = (opcode == _GUARD_IS_FALSE_POP || opcode == _GUARD_IS_TRUE_POP || is_for_iter_test[opcode]);
I think this is the only use of is_for_iter_test, in which case replace it with is_control_flow and simplify this test.
No, it's not. We also use it to decide whether to offset the target to point after the END_FOR.
Python/optimizer.c
Outdated
  /* Special case the first instruction,
   * so that we can guarantee forward progress */
- if (progress_needed && _tstate->jit_tracer_state.prev_state.code_curr_size <= 3) {
+ if (progress_needed && _tstate->jit_tracer_state.prev_state.code_curr_size <= CODE_SIZE_NO_PROGRESS) {
This is different. You've changed the rhs from 3 to 5.
Yes it should be CODE_SIZE_NO_PROGRESS. I realised the problem is that it's < not <= though!
This is a big step forward for the JIT.
@Fidget-Spinner thanks again for doing this.
CI caught a bug in the dependency tracking. I fixed it in the latest commit.
Congratulations, @Fidget-Spinner!
* main: (463 commits) pythongh-140601: Add ResourceWarning to iterparse when not closed (pythonGH-140603) pythongh-137969: Fix double evaluation of `ForwardRef`s which rely on globals (python#140974) pythongh-139109: A new tracing JIT compiler frontend for CPython (pythonGH-140310) pythongh-141004: Document `PyErr_RangedSyntaxLocationObject` (python#141521) pythongh-140873: Add support of non-descriptor callables in functools.singledispatchmethod() (pythonGH-140884) pythongh-139653: Add PyUnstable_ThreadState_SetStackProtection() (python#139668) pythongh-141004: Document `PyCode_Optimize` (pythonGH-141378) pythongh-141004: Document C APIs for dictionary keys, values, and items (pythonGH-141009) pythongh-137959: Fix `TIER1_TO_TIER2` macro name in JIT InternalDocs (pythonGH-141496) pythongh-139871: Add `bytearray.take_bytes([n])` to efficiently extract `bytes` (pythonGH-140128) pythongh-140601: Refactor ElementTree.iterparse() tests (pythonGH-141499) pythongh-135801: Add the module parameter to compile() etc (pythonGH-139652) pythongh-140260: fix data race in `_struct` module initialization with subinterpreters (python#140909) pythongh-137109: refactor warning about threads when forking (python#141438) pythongh-141004: Document `PyRun_InteractiveOneObject` (pythonGH-141405) pythongh-124111: Fix TCL 9 thread detection (pythonGH-128103) pythongh-141442: Add escaping to iOS testbed arguments (python#141443) pythongh-140936: Fix JIT assertion crash at finalization if some generator is alive (pythonGH-140969) Add details about JIT build infrastructure and updating dependencies to `Tools/jit` (python#141167) pythongh-141412: Use reliable target URL for urllib example (pythonGH-141428) ...
* 'main' of github.com:python/cpython: (464 commits) pythongh-140601: Add ResourceWarning to iterparse when not closed (pythonGH-140603) pythongh-137969: Fix double evaluation of `ForwardRef`s which rely on globals (python#140974) pythongh-139109: A new tracing JIT compiler frontend for CPython (pythonGH-140310) pythongh-141004: Document `PyErr_RangedSyntaxLocationObject` (python#141521) pythongh-140873: Add support of non-descriptor callables in functools.singledispatchmethod() (pythonGH-140884) pythongh-139653: Add PyUnstable_ThreadState_SetStackProtection() (python#139668) pythongh-141004: Document `PyCode_Optimize` (pythonGH-141378) pythongh-141004: Document C APIs for dictionary keys, values, and items (pythonGH-141009) pythongh-137959: Fix `TIER1_TO_TIER2` macro name in JIT InternalDocs (pythonGH-141496) pythongh-139871: Add `bytearray.take_bytes([n])` to efficiently extract `bytes` (pythonGH-140128) pythongh-140601: Refactor ElementTree.iterparse() tests (pythonGH-141499) pythongh-135801: Add the module parameter to compile() etc (pythonGH-139652) pythongh-140260: fix data race in `_struct` module initialization with subinterpreters (python#140909) pythongh-137109: refactor warning about threads when forking (python#141438) pythongh-141004: Document `PyRun_InteractiveOneObject` (pythonGH-141405) pythongh-124111: Fix TCL 9 thread detection (pythonGH-128103) pythongh-141442: Add escaping to iOS testbed arguments (python#141443) pythongh-140936: Fix JIT assertion crash at finalization if some generator is alive (pythonGH-140969) Add details about JIT build infrastructure and updating dependencies to `Tools/jit` (python#141167) pythongh-141412: Use reliable target URL for urllib example (pythonGH-141428) ...
This PR changes the current JIT model from trace projection to trace recording.
Benchmarking: the pyperformance geomean is about 1.7% better overall than with the current JIT: https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251108-3.15.0a1%2B-7e2bc1d-JIT/bm-20251108-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-7e2bc1d-vs-base.svg. Richards, the most improved benchmark, is 100% faster than with the current JIT. The worst benchmark is about 10-15% slower than with the current JIT. Note: the fastest version isn't the one merged, as it relies on fixing bugs in the specializing interpreter, which is left to another PR. The speedup in the merged version is about 1.1%: https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251112-3.15.0a1%2B-f8a764a-JIT/bm-20251112-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-f8a764a-vs-base.svg
Stats: 50% more uops executed and 30% more traces entered, as of the last time we ran them. The stats also suggest that our trace lengths are too short for a real trace recording JIT, as there are a lot of "trace too long" aborts: https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251023-3.15.0a1%2B-eb73378-CLANG%2CJIT/bm-20251023-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-eb73378-pystats-vs-base.md .
This new JIT frontend is already able to record and execute significantly more instructions than the previous JIT frontend. In this PR, we are now able to record through custom dunders, simple object creation, generators, etc. None of these were done by the old JIT frontend. Some custom dunder uops were discovered to be broken as part of this work (gh-140277).
Some tests for the optimizers are disabled for now because their invariants have changed. The main one is the stack space check, as it's no longer valid to deal with underflow.
The full test suite passes on my system.
Not close enough to a holiday, so no poem this time!
Pros:
optimizer.c is now significantly simpler, as we don't have to do strange things to recover the bytecode from a trace.
Cons:
Design:
record_previous_inst function/label is executed. This does as the name suggests.
_DYNAMIC_EXIT.
Future ideas:
_DYNAMIC_EXIT will cause a re-trace when hot enough, starting from the target instruction.
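To make the idea of trace recording concrete, here is a toy sketch in C. It is not CPython's implementation: the opcodes, `toy_run_and_record`, `TOY_MAX_TRACE`, and the result codes are all hypothetical names invented for this illustration. A tiny interpreter runs a countdown loop and appends each instruction it actually executes to a trace buffer. Recording ends when execution returns to the instruction where recording started (a closed loop), and it aborts when the buffer fills up, loosely mirroring the "trace too long" aborts mentioned in the stats.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy bytecode; none of these names exist in CPython. */
typedef enum { OP_DEC, OP_JUMP_IF_NONZERO, OP_HALT } toy_opcode;

typedef struct {
    toy_opcode op;
    int arg;  /* jump target for OP_JUMP_IF_NONZERO, unused otherwise */
} toy_inst;

#define TOY_MAX_TRACE 8  /* illustrative limit on recorded instructions */

typedef enum { TRACE_CLOSED_LOOP, TRACE_TOO_LONG, TRACE_HALTED } toy_trace_result;

typedef struct {
    int pcs[TOY_MAX_TRACE];  /* program counters of executed instructions */
    int length;
} toy_trace;

/* Run `code` from `start_pc` with `counter` as the only register,
 * recording every instruction that is actually executed. */
static toy_trace_result
toy_run_and_record(const toy_inst *code, int start_pc, int counter,
                   toy_trace *trace)
{
    assert(code != NULL && trace != NULL);
    int pc = start_pc;
    trace->length = 0;
    for (;;) {
        /* Abort if the next instruction would not fit in the buffer. */
        if (trace->length == TOY_MAX_TRACE) {
            return TRACE_TOO_LONG;
        }
        /* Record the instruction about to run, then execute it. */
        trace->pcs[trace->length++] = pc;
        const toy_inst *inst = &code[pc];
        switch (inst->op) {
            case OP_DEC:
                counter--;
                pc++;
                break;
            case OP_JUMP_IF_NONZERO:
                /* The branch direction comes from real execution. */
                pc = (counter != 0) ? inst->arg : pc + 1;
                break;
            case OP_HALT:
                return TRACE_HALTED;
        }
        /* Back at the start: the recorded trace forms a loop. */
        if (pc == start_pc) {
            return TRACE_CLOSED_LOOP;
        }
    }
}
```

Because the trace is built from what actually ran, the direction of each branch is already known when it is recorded. A projecting frontend would instead have to work out the likely path from the bytecode without running it.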