-
-
Notifications
You must be signed in to change notification settings - Fork 33.5k
gh-141594: A free-threaded JIT #141595
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
gh-141594: A free-threaded JIT #141595
Conversation
|
@sergey-miryanov in the future could you please bundle up your reviews and send them in one review instead of multiple reviews? |
Misc/NEWS.d/next/Core_and_Builtins/2025-11-15-16-30-46.gh-issue-141594.PSsC5J.rst
Outdated
Show resolved
Hide resolved
Python/optimizer.c
Outdated
| PyInterpreterState *interp = _PyInterpreterState_GET(); | ||
| if (!interp->jit) { | ||
| // gh-140936: It is possible that interp->jit will become false during | ||
| if (!FT_ATOMIC_LOAD_CHAR_RELAXED(_tstate->jit_executor_state.jit)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Do we need to use atomic operation in here? even for the per-thread state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, we need this. Because we set jit to false in the add_threadstate and we do it for all threads.
|
This PR has some serious issues:
Executors are not "effectively immortal"; cold executors do get freed. There are some useful improvements here, such as locking when inserting/removing executors from the linked list, but those should be in separate small PRs. Finally, we really need to clean up after adding the tracing JIT frontend before making any intrusive changes like this. |
This isn't true? The GC is now responsible for collecting executors.
Let me rephrase, they are effectively GC-only, not refcounted. As they always form reference cycles. So there's no point refcounting them. Just leave it to the GC.
Yeap that I agree on. |
|
@markshannon I reduced the diff to the minimal changeset just to get everything working. This is just +50 lines of code roughly now! |
| { | ||
| PyInterpreterState *interp = PyInterpreterState_Get(); | ||
| _Py_Executors_InvalidateDependency(interp, obj, 1); | ||
| _Py_Executors_InvalidateDependency(_PyInterpreterState_GET(), obj, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be reverted too.
| } | ||
|
|
||
| PyFunctionObject * | ||
| _PyFunction_LookupByVersion(uint32_t version, PyObject **p_code) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this function really used somewhere?
Maybe we should remove it on main in a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's strange, you're right, it's not used.
It's useful for the optimizer though, so we might want to use it in the future.
markshannon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have time for a detailed review ATM, but I've a few comments
| // based on the version, so we do not need to stop the world to set it. | ||
| func->func_version = version; | ||
| #ifndef Py_GIL_DISABLED | ||
| #if _Py_TIER2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes the tier 1 with-gil behavior.
I don't see why we would need the cache in tier 1, but can you double check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I double checked.
| uint32_t | ||
| _PyFunction_GetVersionForCurrentState(PyFunctionObject *func) | ||
| { | ||
| // This function does not need locking/atomics as it can only be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use the function version in specialized instructions, so it is used in tier 1.
I'm surprised this isn't already synchronized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use in the specializer is protected because it's accessing a stack variable which holds a strong reference to it. I will clarify that the unsnchronized use is only in the optimizer and it's fine there.
| _PyCode_Clear_Executors(code); | ||
| } | ||
| _Py_Executors_InvalidateDependency(interp, code, 1); | ||
| _PyJit_Tracer_InvalidateDependency(PyThreadState_GET(), code); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I merged it into _Py_Executors_InvalidateDependency because it's actually always called when we invalidate executor dependency. So logically, it should be part of it.
Python/optimizer_analysis.c
Outdated
| PyObject *lookup = _PyType_Lookup(type, name); | ||
| if (lookup) { | ||
| int opcode = _Py_IsImmortal(lookup) ? immortal : mortal; | ||
| int opcode = _Py_IsImmortal(lookup) || _PyObject_HasDeferredRefcount(lookup) ? immortal : mortal; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This happens to be correct, because the uop we use for immortal objects tags the pointer.
But in theory, we could not tag the pointer and the code would be correct for immortal objects but not those with deferred refcounts.
However, it is always safe to use a tagged pointer for an immortal object.
So, can you rename immortal to deferred_count or similar.
This PR gives the JIT free-threading support. It is only on for single-threaded code, and turns off automatically on multi-threaded code. All JIT features are turned on, including the optimizer. All tests pass on my system, including TSAN as of 527aac1 usng the FT suppression file, except for the usual spurious race conditions already in existing CPython.
Benchmark results are good. Geomean of 2% faster on FT+JIT vs just FT. So the JIT provides a 2% geomean speedup on FT for pyperformance! Compare https://github.com/facebookexperimental/free-threading-benchmarking/tree/main/results/bm-20251116-3.15.0a1+-763ebea-JIT,NOGIL against https://github.com/facebookexperimental/free-threading-benchmarking/tree/main/results/bm-20251115-3.15.0a1+-ed73c90-NOGIL. For convenience, I downloaded the json files and pyperf compared them here https://gist.github.com/Fidget-Spinner/d79f2f050147fc355a300bf9d413a75e
Design:
Creation of >1 threads cause global invalidation of all executors and disables JIT.
In the future, we can do lock removal too. The performance figures from the lock removal are promising (no lock removal vs lock removal). All with FT+PGO+LTO=thin+TC+JIT +
pyperf system tune:This is on a platform where locking/atomics are somewhat slow (i7-12700h). I removed the lock removal code for this PR to reduce the diff.