-
-
Notifications
You must be signed in to change notification settings - Fork 33.5k
gh-131253: free-threaded build support for pystats #137189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Allow the --enable-pystats build option to be used with free-threading. For the free-threaded builds, the stats structure is allocated per-thread and then periodically merged into a global stats structure (on thread exit or when the reporting function is called). Summary of changes: * introduce _Py_tss_stats thread-local variable. This is set when stats are on, replacing the _Py_stats global that's used in the non-free-threaded build. * replace _Py_stats references with _PyStats_GET() * move pystats logic from Python/specialize.c into Python/pystats.c * add some free-threaded specific stat counters
colesbury
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The management of the thread local variable seems complex. Can we keep the thread-local stats on PyThreadState and have _PyStats_GET() return something like _PyThreadState_GET()->stats? How much would the extra indirection slow down the stats build?
Yeah, it's maybe overly complex. I don't think there would be much performance difference since |
|
If you are considered about exposing it publicly, you can add to it |
Need to do a merge before reporting (I lost that bit of code on a re-factor). Fix various issues with data races. When merging from all threads, we need to stop-the-world to avoid races. When toggling on or off, also need to stop-the-world. Remove the need for locking for _PyStats_Attach().
After some profiling, it seems that having the |
Add pystats pointer to PyThreadState and use from there instead. This is slightly slower but shouldn't matter in practice. This simplifies the attach/detach logic as well.
3f58012 to
488f0be
Compare
|
Looks good in general. Moving the stats to its own file makes a lot of sense. How have you tested this? We used to be able to use the Microsoft test runner to show the diff in stats, but that's not available any more. |
I ran various scripts with pystats enabled, then compared the output (with and without this PR), ignoring differences in the counts. |
Add the _PyStats_InterpInit() function which will set pystats_enabled flag and allocate pystats_struct as required. We shouldn't be putting logic in _PyObject_InitState(). This also fixes a bug where interp->pystats_struct was allocated repeated rather than once per interpreter.
This is a bit more efficient, with the downside of a bit extra code. For _PyStats_Clear(), we don't need to do the merging operation. I think the code is a bit more clear this way.
| if (interp->config._pystats) { | ||
| // start with pystats enabled, can be disabled via sys._stats_off() | ||
| // this needs to be set before the first tstate is created | ||
| interp->pystats_enabled = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, we use PyMem_RawCalloc to allocate new interp memory. Then the default value of interp->pystats_enabled is 0, so if interp->config._pystats is false, we don't need to initialize it.
sergey-miryanov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| static inline PyStats* | ||
| _PyThreadState_GetStatsFast(void) | ||
| { | ||
| if (_Py_tss_tstate == NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should be able to use _PyThreadState_GetCurrent instead of accessing here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want _PyThreadState_GetCurrent() because I want to inline the access to _Py_tss_tstate->pystats. The _PyStats_GET() function is used a lot from hot paths.
I could potentially use _PyThreadState_GET(). However, I don't see an easy way to make that work. The header that defines that inline function is not always included, e.g. for Programs/_freeze_module. Accessing _Py_tss_state directly is the simplest way I could find to do it.
Need to allocate in stats_toggle_on_off() if the thread never had stats enabled yet. For zeroing, handle case that struct is not allocated. Remove bogus ts->_status.active check.
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
| PyAPI_FUNC(void) PyThreadState_LeaveTracing(PyThreadState *tstate); | ||
|
|
||
| #ifdef Py_STATS | ||
| #if defined(HAVE_THREAD_LOCAL) && !defined(Py_BUILD_CORE_MODULE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With #140690, HAVE_THREAD_LOCAL check is not needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's still needed. In pycore_pystate.h we have:
#if defined(HAVE_THREAD_LOCAL) && !defined(Py_BUILD_CORE_MODULE)
extern _Py_thread_local PyThreadState *_Py_tss_tstate;
#endif
I could change the code to:
#if defined(Py_BUILD_CORE) && !defined(Py_BUILD_CORE_MODULE)
But I don't think that's better.
| // Export for most shared extensions | ||
| PyAPI_FUNC(PyStats *) _PyStats_GetLocal(void); | ||
|
|
||
| #if defined(HAVE_THREAD_LOCAL) && !defined(Py_BUILD_CORE_MODULE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
Allow the
--enable-pystatsbuild option to be used with free-threading. For the free-threaded builds, the stats structure is allocated per-thread and then periodically merged into a per-interpter stats structure (on thread exit or when the reporting function is called).Summary of changes:
Add
pystatsto the PyThreadState structure. This is a pointer to the PyStats structure if recording of stats is enabled. This is used for both the free-threaded and default builds.For the free-threaded build, the _PyThreadStateImpl structure gains a pystats_struct member. This is allocated per-thread when recording of pystats is first enabled.
replace
_Py_statsreferences with_PyStats_GET()move pystats logic from Python/specialize.c into Python/pystats.c
move the pystats global state into the interpreter structure
add some free-threaded specific stat counters
Notes and potential issues:
The FTStats counts will need review. I wasn't sure about the naming or even how useful these might be. For
mutex_sleeps, we need to determine what is the most useful thing to measure. I increment that count whenever we have to "spin" on a mutex.The verbose code for
print_*andmerge_*is perhaps not ideal. I considered making this data driven instead. However, I think that actually would add complexity, not reduce it.I plan on adding two levels of pystats recording: default and extended. The "default" level would be enabled by default (or at least could be) and would only include stats that are relatively cheap to record. The "extended" level would be like what the current
--enable-pystatsdoes (counting things like INCREF/DECREF, which is expensive).