Description
Feature or enhancement
The object_lookup_special microbenchmark in Tools/ftscalingbench/ftscalingbench.py currently doesn't scale well and is indicative of a broader FT performance issue that we should fix. The benchmark just calls round() from multiple threads concurrently:
cpython/Tools/ftscalingbench/ftscalingbench.py
Lines 62 to 66 in 56d0f9a
def object_lookup_special():
    # round() uses `_PyObject_LookupSpecial()` internally.
    N = 1000 * WORK_SCALE
    for i in range(N):
        round(i / N)
The issue is that round() calls _PyObject_LookupSpecial(number, &_Py_ID(__round__)), which increments the reference count of the returned function (i.e., of float.__round__). Because every thread looks up the same function object, all threads contend on the same reference count. The underlying function supports deferred reference counting, but _PyObject_LookupSpecial and _PyType_LookupRef do not take advantage of it.
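To make the idea concrete, here is a rough model of how a tagged reference can skip the reference count update for objects that use deferred reference counting. All names here (Object, Ref, TAG_DEFERRED, lookup_ref, ref_close) are hypothetical stand-ins, not CPython's actual definitions, and the real _PyStackRef encoding may differ.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical, simplified model of an object header. Not CPython's layout.
typedef struct {
    long refcnt;    // shared reference count (the contended field)
    int deferred;   // nonzero if the object uses deferred reference counting
} Object;

// Low pointer bit marks a reference that does not own a refcount.
#define TAG_DEFERRED ((uintptr_t)1)

// Models a stackref: a pointer with a tag in the low bit.
typedef struct { uintptr_t bits; } Ref;

// Return a reference to `found`. Deferred objects are tagged instead of
// having their reference count incremented.
static inline Ref lookup_ref(Object *found)
{
    if (found->deferred) {
        return (Ref){ (uintptr_t)found | TAG_DEFERRED };
    }
    found->refcnt++;
    return (Ref){ (uintptr_t)found };
}

// Strip the tag to recover the object pointer.
static inline Object *ref_as_object(Ref r)
{
    return (Object *)(r.bits & ~TAG_DEFERRED);
}

// Release the reference. Only untagged references give back a refcount.
static inline void ref_close(Ref r)
{
    if ((r.bits & TAG_DEFERRED) == 0) {
        ref_as_object(r)->refcnt--;
    }
}
```

In this model, a lookup that returns a Ref for a deferred object never writes to refcnt, which is the property the lookup path for round() would need in order to scale across threads.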
For the FT build, we also need some extra support in order to safely use _PyStackRef in builtin_round_impl, because it's important that all _PyStackRefs are visible to the GC. To support this, we can add a singly linked list of active _PyStackRefs to _PyThreadStateImpl.
The struct _PyCStackRef pairs a _PyStackRef with the pointer for this linked list. In the GIL-enabled build, there's no linked list and it's essentially the same as _PyStackRef.
// A stackref that can be stored in a regular C local variable and be visible
// to the GC in the free threading build.
// Used in combination with _PyThreadState_PushCStackRef().
typedef struct _PyCStackRef {
    _PyStackRef ref;
#ifdef Py_GIL_DISABLED
    struct _PyCStackRef *next;
#endif
} _PyCStackRef;

struct _PyThreadStateImpl {
    ...
    // Linked list (stack) of active _PyCStackRef
    struct _PyCStackRef *c_stack_refs;
    ...
};

static inline void _PyThreadState_PushCStackRef(PyThreadState *tstate, _PyCStackRef *ref) { ... }
static inline void _PyThreadState_PopCStackRef(PyThreadState *tstate, _PyCStackRef *ref) { ... }
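The elided push/pop bodies could look roughly like the following standalone sketch. It models the free-threaded case only, and the types and function names (StackRef, CStackRef, ThreadState, push_c_stackref, pop_c_stackref, count_visible_refs) are simplified assumptions, not the actual CPython implementation. The last helper shows how a collector might walk the list to find references held in C locals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Simplified stand-ins for the internals named above (assumptions only).
typedef struct { uintptr_t bits; } StackRef;    // models _PyStackRef

typedef struct CStackRef {                      // models _PyCStackRef
    StackRef ref;
    struct CStackRef *next;
} CStackRef;

typedef struct {                                // models _PyThreadStateImpl
    CStackRef *c_stack_refs;   // head of the stack of active refs
} ThreadState;

// Link `ref` at the head of the thread's list; it starts out empty.
static inline void push_c_stackref(ThreadState *ts, CStackRef *ref)
{
    ref->ref.bits = 0;
    ref->next = ts->c_stack_refs;
    ts->c_stack_refs = ref;
}

// Unlink `ref`; pushes and pops must be strictly nested (LIFO).
static inline void pop_c_stackref(ThreadState *ts, CStackRef *ref)
{
    assert(ts->c_stack_refs == ref);
    ts->c_stack_refs = ref->next;
    ref->next = NULL;
}

// What a collector could do: walk the list and count non-empty refs.
static inline size_t count_visible_refs(const ThreadState *ts)
{
    size_t n = 0;
    for (const CStackRef *r = ts->c_stack_refs; r != NULL; r = r->next) {
        if (r->ref.bits != 0) {
            n++;
        }
    }
    return n;
}

// Push two refs held in C locals, report how many a walk of the list
// sees while both are live, then pop them in reverse order.
static inline size_t demo_nested_refs(ThreadState *ts)
{
    int obj_a = 1, obj_b = 2;   // placeholder "objects"
    CStackRef a, b;

    push_c_stackref(ts, &a);
    a.ref.bits = (uintptr_t)&obj_a;
    push_c_stackref(ts, &b);
    b.ref.bits = (uintptr_t)&obj_b;

    size_t seen = count_visible_refs(ts);

    pop_c_stackref(ts, &b);
    pop_c_stackref(ts, &a);
    return seen;
}
```

Because each node lives in the caller's C stack frame, pushing and popping need no allocation. The cost is that callers must pop in exactly the reverse order of pushing.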