Conversation

@markshannon
Member

@markshannon markshannon commented Jun 13, 2025

The stats need fixing and the generated tables could be more compact, but it works.

Member

@Fidget-Spinner Fidget-Spinner left a comment


This is really cool. I'll do a full review soon enough.

@markshannon
Member Author

Performance is in the noise; we would need a really big speedup of jitted code for it to be more than noise overall.

The nbody benchmark, which spends a lot of time in the JIT, shows a 13-18% speedup, except on Mac, where it shows no speedup.
I don't know why that would be, as I think we are using stock LLVM for Mac, not the Apple compiler.

@Fidget-Spinner
Member

The nbody benchmark, which spends a lot of time in the JIT, shows a 13-18% speedup, except on Mac, where it shows no speedup. I don't know why that would be, as I think we are using stock LLVM for Mac, not the Apple compiler.

Nice. We use Apple's compiler for the interpreter, though the JIT uses stock LLVM. Thomas previously showed that the version of the Apple compiler we use is subject to huge fluctuations in performance due to a PGO bug.

@markshannon markshannon marked this pull request as ready for review June 20, 2025 15:04
Member

@Fidget-Spinner Fidget-Spinner left a comment


I need to review the cases generator later.

Contributor

@colesbury colesbury left a comment


One comment below. The _LOAD_ATTR changes look fine to me.

@markshannon
Member Author

Latest benchmarking results: https://github.com/faster-cpython/benchmarking-public/tree/main/results/bm-20250813-3.15.0a0-6a85f95-JIT

2.6% faster on Windows. No change on Linux.

It looks like coverage is slower on Linux, which is presumably some sort of artifact, as the coverage benchmark does lots of instrumentation, which prevents the JIT from running. (Plus, the coverage benchmark uses an old version of coverage; the latest version is much faster.)

@Fidget-Spinner
Member

The tail-calling CI seems to be failing because Homebrew changed where they install clang (yet again). Will put up a separate PR to fix that.

@Fidget-Spinner
Member

Ok, I fixed the macOS CI on main. Please pull the changes in.

@markshannon
Member Author

I thought that caching through side exits would speed things up, but it looks like it slows things down a bit if anything.
https://github.com/faster-cpython/benchmarking-public/tree/main/results/bm-20250822-3.15.0a0-efe4628-JIT

So, I've reverted that change.

Will rerun the benchmarks, to confirm...

@markshannon
Member Author

I was hoping to get a clear across-the-board speedup before merging this.
But as it also enables decref removal and fixes the problem of intermediate values on the stack during tracing (#140277), I think we should merge this soon and tweak it later.

Member

@savannahostrowski savannahostrowski left a comment


Took a quick skim of this - very neat! Thanks for also including all the perf numbers in the discussion, which helps counteract some of the initial trepidation I had around the amount of new generated code. This lays pretty solid groundwork for future optimizations as well.

Just one comment about the type change in pycore_optimizer.h

  uint8_t chain_depth:6;  // Must be big enough for MAX_CHAIN_DEPTH - 1.
  bool warm;
- int index;      // Index of ENTER_EXECUTOR (if code isn't NULL, below).
+ int16_t index;  // Index of ENTER_EXECUTOR (if code isn't NULL, below).
Member

@savannahostrowski savannahostrowski Nov 3, 2025


Was this intentional? In practice, functions are very small and should probably never exceed this, but do we want a check here or some validation? I think this could overflow silently, right?


# Simple heuristic for size to avoid too much stencil duplication
def is_large(uop: Uop) -> bool:
    return len(list(uop.body.tokens())) > 80
Member

@Fidget-Spinner Fidget-Spinner Nov 9, 2025


This is too low to be useful. For example, _BINARY_OP_ADD_FLOAT is 89 tokens. It should be register allocated, but instead it spills all the time because it's limited to r2_1. I've inspected the code for nbody and it spills unnecessarily. We should raise the limit.

I suggest increasing the limit to 150.

Member

@Fidget-Spinner Fidget-Spinner Nov 9, 2025


Some benchmark results for the nbody tracing JIT vs the nbody tracing JIT + regalloc after increasing the limit. Base commits are a few apart, but they have no JIT-related changes in them:

Mean +- std dev: [nbody_tracing] 107 ms +- 2 ms -> [nbody_tracing_regalloc] 78.8 ms +- 0.6 ms: 1.36x faster

36% faster. That's significantly higher than the 17-27% we've been getting.

@bedevere-app

bedevere-app bot commented Nov 9, 2025

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@Fidget-Spinner
Member

Once we get Savannah's LLVM 21 PR in, we should experiment with setting the TOS cache size to 4. I observe a lot of spilling due to the loop iterator taking up some space.

@markshannon
Member Author

markshannon commented Nov 12, 2025

I think we will want to vary the cache depending on both hardware and operating system.
We will want to vary it for different hardware, due to differing numbers of available registers.
We will want to vary it for the operating system, as different OSes have different calling conventions, which changes the number of available registers.

All that for later PR(s) though.

7 participants