I think lazy basic block versioning is efficient partly for the following two reasons (my guesses, not empirically verified):
1. The exit stubs rewrite themselves into efficient short jumps to other basic blocks once they're triggered.
2. The JIT-compiled code is allocated in one contiguous block of memory.
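To make the first idea (self-patching exit stubs) concrete, here is a toy Python model. It is only a sketch under stated assumptions: `LazyJit`, `loop_body` and `done_body` are hypothetical names, "compiled blocks" are Python callables, and each exit is a slot in a list rather than a real machine-code jump. The point it shows is that a stub runs only the first time its exit is taken; after that the slot points straight at the target block.

```python
class LazyJit:
    """Toy model of lazy basic block versioning's self-patching exit stubs.

    Hypothetical: exits are list slots holding callables, not patched jumps.
    """

    def __init__(self, source_blocks):
        # block id -> (body, [successor ids]); body(state) returns the index
        # of the successor to take, or None to stop.
        self.source_blocks = source_blocks
        self.compiled = {}       # block id -> (body, exit slots)
        self.compile_count = 0   # how often the "compiler" ran
        self.stub_hits = 0       # how often an unpatched stub was triggered

    def _compile(self, block_id):
        body, successors = self.source_blocks[block_id]
        self.compile_count += 1
        exits = []
        for slot, target_id in enumerate(successors):
            # Every exit starts out as a stub pointing back into the compiler.
            exits.append(self._make_stub(exits, slot, target_id))
        self.compiled[block_id] = (body, exits)
        return self.compiled[block_id]

    def _make_stub(self, exits, slot, target_id):
        def stub():
            self.stub_hits += 1
            target = self.compiled.get(target_id) or self._compile(target_id)
            # Overwrite the slot so later exits go directly to the target.
            exits[slot] = lambda: target
            return target
        return stub

    def run(self, entry_id, state):
        block = self.compiled.get(entry_id) or self._compile(entry_id)
        while True:
            body, exits = block
            taken = body(state)
            if taken is None:
                return state
            block = exits[taken]()


def loop_body(state):
    state["n"] -= 1
    return 0 if state["n"] > 0 else 1  # 0 -> "loop" again, 1 -> "done"


def done_body(state):
    return None


jit = LazyJit({"loop": (loop_body, ["loop", "done"]), "done": (done_body, [])})
result = jit.run("loop", {"n": 5})
```

The back edge of the loop is taken four times but its stub fires only once; after that the exit is a direct hop to the already-compiled block.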
Applying (1) to `_EXIT_TRACE` and `_COLD_EXIT` is more complicated and may introduce security vulnerabilities, so for those we'll rely on the branch predictor doing a good job on a jump whose target never changes.
We should try (2) and place JIT code close together to improve code locality (fewer instruction-cache and iTLB misses). This may give some speedup.
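A minimal sketch of the second idea, assuming a simple bump allocator: every chunk of JIT code is carved out of one contiguous arena, so consecutive allocations end up next to each other. `CodeArena` and `fits_rel32` are hypothetical names, and the `bytearray` stands in for what would really be executable memory. The `fits_rel32` helper reflects the x86-64 `jmp rel32` encoding (opcode `E9` plus a 4-byte displacement measured from the end of the 5-byte instruction): within an arena smaller than 2 GiB any two chunks can reach each other with a direct jump, whereas separately mapped regions are not guaranteed to be that close.

```python
JMP_REL32_LEN = 5  # x86-64 `jmp rel32`: 1-byte opcode E9 + 4-byte displacement


class CodeArena:
    """Toy bump allocator handing out code regions from one contiguous buffer."""

    def __init__(self, size, alignment=16):
        self.buf = bytearray(size)  # stand-in for executable memory
        self.alignment = alignment
        self.top = 0                # offset of the next free byte

    def alloc(self, code):
        # Round the start up so every chunk begins on an aligned boundary.
        start = (self.top + self.alignment - 1) // self.alignment * self.alignment
        end = start + len(code)
        if end > len(self.buf):
            raise MemoryError("code arena exhausted")
        self.buf[start:end] = code
        self.top = end
        return start


def fits_rel32(jump_at, target):
    # The displacement is relative to the end of the jump instruction.
    disp = target - (jump_at + JMP_REL32_LEN)
    return -(2**31) <= disp < 2**31


arena = CodeArena(4096)
first = arena.alloc(b"\x90" * 10)   # 10 bytes of NOPs at offset 0
second = arena.alloc(b"\x90" * 20)  # next aligned slot, offset 16
```

Because allocation only ever bumps `top`, hot blocks compiled one after another share cache lines and pages with their neighbours.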