-
Notifications
You must be signed in to change notification settings - Fork 707
perf: Optimize cluster calculation with DFS and zero-allocation patterns #2539
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Replace BFS with DFS and eliminate GC pressure in calculateClusters() hot path: - Switch from O(N) queue.shift() to O(1) stack.pop() operations - Replace Set.has()/Set.add() with Uint8Array bitfield - Add reusable buffer management to avoid repeated allocations - Implement callback-based neighbor iteration to eliminate array allocations - Add forEachNeighborWithDiag() method to Game interface and GameImpl - Remove now unused GameImpl import from PlayerExecution
WalkthroughAdds a per-game traversal state with generation-tracked, buffered flood-fill refactor for cluster computation in PlayerExecution, and introduces zero-allocation neighbor iteration APIs ( Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes
Suggested reviewers
Poem
Pre-merge checks✅ Passed checks (3 passed)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🧠 Learnings (7)📓 Common learnings📚 Learning: 2025-11-29T22:22:37.170ZApplied to files:
📚 Learning: 2025-11-26T22:27:31.808ZApplied to files:
📚 Learning: 2025-10-20T20:15:28.858ZApplied to files:
📚 Learning: 2025-08-28T22:47:31.406ZApplied to files:
📚 Learning: 2025-10-20T11:02:16.969ZApplied to files:
📚 Learning: 2025-10-18T17:54:01.311ZApplied to files:
🧬 Code graph analysis (1)src/core/execution/PlayerExecution.ts (3)
🔇 Additional comments (8)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/core/execution/PlayerExecution.ts (1)
291-291: Fix comment formatting.Missing space after
//in the comment.- //Use callback to avoid creating a 'neighbors' Array + // Use callback to avoid creating a 'neighbors' Array
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/core/execution/PlayerExecution.ts(2 hunks)src/core/game/Game.ts(1 hunks)src/core/game/GameImpl.ts(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-26T22:27:31.808Z
Learnt from: scamiv
Repo: openfrontio/OpenFrontIO PR: 2519
File: src/core/worker/SharedTileRing.ts:55-63
Timestamp: 2025-11-26T22:27:31.808Z
Learning: In SharedTileRing.ts, the ring buffer is sized to width * height (the map dimensions). Combined with the dirty flag deduplication mechanism (each tile can only be enqueued once until the consumer clears its dirty flag), the number of pending entries is naturally bounded by the map size and drained every render frame. Therefore, ring buffer overflow should be extremely rare or effectively impossible, and any theoretical race condition between producer and consumer modifying the read index would only manifest as redundant tile refs being rendered, not memory corruption, which is acceptable.
<!-- </add_learning>
Applied to files:
src/core/execution/PlayerExecution.ts
🔇 Additional comments (5)
src/core/game/Game.ts (1)
667-673: LGTM! Clean zero-allocation API design.The callback-based pattern is a solid approach for performance-critical code. The documentation clearly explains the motivation.
src/core/execution/PlayerExecution.ts (4)
13-17: LGTM! Proper buffer initialization.Initializing the buffer to an empty
Uint8Array(0)is clean. The actual sizing happens lazily when needed based on map dimensions.
264-274: LGTM! Efficient buffer management.The early return (lines 264-265) and buffer reuse pattern (line 273) are solid optimizations. Using
fill(0)to clear is significantly faster than allocating a newSeton each call.Minor: The null check on line 269 (
!this._visitedBuffer) is unnecessary since the constructor initializes the field toUint8Array(0).
263-305: Well-designed performance optimization.The refactored
calculateClusterssuccessfully achieves the stated goals:
- ✅ Replaces O(N)
queue.shift()with O(1)stack.pop()- ✅ Replaces
Set.has()/Set.add()with Uint8Array bitfield access- ✅ Uses callback-based neighbor iteration to eliminate array allocations
- ✅ Reuses buffer across calls to minimize GC pressure
The DFS algorithm implementation is correct and should deliver significant performance improvements.
279-303: Based on the web search results, I've found a critical connectivity mismatch. The web search confirms that the originalcalculateClustersalgorithm uses four-neighbour connectivity (orthogonal neighbors only), but the current code usesforEachNeighborWithDiagwhich implements 8-connectivity (includes diagonal neighbors).Additionally, the web search hints that there may have been a recent change in the commit history related to this.
Since the repository is currently inaccessible, I'll rewrite the review comment based on the verified findings:
Connectivity approach changed from 4-way to 8-way neighbors — verify this is intentional.
The current implementation uses
forEachNeighborWithDiag()for 8-connectivity (includes diagonal neighbors), but the documented behavior ofcalculateClustersis to compute connected components using four-neighbour connectivity (orthogonal neighbors only). This algorithmic change affects territory clustering, which has downstream implications for troop generation, unit/building validity, and conquest logic. Confirm whether this change to diagonal connectivity was intentional or accidental. If unintentional, revert toforEachNeighbor()for orthogonal neighbors only.Regarding the use of
TileRefas an array index (lines 281, 285, 295, 298): this is a standard pattern in the codebase and appears safe, assumingTileRefrepresents a valid linear tile index in the range[0, width × height). This should be verified during code review or testing to ensure no out-of-bounds access occurs.
| if (!this._visitedBuffer || this._visitedBuffer.length < mapSize) { | ||
| this._visitedBuffer = new Uint8Array(mapSize); | ||
| } else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's do the visitedBuffer initialization in the init() method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ve removed the buffer from PlayerExecution and replaced it with a Game scoped Uint32Array
| this._visitedBuffer = new Uint8Array(mapSize); | ||
| } else { | ||
| // Fast clear (much faster than creating a new Set) | ||
| this._visitedBuffer.fill(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm little worried about this, we'll be filling a 2 million+ large array for each player. This could negate the gains made?
maybe instead we store player small id in the array so players can reuse the same array (like this: this._visitedBuffer[neighbor] === player.getSmallID()), then we only need to clear the array once each tick?
I think we should run a performance profile, or create a perf test, see AstarPerf.test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ve switched to a single Uint32Array visited buffer with a generational counter, so we go from O(players * mapSize) to O(mapSize) and avoid clearing the array.
- Introduce ClusterTraversalState and a WeakMap<Game, ClusterTraversalState> in PlayerExecution.ts to store visited: Uint32Array and gen per game instance. - Remove the per-player _visitedBuffer - Update calculateClusters() to allocate/resize a single Uint32Array(totalTiles) per Game, use a generation counter instead of calling fill(0) to clear. - Switch visited checks to visited[tile] === currentGen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/core/execution/PlayerExecution.ts (1)
269-317: DFS cluster calculation looks correct and keeps allocations lowThe new
calculateClusterslogic matches the intended behavior:
- Early-return on empty
borderTilesavoids unnecessary work.- Buffer sizing is tied to
width * height, and you grow the buffer only when the map grows.- The generational scheme (
gen+Uint32Arrayof tile refs) ensures each tile is visited at most once per call without any O(N) clear.- The stack-based DFS over
borderTilesplusforEachNeighborWithDiagproduces connected components as before, while avoiding neighbor array allocations.If you want to squeeze allocations even further, you could store the
stackarray inClusterTraversalStateand reuse it across calls (just setstack.length = 0at the start ofcalculateClusters()), instead of creating a new array each time. That would be a small, optional win and keeps the code shape similar.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/core/execution/PlayerExecution.ts(2 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: scamiv
Repo: openfrontio/OpenFrontIO PR: 2539
File: src/core/game/GameImpl.ts:520-542
Timestamp: 2025-11-29T22:22:37.170Z
Learning: In GameImpl.ts, neighborsWithDiag() and forEachNeighborWithDiag() intentionally duplicate coordinate iteration logic. They serve different purposes: forEachNeighborWithDiag() is a zero-allocation hot-path optimization while neighborsWithDiag() is a convenience method that returns an array. Refactoring one to call the other would add callback/closure allocations and indirection overhead, defeating the performance goals.
📚 Learning: 2025-11-29T22:22:37.170Z
Learnt from: scamiv
Repo: openfrontio/OpenFrontIO PR: 2539
File: src/core/game/GameImpl.ts:520-542
Timestamp: 2025-11-29T22:22:37.170Z
Learning: In GameImpl.ts, neighborsWithDiag() and forEachNeighborWithDiag() intentionally duplicate coordinate iteration logic. They serve different purposes: forEachNeighborWithDiag() is a zero-allocation hot-path optimization while neighborsWithDiag() is a convenience method that returns an array. Refactoring one to call the other would add callback/closure allocations and indirection overhead, defeating the performance goals.
Applied to files:
src/core/execution/PlayerExecution.ts
📚 Learning: 2025-10-20T20:15:28.858Z
Learnt from: sambokai
Repo: openfrontio/OpenFrontIO PR: 2225
File: src/core/execution/FakeHumanExecution.ts:51-51
Timestamp: 2025-10-20T20:15:28.858Z
Learning: In src/core/execution/FakeHumanExecution.ts, game balance constants like MIRV_COOLDOWN_TICKS, MIRV_HESITATION_ODDS, VICTORY_DENIAL_TEAM_THRESHOLD, VICTORY_DENIAL_INDIVIDUAL_THRESHOLD, and STEAMROLL_CITY_GAP_MULTIPLIER are experimental tuning parameters subject to frequent change during balance testing. Do not flag changes to these values as issues or compare them against previous values.
Applied to files:
src/core/execution/PlayerExecution.ts
📚 Learning: 2025-08-28T22:47:31.406Z
Learnt from: BrewedCoffee
Repo: openfrontio/OpenFrontIO PR: 1957
File: tests/core/executions/PlayerExecution.test.ts:16-39
Timestamp: 2025-08-28T22:47:31.406Z
Learning: In test files, PlayerExecution instances must be manually registered with game.addExecution() because the setup utility doesn't automatically register SpawnExecution, which would normally handle this during the spawn phase in a real game.
Applied to files:
src/core/execution/PlayerExecution.ts
📚 Learning: 2025-10-20T11:02:16.969Z
Learnt from: sambokai
Repo: openfrontio/OpenFrontIO PR: 2225
File: src/core/execution/FakeHumanExecution.ts:57-57
Timestamp: 2025-10-20T11:02:16.969Z
Learning: In src/core/execution/FakeHumanExecution.ts, the correct MIRV victory denial thresholds are VICTORY_DENIAL_TEAM_THRESHOLD = 0.8 (80% for team games) and VICTORY_DENIAL_INDIVIDUAL_THRESHOLD = 0.65 (65% for individual players), not 0.85 and 0.7 as might be mentioned in some documentation.
Applied to files:
src/core/execution/PlayerExecution.ts
📚 Learning: 2025-11-26T22:27:31.808Z
Learnt from: scamiv
Repo: openfrontio/OpenFrontIO PR: 2519
File: src/core/worker/SharedTileRing.ts:55-63
Timestamp: 2025-11-26T22:27:31.808Z
Learning: In SharedTileRing.ts, the ring buffer is sized to width * height (the map dimensions). Combined with the dirty flag deduplication mechanism (each tile can only be enqueued once until the consumer clears its dirty flag), the number of pending entries is naturally bounded by the map size and drained every render frame. Therefore, ring buffer overflow should be extremely rare or effectively impossible, and any theoretical race condition between producer and consumer modifying the read index would only manifest as redundant tile refs being rendered, not memory corruption, which is acceptable.
<!-- </add_learning>
Applied to files:
src/core/execution/PlayerExecution.ts
🧬 Code graph analysis (1)
src/core/execution/PlayerExecution.ts (4)
src/core/game/Game.ts (1)
Game(659-752)src/core/game/PlayerImpl.ts (1)
borderTiles(287-289)src/core/game/GameMap.ts (1)
TileRef(3-3)src/core/game/UnitImpl.ts (1)
tile(176-178)
🔇 Additional comments (1)
src/core/execution/PlayerExecution.ts (1)
6-13: Per-game traversal state with generational visited is a good fitUsing a
WeakMap<Game, ClusterTraversalState>plus a singleUint32Arrayper game removes the need for one visited buffer per player, and thegencounter means you avoid clearing the full array on each run. This directly addresses the earlier concern about filling multi-million entry arrays per player without adding complexity. I don’t see functional problems here.
| // Per-game traversal state used by calculateClusters() to avoid per-player buffers. | ||
| const traversalStates = new WeakMap<Game, ClusterTraversalState>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when would we ever have multiple game instances?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could turn it into a static, but this seemed cleaner to ensure isolation in case we ever have different Game instances in the same thread (e.g. unit tests). Want it changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an invariant we'll only have a single game object at a time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adressed
- Convert neighbor loops in surroundedBySamePlayer, isSurrounded, getCapturingPlayer to forEachNeighbor callbacks - Implement floodFillWithGen() method for configurable zero-allocation flood fill - Replace BFS in removeCluster() with floodFillWithGen using cardinal neighbors - Refactor calculateClusters() to use floodFillWithGen with diagonal neighbors - Add generational state management and forEachNeighbor interface method
…red traversal state variable. - Update traversal state management
Description:
Replace BFS with DFS and eliminate GC pressure in calculateClusters():
Describe the PR.
Please complete the following:
Please put your Discord username so you can be contacted if a bug or regression is found:
DISCORD_USERNAME