fix(gas): fix loop forgotten bug due to concurrency #3482

MasterPtato · 2025-11-18T23:50:26Z

No description provided.

vercel · 2025-11-18T23:50:34Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Preview	Comments	Updated (UTC)
rivetkit-serverless	Error			Dec 2, 2025 7:11pm

3 Skipped Deployments

Project	Deployment	Preview	Updated (UTC)
rivet-cloud	Ignored		Dec 2, 2025 7:11pm
rivet-inspector	Ignored	Preview	Dec 2, 2025 7:11pm
rivet-site	Ignored	Preview	Dec 2, 2025 7:11pm

MasterPtato · 2025-11-18T23:50:37Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

_{An organization admin has enabled the Graphite Merge Queue in this repository.} _{Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.}

This stack of pull requests is managed by Graphite. Learn more about stacking.

pkg-pr-new · 2025-11-18T23:52:47Z

More templates

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@3482

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@3482

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@3482

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@3482

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@3482

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@3482

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@3482

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@3482

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@3482

commit: 1a4f15a

claude · 2025-11-18T23:56:26Z

PR Review: fix(gas): fix loop forgotten bug due to concurrency

Summary

This PR fixes a critical concurrency bug in the gasoline workflow engine where loop event history was being incorrectly cleared, potentially losing events from concurrent iterations.

Core Fix Analysis (engine/packages/gasoline/src/db/kv/mod.rs)

The Problem:
The original code used clear_subspace_range(&loop_events_subspace) which cleared all events under the loop location, regardless of iteration. In concurrent scenarios where multiple loop iterations could be in flight, this could inadvertently delete events from the current or future iterations.

The Solution:
The fix correctly bounds the range to only clear events from iterations 0 through iteration - 1:

Start: {loop location, 0, ...}
End: {loop location, iteration - 1, ...}

This ensures only past iterations are moved to forgotten history, preserving the current iteration's events.

Code Quality: ✅ Excellent

Clear variable names (loop_events_subspace_start, loop_events_subspace_end)
Helpful comments explaining the range bounds
Uses saturating_sub(1) to safely handle iteration 0 edge case
Matches the same pattern used for cleaning old forgotten events (lines 2638-2655)

Other Changes Review

1. pegboard-runner/src/conn.rs - Workflow migration

-workflow(pegboard::workflows::runner::Input {
+workflow(pegboard::workflows::runner2::Input {

Concern: ⚠️ This appears to be switching from runner to runner2 workflow.

Questions:

Is this migration intentional or accidentally included?
Are there backward compatibility implications?
Should there be a deprecation path for the old workflow?

2. pegboard/src/workflows/runner2.rs - Init signal handling

-if init.is_none() {
-    init = Some(init_sig);
-    check_queue = true;
-}
+init = Some(init_sig);
+check_queue = true;

Concern: ⚠️ This removes the guard that prevented overwriting an existing init value.

Potential Issue:

If ToServerInit can be received multiple times, this will now blindly overwrite previous init data
Could this cause race conditions or lost initialization state?

Recommendation: If this change is intentional (e.g., supporting re-initialization), add a comment explaining why. Otherwise, consider keeping the guard or at least logging when init is being overwritten.

3. Cargo.toml - Commented release profile options

+# strip = true
+# panic = "abort"
+# overflow-checks = false
+# debug-assertions = false

Issue: ❌ These commented-out options should not be in the PR

Recommendation: Remove these lines. They appear to be experimental/debugging configurations that were accidentally included.

4. scripts/docker/build-push.sh - Commented build arg

+# --build-arg CARGO_BUILD_MODE=release \

Issue: ❌ Commented code in scripts

Recommendation: Remove if not needed, or uncomment if intended to be active.

5. scripts/run/docker/engine-rocksdb.sh - Trailing newline

Issue: ⚠️ Minor - removes trailing newline

This is inconsequential but typically files should end with a newline per POSIX standards.

Security Considerations

✅ No security concerns identified. The fix actually improves data integrity by preventing unintended data loss.

Performance Considerations

✅ Improved Performance: The bounded range query is more efficient than the original unbounded subspace range, as it scans fewer keys.

Test Coverage

⚠️ Missing Tests

Concerns:

No test coverage for the concurrent loop iteration scenario
Existing loop_test.rs only tests basic sequential loop functionality
The bug being fixed (concurrent iteration event deletion) is not covered by any test

Recommendations:

Add a test that:
- Runs a loop with slow iterations
- Triggers concurrent signals/events during loop execution
- Verifies that events from all iterations are preserved correctly
Add a test specifically for the forgotten history bounds to ensure iteration N-1 events are moved but iteration N events remain

Example test structure:

#[workflow(ConcurrentLoopTestWorkflow)]
pub async fn concurrent_loop_test(ctx: &mut WorkflowCtx, input: &Input) -> Result<()> {
    ctx.loope(0, move |ctx, state| {
        async move {
            // Trigger some events during iteration
            ctx.activity(...).await?;
            
            if *state >= 5 {
                return Ok(Loop::Break(()));
            }
            *state += 1;
            Ok(Loop::Continue)
        }.boxed()
    }).await?;
    
    // Verify all iteration events are in forgotten history
    Ok(())
}

Summary of Recommendations

Must Fix:

❌ Remove commented Cargo.toml release profile options
❌ Remove or uncomment the Docker build script changes

Should Address:

⚠️ Clarify the runner → runner2 migration intent
⚠️ Review the init signal handling change - document or revert
⚠️ Add test coverage for concurrent loop scenarios

Nice to Have:

Add trailing newline to engine-rocksdb.sh

Overall Assessment

The core bug fix is excellent and addresses a real concurrency issue with a correct, performant solution. However, the PR includes several unrelated or questionable changes that should be addressed before merging.

Recommendation: Request changes to clean up the non-essential modifications and clarify the intentional behavior changes in runner2.rs.

claude · 2025-11-19T22:09:58Z

Code Review

Summary

This PR fixes a critical concurrency bug in the workflow loop history management where events from all loop iterations were being moved to forgotten history, not just events from completed iterations. The fix correctly bounds the range query to only include events up to the current iteration.

✅ Positives

Critical Bug Fix: The main change correctly addresses the concurrency issue by:
- Changing from a full subspace query to a bounded range query
- Using iteration.saturating_sub(1) as the upper bound to exclude the current iteration
- Properly clearing only the bounded range instead of the entire subspace
Clear Documentation: The comments clearly explain the intent ("Move all events under this loop up to the current iteration to the forgotten history")
Safe Arithmetic: Using saturating_sub(1) prevents underflow if iteration is 0
Consistent Signal Handling: The runner2 workflow fix allows re-initialization signals to be processed correctly by removing the guard condition

🔍 Issues & Concerns

Critical: Missing Key Deletion (`engine/packages/gasoline/src/db/kv/mod.rs:2631`)

The current code moves entries to forgotten history but doesn't delete the original keys:

tx.set(&forgotten_key, entry.value());
// Missing: tx.clear(entry.key());

Impact: This will leave duplicate data in the active history, potentially causing:

Incorrect workflow replay behavior
Wasted storage space
Query performance degradation

Recommendation: Add tx.clear(entry.key()); after line 2631.

Medium: Potential Off-by-One Issue (`engine/packages/gasoline/src/db/kv/mod.rs:2598`)

Using iteration.saturating_sub(1) as the exclusive upper bound means events from iteration N-1 will be included in the range. This may be correct, but needs verification:

If the current iteration is 5, the range will include iterations 0-4
Is this the intended behavior, or should it exclude iteration 4 (the immediately previous iteration)?

Recommendation: Verify with the workflow loop semantics whether the previous iteration's events should be moved to forgotten history or remain active.

Low: Commented Out Production Code (`Cargo.toml:437-440`, `scripts/docker/build-push.sh:29`)

Multiple production configuration options are commented out:

# strip = true
# panic = "abort"
# overflow-checks = false
# debug-assertions = false

# --build-arg CARGO_BUILD_MODE=release \

Impact:

These look like debugging/development changes that shouldn't be committed
May indicate the PR includes unintended changes
Could confuse future developers about why these are commented

Recommendation: Remove these commented lines or document why they're included.

Low: Unnecessary File Change (`scripts/run/docker/engine-rocksdb.sh:16`)

The removal of a trailing newline appears unrelated to the bug fix.

Recommendation: Revert this cosmetic change to keep the PR focused.

🧪 Testing Concerns

No Test Coverage Evident: The PR doesn't appear to include:

Unit tests for the bounded range query logic
Integration tests verifying the loop iteration boundary behavior
Tests ensuring events from the current iteration remain in active history

Recommendation: Add tests that verify:

Events from completed iterations are moved to forgotten history
Events from the current iteration remain in active history
The edge case where iteration = 0 is handled correctly
Concurrent workflow loops don't interfere with each other

🔒 Security Considerations

No security vulnerabilities identified. The use of saturating_sub prevents integer underflow.

⚡ Performance Considerations

Positive: The bounded range query is more efficient than querying the entire subspace, especially for long-running loops with many iterations.

Note: The loop that moves entries to forgotten history processes them sequentially, which is appropriate for maintaining consistency.

📝 Style & Conventions

According to CLAUDE.md:

✅ Uses structured logging with tracing
✅ Comments are lowercase
✅ Uses anyhow Result types correctly
✅ Follows existing code patterns

🎯 Recommendations Summary

MUST FIX: Add tx.clear(entry.key()); after moving entries to forgotten history
SHOULD VERIFY: Confirm the off-by-one semantics for iteration boundaries
SHOULD REMOVE: Commented out code in Cargo.toml and build scripts
SHOULD ADD: Test coverage for the loop iteration boundary logic

Overall Assessment

The core bug fix is sound and addresses a real concurrency issue, but the missing key deletion is a critical oversight that must be addressed before merging. Once that's fixed and the commented code is cleaned up, this will be a solid improvement to the workflow history management system.

claude · 2025-11-19T22:10:14Z

PR Review: Fix loop forgotten bug due to concurrency

Summary

This PR addresses a critical bug in the workflow engine's loop event history management. The fix ensures that when moving loop events to forgotten history, only events from iterations 0 through (current iteration - 1) are moved, preventing concurrent loop iterations from incorrectly archiving each other's events.

Code Quality & Best Practices ✅

Main Fix (gasoline/src/db/kv/mod.rs:2579-2634)

Good:

The fix correctly calculates precise range boundaries using saturating_sub(1) to prevent underflow
Clear, descriptive comments explain the intent ("Start is {loop location, 0, ...}" and "End is {loop location, iteration - 1, ...}")
Properly uses .range().0 and .range().1 to extract start/end boundaries from the subspace

Observation:

The use of saturating_sub(1) on line 2598 is defensive programming for iteration == 0, though logically this function shouldn't be called with iteration == 0. This is good defensive coding.

Workflow Fix (pegboard/src/workflows/runner2.rs:103-108)

Good:

Simplifies the initialization logic by removing the unnecessary if init.is_none() guard
The change allows re-initialization, which makes sense for handling reconnections or state recovery

Potential Issue:
The removal of the if init.is_none() check means init can be overwritten multiple times if multiple ToServerInit signals arrive in the same batch.

Question: Is this the intended behavior? Consider these scenarios:

If multiple init packets arrive (network retransmission, reconnection), should we always take the latest one?
Could this mask bugs where multiple init packets indicate an unexpected state?

Suggestion: Consider either:

Adding a warning log when overwriting an existing init: if init.is_some() { tracing::warn!("overwriting existing init signal"); }
Or documenting why multiple inits are expected and safe

Potential Bugs & Issues

1. Edge Case: iteration == 0 ⚠️

When iteration == 0, we have:

loop_events_subspace_start: points to iteration 0
loop_events_subspace_end: points to iteration 0 (via saturating_sub(1))

This creates a range where start >= end. Need to verify that clear_range handles this gracefully (likely a no-op, which is correct).

2. Missing Entry Deletion ❓

On line 2631, entries are copied to forgotten_history_subspace but the original entries are only deleted via the clear_range at line 2634. Verify that if an error occurs between copying (2631) and clearing (2634), we don't leave duplicate entries or partial state.

Recommendation: Consider transaction semantics - ensure this entire operation is atomic.

Performance Considerations

Range Query Optimization ✅

The fix significantly improves performance by limiting the range query to only the relevant iterations instead of querying all events under the loop location. This prevents O(total iterations) work on each loop update.

Stream Processing ✅

The existing streaming approach (get_ranges_keyvalues with StreamingMode::WantAll) is appropriate for handling potentially large event sets.

Security Concerns

No Major Issues ✅

The changes don't introduce security vulnerabilities. The range calculation is safe from integer overflow due to saturating_sub.

Test Coverage

Missing Tests ⚠️

This PR doesn't include tests. Given the bug was described as "forgotten due to concurrency", recommend adding:

Concurrency test: Multiple concurrent loop iterations to verify isolation
Boundary test: Loop with iteration == 0, iteration == 1
Forgotten history test: Verify events are moved correctly and old events (> 100 iterations) are pruned

Example test scenario:

// Pseudo-code
#[test]
async fn test_concurrent_loop_iterations_dont_interfere() {
    // Create workflow with loop at iteration 5
    // Concurrently:
    //   - Thread A updates loop to iteration 6
    //   - Thread B updates loop to iteration 6
    // Verify: Events from iterations 0-5 are in forgotten
    // Verify: Events from iteration 6 are in active
    // Verify: No events are lost or duplicated
}

Additional Changes Review

Cargo.toml

The commented-out release optimizations appear to be development/debugging leftovers:

# strip = true
# panic = "abort"
# overflow-checks = false
# debug-assertions = false

Recommendation: Remove these commented lines or explain why they're being kept for future reference.

scripts/docker/build-push.sh

Similarly, commented build arg:

# --build-arg CARGO_BUILD_MODE=release \

Recommendation: Clean up commented code or document the intent.

scripts/run/docker/engine-rocksdb.sh

Removed trailing newline - this is fine, just a formatting cleanup.

Overall Assessment

Status: ✅ Approve with minor recommendations

The core fix correctly addresses the concurrency bug by precisely limiting the range of events moved to forgotten history. The logic is sound and the implementation properly uses the database range primitives.

Required before merge:

Add tests for concurrent loop iterations
Verify behavior when iteration == 0
Clean up commented code in Cargo.toml and build scripts

Recommended:

Add logging when init is overwritten in runner2.rs
Document expected behavior for multiple ToServerInit signals

Great work identifying and fixing this subtle concurrency issue!

claude · 2025-11-20T03:12:02Z

Code Review

Summary

This PR fixes a critical bug in the workflow loop event history management where events from all iterations (including future ones) were being moved to forgotten history instead of only past iterations. The fix also includes some related cleanup in the pegboard runner workflow.

Critical Bug Fix ✅

Location: engine/packages/gasoline/src/db/kv/mod.rs:2579-2634

The Problem:
The original code used EventHistorySubspaceKey::entire() which creates a range that includes all iterations of a loop (0 to infinity). This meant that when moving events to forgotten history, it would incorrectly include events from the current and future iterations due to concurrent workflow execution.

The Solution:
The fix correctly bounds the range to only include iterations 0 through iteration - 1:

Start: {loop location, 0, ...} (iteration 0)
End: {loop location, iteration - 1, ...} (one before current)

This ensures only past iterations are moved to forgotten history, which is the correct behavior.

Code Quality: ⭐⭐⭐⭐⭐

Uses saturating_sub(1) to safely handle edge case when iteration = 0
Clear, descriptive comments explain the range boundaries
Proper use of .range() tuple extraction for start/end bounds

Secondary Changes

1. Pegboard Runner Init Logic (`runner2.rs:103-108`)

Change: Removed the if init.is_none() guard when handling ToServerInit signals.

Analysis:

Before: Only accepted the first init signal
After: Accepts and overwrites with any new init signal

Potential Concern: ⚠️
Is it expected/safe for a runner to re-initialize multiple times? If this is intentional (e.g., for reconnection scenarios), this is fine. However, if ToServerInit should only be sent once per runner lifecycle, this could mask bugs where duplicate init signals are sent.

Recommendation: Consider adding a log message when overwriting an existing init:

if init.is_some() {
    tracing::warn!("received duplicate ToServerInit signal");
}
init = Some(init_sig);
check_queue = true;

2. Workflow Reference Update (`conn.rs:128`)

Change: pegboard::workflows::runner::Input → pegboard::workflows::runner2::Input

Analysis: This appears to be migrating to a new runner2 workflow implementation. Straightforward refactor.

Question: Is the old runner workflow being phased out? If so, consider:

Adding a deprecation comment to the old workflow
Ensuring all callsites are updated in this or future PRs

3. Build Script Changes (`scripts/docker/build-push.sh:26-30`)

Changes:

Multi-line formatting for readability
Commented-out --build-arg CARGO_BUILD_MODE=release

Analysis: The commented-out release mode arg suggests this might be work-in-progress or debugging configuration. The formatting improvement is good.

Performance Considerations

Range Query Optimization: ✅
The fix changes from a subspace range query to an explicit bounded range query. This is more efficient because:

Avoids scanning events from current/future iterations
Database can better optimize with explicit bounds
Reduces unnecessary data transfer and processing

Database Operations:

tx.clear_range() instead of tx.clear_subspace_range() - both are appropriate, but the explicit range version is more precise given the bounded range

Security Considerations

No security concerns identified. The changes are internal workflow state management operations.

Test Coverage

Recommendation: ⚠️
This bug involved concurrency issues (events from concurrent iterations being incorrectly processed). Consider adding:

Unit test that verifies the range bounds:
- Create a workflow with loop iterations 0, 1, 2, 3
- Trigger forgotten history migration at iteration 3
- Verify only iterations 0-2 are moved to forgotten history
- Verify iteration 3 remains in active history
Integration test for concurrent execution:
- Simulate concurrent loop iterations
- Verify forgotten history is correctly isolated per iteration

Additional Issues Found

Cargo.toml (`Cargo.toml:436-440`)

Issue: Commented-out release profile configurations:

# strip = true
# panic = "abort"
# overflow-checks = false
# debug-assertions = false

Recommendation: ❌
These should either be:

Removed entirely if not needed
Uncommented and committed if they're intended configuration
Moved to a separate .cargo/config.toml if they're developer-specific

Leaving commented-out configuration in version control creates confusion about intended settings. According to CLAUDE.md, commented code/config should be avoided.

Script Newline Issue (`scripts/run/docker/engine-rocksdb.sh:15`)

Issue: Removed trailing newline at EOF

Recommendation: Minor - POSIX compliance suggests files should end with a newline. This is a trivial change but consider preserving it.

Overall Assessment

Code Quality: ⭐⭐⭐⭐ (4/5)
Bug Fix Correctness: ⭐⭐⭐⭐⭐ (5/5)
Test Coverage: ⭐⭐⭐ (3/5 - needs concurrent test)

Approval Status: ✅ APPROVE (with minor suggestions)

The core bug fix is excellent and addresses a real concurrency issue correctly. The secondary changes are reasonable but could use clarification on the init signal handling behavior. Clean up the commented-out Cargo.toml changes before merging.

Action Items Before Merge

Must Fix:
- Remove or uncomment the Cargo.toml release profile settings
Should Consider:
- Add logging for duplicate ToServerInit signals
- Add test coverage for the range boundary fix
- Clarify the commented-out Docker build arg
Nice to Have:
- Document the migration from runner to runner2 workflow

claude · 2025-11-20T03:12:17Z

PR Review: Fix loop forgotten bug due to concurrency

Overview

This PR fixes a critical bug in the workflow engine's loop history management where concurrent operations could cause events from ALL iterations of a loop to be moved to forgotten history, rather than just the events from completed iterations up to the current one.

Code Quality & Implementation

Main Fix (gasoline/src/db/kv/mod.rs) ✅

The core fix correctly addresses the concurrency issue by:

Precise Range Queries: Changed from using a subspace (which would match ALL iterations) to explicit start/end ranges that only target iterations 0 through iteration - 1.
- Start: {loop location, 0, ...}
- End: {loop location, iteration - 1, ...}
Proper Range Clear: Uses clear_range() with explicit bounds instead of clear_subspace_range(), ensuring only the intended range is cleared.
Edge Case Handling: Uses saturating_sub(1) to safely handle iteration = 0 case without underflow.

The comments are also improved, making the intent clearer ("up to the current iteration" instead of just "current events").

Potential Issues & Concerns

1. Minor: Iteration = 0 Edge Case

When iteration = 0, the end range becomes:

iteration.saturating_sub(1) // = 0 when iteration is 0

This means the range would be {location, 0} ... {location, 0}. Depending on how range() works, this might:

Return an empty range (correct behavior)
Include iteration 0 (incorrect - would try to forget the current iteration's events)

Recommendation: Add a guard condition:

if iteration > 0 {
    // Only move events to forgotten if we have completed iterations
    let loop_events_subspace_start = ...
    // ... rest of the code
}

2. Configuration Changes (Cargo.toml)

The commented-out release profile optimizations are concerning:

# strip = true
# panic = "abort"
# overflow-checks = false
# debug-assertions = false

These should either be removed entirely or have a clear comment explaining why they're included but commented
Leaving them suggests they might have been used for debugging this issue
overflow-checks = false is particularly dangerous given the .saturating_sub(1) usage in the fix

Recommendation: Remove these lines or add a comment explaining their purpose.

3. Unrelated Changes

Several changes appear unrelated to the loop forgotten bug:

pegboard-runner/src/conn.rs: Changed from runner::Input to runner2::Input
pegboard/src/workflows/runner2.rs: Removed conditional check for init.is_none(), now always replaces init
scripts/docker/build-push.sh: Added commented CARGO_BUILD_MODE argument
scripts/run/docker/engine-rocksdb.sh: Removed trailing newline

Recommendation: Consider splitting these changes into separate commits or PRs for clearer history. If they're related to reproducing/testing the bug, please document that relationship.

Performance Considerations

✅ Good: The fix is more efficient than the original - it uses precise range bounds instead of iterating over all loop events.

Security Concerns

⚠️ Medium Priority: The iteration handling needs careful review:

Integer underflow is handled with saturating_sub
However, the behavior at iteration = 0 should be explicitly tested
Consider adding debug assertions or tracing to validate the range is correct

Test Coverage

❌ Missing: No test changes are included in this PR.

Recommendations:

Add a test that creates a loop with multiple iterations and verifies:
- Only completed iterations' events are moved to forgotten history
- Current iteration's events remain in active history
- Concurrent updates don't cause data loss
Add a test for the iteration = 0 edge case
Test the "last 100 events" retention logic (lines 2636-2656)

Additional Observations

Logging: Consider adding structured logging around the forgotten history logic:

tracing::debug!(
    ?from_workflow_id,
    ?location, 
    iteration,
    "moving loop events to forgotten history"
);

This would help diagnose similar issues in the future.

Summary

Strengths:

Correctly identifies and fixes the core concurrency bug
More efficient implementation
Better comments

Action Items:

🔴 Required: Verify iteration = 0 behavior or add guard condition
🔴 Required: Add test coverage for this bug fix
🟡 Recommended: Remove or document commented Cargo.toml changes
🟡 Recommended: Separate unrelated changes into different commits/PRs
🟡 Recommended: Add debug logging for troubleshooting

Overall, this is a good fix for a critical bug, but needs test coverage before merging.

claude · 2025-11-26T02:19:04Z

PR Review: fix(gas): fix loop forgotten bug due to concurrency

Summary

This PR addresses a critical bug in the gasoline workflow engine's loop event handling where concurrent loop iterations could cause events from the current iteration to be incorrectly moved to forgotten history.

Code Quality & Best Practices

✅ Strengths

Correct Range Construction - The fix properly constructs bounded ranges to limit which events get archived:
- loop_events_subspace_start: Points to iteration 0
- loop_events_subspace_end: Points to iteration - 1
- This ensures only events from previous iterations are moved to forgotten history, not the current one
Good Use of saturating_sub - Using saturating_sub(1) prevents underflow when iteration = 0, though this case is already handled by the if iteration == 0 branch
Improved Comments - Comments clearly explain the intent:
- "Move all events under this loop up to the current iteration to the forgotten history"
- Clear start/end range documentation

⚠️ Areas for Improvement

1. Inconsistent Workflow Reference Path (`engine/packages/pegboard-runner/src/conn.rs:131`)

.workflow(pegboard::workflows::runner2::Input {

This change from pegboard::workflows::runner::Input to pegboard::workflows::runner2::Input appears to be switching to a new runner implementation. However:

Is this intentional for this PR? The commit message mentions a gasoline bug fix, not a runner migration
Are both implementations compatible? Need to verify this doesn't break existing runners
Should this be a separate PR? Mixing feature changes with bug fixes makes rollback harder

Recommendation: If runner2 is the correct implementation, ensure:

This is documented in the PR description
All callers have been updated
The old runner workflow can be safely deprecated

2. Init Signal Handling Change (`engine/packages/pegboard/src/workflows/runner2.rs:106-108`)

The change removes the if init.is_none() guard:

// BEFORE
if init.is_none() {
    init = Some(init_sig);
    check_queue = true;
}

// AFTER  
init = Some(init_sig);
check_queue = true;

Concerns:

Lost Idempotency: Previously, multiple init signals were ignored. Now they'll overwrite each other
Intentional Behavior?: Is it valid for a runner to receive multiple init signals?
Race Conditions: Could concurrent init signals cause issues?

Recommendation:

Add a comment explaining why reinit is now allowed
Or add a warning log when overwriting existing init: if init.is_some() { tracing::warn!("overwriting existing init signal"); }
Consider if this could mask bugs where duplicate inits shouldn't happen

3. Commented-Out Build Configurations

In Cargo.toml and scripts/docker/build-push.sh, several release optimizations are commented out:

# strip = true
# panic = "abort"
# overflow-checks = false
# debug-assertions = false

And in the build script:

# --build-arg CARGO_BUILD_MODE=release \

Questions:

Why are these commented instead of removed? Suggests they're being tested or are WIP
Should these be in this PR? They don't relate to the loop forgotten bug fix
Performance Impact: Commented optimizations could affect production builds

Recommendation:

If these are exploratory changes, move to a separate PR
If keeping, add comments explaining why they're disabled
Consider if this is affecting the bug fix testing

Potential Bugs & Issues

🔴 Critical: Missing Delete Operation (`engine/packages/gasoline/src/db/kv/mod.rs:2628`)

In the loop that moves events to forgotten history:

loop {
    let Some(entry) = stream.try_next().await? else {
        break;
    };
    
    // ... validation ...
    
    let forgotten_key = [forgotten_history_subspace.bytes(), truncated_key].concat();
    tx.set(&forgotten_key, entry.value());
    // ❌ MISSING: tx.delete(entry.key())
}

tx.clear_range(&loop_events_subspace_start, &loop_events_subspace_end);

Analysis:
The code iterates through events, copies them to forgotten history with tx.set, but doesn't delete the originals in the loop. It relies on tx.clear_range afterwards to delete them.

Is this a bug?

If clear_range is called after the loop completes, this is correct (batch delete is more efficient)
However, if the transaction fails/retries between the copy and the clear, this could cause:
- ✅ Safe: Keys exist in both places temporarily (duplication, not data loss)
- ⚠️ Inefficient: Transaction retries might re-copy already copied data

Recommendation:

Current approach is acceptable if transactions are atomic
Consider adding a comment: // Delete all copied events in batch (more efficient than per-key deletion)
Verify that clear_range properly executes even if the transaction retries

⚠️ Edge Case: What if `iteration = 0`?

When iteration = 0, the code takes a different path (keys::history::insert::loop_event), so the range-based archiving doesn't run. This is correct, but worth documenting.

Recommendation: Add a comment to clarify:

if iteration == 0 {
    // First iteration: no previous events to archive
    keys::history::insert::loop_event(/*...*/)?;
} else {
    // Archive events from iterations 0 to iteration-1
    // ...
}

Performance Considerations

✅ Improved Efficiency

The change from subspace() (which would scan all loop iterations) to explicit range() with start/end boundaries is a significant performance improvement:

// BEFORE: Would scan ALL iterations
let loop_events_subspace = self.subspace.subspace(&EventHistorySubspaceKey::entire(...));

// AFTER: Only scans iterations 0 to iteration-1  
let loop_events_subspace_start = /* iteration 0 */
let loop_events_subspace_end = /* iteration - 1 */

This prevents the bug AND improves performance by limiting the scan range.

⚠️ Potential Concern: Large Iteration Counts

The code keeps the last 100 iterations in forgotten history:

if iteration > 100 {
    // Clean up old forgotten events
}

Questions:

For long-running loops with many events per iteration, could forgotten history grow unbounded?
Is there a limit on events per iteration that should be enforced?

Recommendation: Consider adding metrics to monitor:

Average events per iteration
Total forgotten history size
Cleanup frequency

Security Concerns

✅ No security issues identified

The changes deal with internal workflow state management and don't expose new attack vectors. The use of bounded ranges actually improves safety by preventing accidental over-deletion.

Test Coverage

❌ Missing: Test Updates

The PR doesn't include any test changes or additions. Given the critical nature of this bug (data being incorrectly archived), this needs tests:

Required Tests:

Concurrent Loop Iterations: Verify that iteration N doesn't archive its own events
Boundary Conditions:
- iteration = 0 (first iteration)
- iteration = 1 (should only archive iteration 0)
- iteration > 100 (forgotten history cleanup)
Range Correctness: Verify events from iterations [0, iteration-1] are archived, not [0, iteration]

Recommendation:

#[test]
async fn test_loop_event_concurrent_iteration_archival() {
    // Setup: Create loop with iteration 5
    // Insert events for iterations 0-5
    // Call upsert_workflow_loop_event with iteration=5
    // Assert: Events for 0-4 are in forgotten, events for 5 are in active
}

Final Recommendations

🔴 Must Address Before Merge:

Clarify the runner → runner2 change - Is this intentional? Should it be separate?
Add tests for the loop event archival fix
Document or remove the commented-out build configurations
Review the init signal handling change - Is duplicate init allowed?

⚠️ Should Address:

Add inline comments explaining the archival range logic
Add logging for key operations (especially init overwrite)
Consider adding metrics for forgotten history size

✅ Good to Merge After Addressing Above:

The core bug fix (range-based event archival) is solid and addresses a real concurrency issue. Once the peripheral changes are clarified and tests are added, this should be safe to merge.

Overall Assessment: The main bug fix is well-reasoned and correct, but the PR includes several unrelated changes that need clarification. The lack of tests for a critical bug fix is concerning.

claude · 2025-11-26T02:22:33Z

Code Review

I've reviewed this PR which fixes a loop forgotten bug due to concurrency in the gasoline workflow engine. Here's my analysis:

Core Fix: Loop Event Range Selection (engine/packages/gasoline/src/db/kv/mod.rs)

Positive:

Critical bug fix: The change correctly addresses a concurrency issue where all loop events were being moved to forgotten history, not just events from previous iterations
Precise range selection: The new implementation uses specific start/end ranges:
- Start: {loop location, 0, ...}
- End: {loop location, iteration - 1, ...}
Overflow safety: Uses saturating_sub(1) to handle the edge case when iteration is 0
Improved comments: Better documentation explaining the range bounds

Issues & Suggestions:

Potential edge case: When iteration == 0, the range end becomes iteration.saturating_sub(1) which is 0. This means the range is [0, 0) - an empty range. This is correct behavior (no events to forget on first iteration), but worth verifying this is intentional.
Missing transaction error handling: Line 2628 sets a value in the transaction (tx.set(&forgotten_key, entry.value())), but there's no explicit error handling if this fails. Consider whether this could leave the DB in an inconsistent state if it fails after some entries are moved.
Performance consideration: The code does a stream read + individual tx.set() calls + a final clear_range(). For large iteration counts with many events, this could be slow. Consider if batch operations would be more efficient.

Pegboard Runner Changes (engine/packages/pegboard/src/workflows/runner2.rs)

Line 105-108:

init = Some(init_sig);
check_queue = true;

Issue: The removal of the if init.is_none() check means that if multiple ToServerInit signals arrive, the init will be overwritten.

Question: Is this intentional? Can multiple init signals arrive due to concurrency/retries?
If multiple inits can arrive, should we log a warning when overwriting?
If only one init should arrive, should we error/warn on duplicate inits?

Recommendation: Add a check or comment explaining why it's safe to overwrite init, or add error handling if this shouldn't happen.

Commented Code Issues

Cargo.toml (lines 436-439):

# strip = true
# panic = "abort"
# overflow-checks = false
# debug-assertions = false

Critical: These commented-out profile settings should not be committed to the codebase:

If they're temporary debugging settings, remove them
If they're intended for future use, document them with a TODO and context
Commented code in version control is an anti-pattern

scripts/docker/build-push.sh (line 29):

# --build-arg CARGO_BUILD_MODE=release \

Issue: Commented build arg in the middle of a command. This suggests:

Either this was used for debugging and should be removed
Or there's an incomplete feature that should be finished or documented
The script has a syntax error risk if uncommented without proper formatting

scripts/run/docker/engine-rocksdb.sh (line 18):

Removed trailing newline - this is fine but seems unrelated to the PR

Security & Best Practices

No security concerns identified - The changes are internal database operations and workflow state management.

Best practices observations:

✅ Good use of structured logging with tracing
✅ Proper error propagation with ? operator
✅ Range safety with saturating_sub
⚠️ Commented code should be cleaned up

Test Coverage

Recommendation: This PR fixes a concurrency bug. Consider adding:

A test that simulates concurrent loop iterations to verify events are properly scoped
Edge case test for iteration == 0
Test verifying the 100-event retention limit still works correctly

Summary

Approval recommendation: ⚠️ Approve with changes

The core bug fix is sound and addresses a real concurrency issue. However:

Must fix: Remove commented code from Cargo.toml and scripts/docker/build-push.sh
Should clarify: The init overwrite behavior in runner2.rs - is this intentional?
Consider: Add tests for the concurrency scenario this fixes

Once the commented code is cleaned up and the init overwrite is clarified, this should be good to merge.

claude · 2025-11-27T00:55:46Z

Code Review - PR #3482: fix(gas): fix loop forgotten bug due to concurrency

Summary

This PR fixes a critical concurrency bug in the loop event history management system where events from all loop iterations were being moved to forgotten history, rather than just events up to the current iteration.

Code Quality & Best Practices

✅ Strengths

Clear bug fix: The change correctly addresses the issue by using a bounded range instead of clearing all events under a loop
Good comments: Updated comments accurately reflect the new behavior ("up to the current iteration")
Appropriate method usage: Changed from clear_subspace_range to clear_range with explicit bounds
Safe arithmetic: Uses saturating_sub(1) to prevent underflow on iteration 0

⚠️ Areas for Improvement

Commented code in Cargo.toml (lines 436-439):
```
# strip = true
# panic = "abort"
# overflow-checks = false
# debug-assertions = false
```
These commented-out release profile options should either be:
- Removed if not needed
- Uncommented if they're part of the fix
- Moved to a separate PR if they're exploratory changes
Recommendation: Remove these lines to keep the PR focused on the bug fix.
Commented code in scripts/docker/build-push.sh (line 29):
```
# --build-arg CARGO_BUILD_MODE=release \
```
Similar issue - this commented line should be removed or moved to a separate PR.
Missing newline at EOF: scripts/run/docker/engine-rocksdb.sh now ends without a trailing newline (line deleted). This is a minor style issue but worth noting.

Correctness & Logic

✅ Core Fix is Sound

Before: The code was clearing ALL events under a loop location:

let loop_events_subspace = self.subspace.subspace(&keys::history::EventHistorySubspaceKey::entire(...));
tx.clear_subspace_range(&loop_events_subspace);

After: The code now clears only events from iteration 0 to (current iteration - 1):

let loop_events_subspace_start = /* iteration 0 */
let loop_events_subspace_end = /* iteration - 1 */
tx.clear_range(&loop_events_subspace_start, &loop_events_subspace_end);

This ensures that events from the current iteration and beyond are preserved.

⚠️ Potential Edge Cases

Iteration 0 behavior: When iteration = 0, saturating_sub(1) returns 0, so both start and end will point to iteration 0. This means:
- The range will be [0, 0) (exclusive end)
- No events will be cleared
- This appears correct, but should be verified in testing
Range semantics: Verify that clear_range uses exclusive end semantics. The code assumes that range().1 gives an exclusive upper bound, which should be validated.

Performance Considerations

✅ Good Performance Characteristics

Bounded range queries: The change from clearing entire subspaces to bounded ranges is more efficient and precise
No additional database roundtrips: The logic still uses a single transaction

💡 Potential Optimization

The code still iterates through all matching events to move them to forgotten history, then clears the range. Consider whether this could be optimized with a single range operation if the database supports atomic move/rename operations.

Security Concerns

✅ No Security Issues Identified

The change doesn't introduce any new attack vectors
Proper key construction using the existing tuple packing system
No user input directly influences the range bounds

Test Coverage

⚠️ Testing Concerns

No test changes in this PR: This is a critical bug fix but doesn't include new tests or updates to existing tests
Recommended test scenarios:
- Loop event cleanup at iteration 0
- Loop event cleanup at iteration 1
- Loop event cleanup after 100+ iterations (to verify the cleanup logic)
- Concurrent loop iterations to verify the race condition is fixed
- Verify events from current and future iterations are NOT moved to forgotten history

Recommendation: Add integration tests that verify:

// Test case: Events from current iteration should remain in active history
// Test case: Events from previous iterations should move to forgotten history  
// Test case: Cleanup should work correctly at iteration boundaries (0, 1, 100, 101)

Additional Changes Review

pegboard/src/workflows/runner2.rs

Change: Removed conditional check for init.is_none()

// Before:
if init.is_none() {
    init = Some(init_sig);
    check_queue = true;
}

// After:
init = Some(init_sig);
check_queue = true;

Analysis: This allows re-initialization, which may be intentional behavior to handle reconnection scenarios. However, this seems like a significant behavior change that should be:

Mentioned in the PR description
Verified as safe (won't cause issues with existing initialized states)
Potentially separated into its own PR

pegboard-runner/src/conn.rs

Change: Updated workflow reference from runner to runner2

This appears to be switching to a new workflow implementation. Should verify:

Is runner2 fully tested and ready for production?
Is there a migration plan for existing runner workflows?
Should this be in a separate PR?

Recommendations

High Priority: Remove commented code from Cargo.toml and build-push.sh
High Priority: Add test coverage for the loop forgotten event fix
Medium Priority: Clarify/separate the runner2 migration and init behavior changes
Low Priority: Add trailing newline to engine-rocksdb.sh

Verdict

Approve with suggested changes ✅

The core bug fix is sound and addresses a real concurrency issue. The main concerns are:

Lack of test coverage for the fix
Unrelated changes (commented code, runner2 migration) that should be cleaned up or separated

The logic change correctly bounds the range of events to move to forgotten history, preventing future iterations' events from being prematurely archived.

claude · 2025-11-27T00:58:14Z

Code Review - PR #3482: fix(gas): fix loop forgotten bug due to concurrency

Summary

This PR fixes a critical bug in the workflow engine's loop event management where concurrent loop iterations could cause events from all iterations to be moved to forgotten history, rather than just events from previous iterations. The fix properly bounds the range query to only affect events up to iteration - 1.

Main Changes Analysis

1. Core Bug Fix: Loop Event Range Limiting (gasoline/src/db/kv/mod.rs)

Issue Fixed:
The original code used subspace() which creates a range that matches all events under a loop location regardless of iteration number. When moving events to forgotten history on iteration N, this incorrectly included events from the current iteration N.

Solution:

Changed from using a single subspace range to explicitly constructed start/end bounds
Start: {loop location, 0, ...} (iteration 0)
End: {loop location, iteration - 1, ...} (one before current)
This ensures only events from completed iterations (0 to N-1) are moved to forgotten history

Code Quality: Excellent

The fix correctly uses .range().0 and .range().1 to get precise byte boundaries
Uses saturating_sub(1) to safely handle iteration 0 edge case
Clear comments explain the range boundaries
Correctly updates both the query range (get_ranges_keyvalues) and clear range (clear_range)

Correctness: Verified
The logic is sound:

When iteration == 0: No previous iterations exist, so the range query returns empty (start = end after saturating_sub)
When iteration >= 1: Range includes only iterations 0 to (iteration - 1)

2. Runner Init Signal Handling (pegboard/src/workflows/runner2.rs)

Change: Removed the if init.is_none() guard when processing ToServerInit signals.

Impact: Potentially Problematic

Concerns:

Duplicate Init Signals: If a runner sends multiple init signals (reconnection, retries, bugs), the workflow will now process all of them rather than ignoring duplicates
State Reset Risk: Re-initializing could potentially reset important state that should be preserved
check_queue Trigger: Setting check_queue = true on every init could cause unnecessary queue processing

Questions:

Is this intentional to handle runner reconnections?
What happens if a runner sends init twice (network retry, bug, etc.)?
Should there be validation that subsequent inits match the first one?

Recommendation: Add a comment explaining why duplicate inits should be allowed, or add validation.

3. Workflow Reference Update (pegboard-runner/src/conn.rs)

Change: pegboard::workflows::runner to pegboard::workflows::runner2

Impact: Looks correct but raises questions:

Is this part of a migration from old runner to new runner2?
Are both implementations running in production?
Should old runner be deprecated/removed?

4. Commented-Out Cargo.toml Changes

Impact: Leftover debug code

These commented lines should either be removed entirely if not needed for this PR, uncommented if they're intended changes, or moved to a separate PR if they're unrelated optimizations.

Concerns:

overflow-checks = false and debug-assertions = false can hide bugs in production
panic = abort changes error handling behavior significantly

Recommendation: Remove these commented lines from this PR unless they're directly related to the bug fix.

5. Docker Build Script Changes (scripts/docker/build-push.sh)

Change: Commented-out CARGO_BUILD_MODE=release argument

Impact: Leftover debug code

Similar to Cargo.toml - should be removed unless intentional for this PR.

Security Considerations

No security concerns identified

The fix properly bounds database queries, preventing data leakage
No new external inputs or untrusted data handling

Performance Considerations

Performance Improved

The range query is now more precise, scanning fewer keys
Moving to clear_range with explicit bounds is more efficient than clearing an entire subspace

Testing Recommendations

Critical: Test loop iterations to verify events are correctly partitioned:
- Run a loop workflow for 5+ iterations
- Verify forgotten history only contains events from previous iterations
- Verify active history contains current iteration events
Edge Cases:
- Test iteration 0 (no previous iterations)
- Test iteration 1 (first move to forgotten)
- Test the 100-iteration cleanup threshold
Runner2 Init:
- Test duplicate init signals behavior
- Test runner reconnection scenarios
- Verify queue processing isn't triggered unnecessarily

Code Style and Conventions

Follows repository conventions:

Uses structured logging (tracing)
Proper error handling with ? operator
Clear comments explaining complex logic
Hard tabs for indentation (per rustfmt.toml)

Summary and Recommendations

Approval Status: Approve with reservations

Must Address:

Remove commented-out Cargo.toml and build script changes (or explain why they're needed)
Clarify the runner2 init signal handling change - is duplicate processing intentional?

Should Address:
3. Add validation or comments for the init signal handling change
4. Document the runner to runner2 migration if that's ongoing

Core Bug Fix: Excellent - The main fix for the loop forgotten bug is correct and well-implemented.

Great work on identifying and fixing this concurrency bug! The core fix is solid. The main concerns are the ancillary changes that seem like debug/development artifacts rather than intentional changes for this PR.

graphite-app · 2025-12-03T00:33:06Z

Merge activity

Dec 3, 12:33 AM UTC: NathanFlurry added this pull request to the Graphite merge queue.
Dec 3, 12:34 AM UTC: CI is running for this pull request on a draft pull request (#3577) due to your merge queue CI optimization settings.
Dec 3, 12:34 AM UTC: Merged by the Graphite merge queue via draft PR: #3577.

This was referenced Nov 18, 2025

fix(gas): optimize loops #3475

Closed

chore: change streamingmode to exact for ranges with limits #3476

Closed

chore(gas): add test for history with holes #3477

Closed

This was referenced Nov 18, 2025

feat(gas): add batch listen #3478

Closed

fix(gas): fix batch listen, fix history for graceful signal send in workflows #3479

Closed

fix(pb): rewrite runner wf to handle batch signals #3480

Closed

MasterPtato mentioned this pull request Nov 18, 2025

fix(pb): simplify runner wf #3483

Closed

vercel bot deployed to Preview – rivetkit-serverless November 18, 2025 23:54 View deployment

MasterPtato force-pushed the 11-17-fix_pb_rewrite_runner_wf_to_handle_batch_signals branch from 4eabd07 to e68ccc5 Compare November 19, 2025 22:06

MasterPtato force-pushed the 11-18-fix_gas_fix_loop_forgotten_bug_due_to_concurrency branch from 1a4f15a to 4294775 Compare November 19, 2025 22:06

This was referenced Nov 19, 2025

chore(engine-runner): remove echo close on ToClientWebSocketClose #3487

Closed

chore(pegboard-gateway): remove TunnelAck #3488

Closed

fix(tunnel): implement ping pong #3491

Closed

NathanFlurry mentioned this pull request Nov 19, 2025

chore(pegboard-gateway): add new message id format & add deprecated tunnel ack #3492

Closed

vercel bot had a problem deploying to Preview – rivetkit-serverless November 19, 2025 22:07 Failure

MasterPtato force-pushed the 11-17-fix_pb_rewrite_runner_wf_to_handle_batch_signals branch from e68ccc5 to 9a87fd1 Compare November 20, 2025 03:07

MasterPtato force-pushed the 11-18-fix_gas_fix_loop_forgotten_bug_due_to_concurrency branch from 4294775 to 5458e50 Compare November 20, 2025 03:07

vercel bot deployed to Preview – rivetkit-serverless November 20, 2025 03:08 View deployment

MasterPtato force-pushed the 11-18-fix_gas_fix_loop_forgotten_bug_due_to_concurrency branch from 5458e50 to 8431b53 Compare November 21, 2025 19:57

MasterPtato force-pushed the 11-17-fix_pb_rewrite_runner_wf_to_handle_batch_signals branch from 9a87fd1 to 0560d9b Compare November 21, 2025 19:57

vercel bot had a problem deploying to Preview – rivetkit-serverless November 21, 2025 19:57 Failure

MasterPtato mentioned this pull request Nov 21, 2025

fix: parallelize publish signal awaits #3523

Closed

MasterPtato force-pushed the 11-17-fix_pb_rewrite_runner_wf_to_handle_batch_signals branch from f0e981f to f7de5c8 Compare November 26, 2025 02:17

vercel bot had a problem deploying to Preview – rivetkit-serverless November 26, 2025 02:17 Failure

MasterPtato marked this pull request as ready for review November 26, 2025 23:36

MasterPtato requested a review from NathanFlurry November 26, 2025 23:36

This was referenced Nov 27, 2025

fix(runtime): evict runners on SIGTERM, add periodic shutdown progress messages #3550

Closed

chore(gh): make claude edit existing discussion comment for subsequent reviews #3551

Closed

MasterPtato force-pushed the 11-17-fix_pb_rewrite_runner_wf_to_handle_batch_signals branch from f7de5c8 to 6eecd1a Compare November 27, 2025 00:52

MasterPtato force-pushed the 11-18-fix_gas_fix_loop_forgotten_bug_due_to_concurrency branch from a5ec483 to 9893845 Compare November 27, 2025 00:52

vercel bot had a problem deploying to Preview – rivetkit-serverless November 27, 2025 00:52 Failure

MasterPtato force-pushed the 11-17-fix_pb_rewrite_runner_wf_to_handle_batch_signals branch from 6eecd1a to 8c26dca Compare November 27, 2025 00:58

MasterPtato force-pushed the 11-18-fix_gas_fix_loop_forgotten_bug_due_to_concurrency branch from 9893845 to 7dad582 Compare November 27, 2025 00:58

vercel bot had a problem deploying to Preview – rivetkit-serverless November 27, 2025 00:58 Failure

fix(gas): fix loop forgotten bug due to concurrency

b422891

MasterPtato force-pushed the 11-18-fix_gas_fix_loop_forgotten_bug_due_to_concurrency branch from 7dad582 to b422891 Compare December 2, 2025 19:10

MasterPtato force-pushed the 11-17-fix_pb_rewrite_runner_wf_to_handle_batch_signals branch from 8c26dca to 9e20345 Compare December 2, 2025 19:10

vercel bot had a problem deploying to Preview – rivetkit-serverless December 2, 2025 19:11 Failure

This was referenced Dec 2, 2025

feat(pegboard): move pegboard serverless to wf #3416

Closed

fix(serverless): general cleanup #3450

Closed

fix(serverless): drain runners after url/headers change #3571

Closed

NathanFlurry approved these changes Dec 2, 2025

View reviewed changes

This was referenced Dec 2, 2025

fix(serverless): fix connection wf lifecycle #3573

Closed

fix(serverless): misc pr fixes #3576

Closed

graphite-app bot pushed a commit that referenced this pull request Dec 3, 2025

fix(gas): fix loop forgotten bug due to concurrency (#3482)

927f939

graphite-app bot closed this Dec 3, 2025

graphite-app bot deleted the 11-18-fix_gas_fix_loop_forgotten_bug_due_to_concurrency branch December 3, 2025 00:34

fix(gas): fix loop forgotten bug due to concurrency #3482

fix(gas): fix loop forgotten bug due to concurrency #3482

Uh oh!

Conversation

MasterPtato commented Nov 18, 2025

Uh oh!

vercel bot commented Nov 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MasterPtato commented Nov 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

How to use the Graphite Merge Queue

Uh oh!

pkg-pr-new bot commented Nov 18, 2025

Uh oh!

claude bot commented Nov 18, 2025

PR Review: fix(gas): fix loop forgotten bug due to concurrency

Summary

Core Fix Analysis (engine/packages/gasoline/src/db/kv/mod.rs)

Other Changes Review

1. pegboard-runner/src/conn.rs - Workflow migration

2. pegboard/src/workflows/runner2.rs - Init signal handling

3. Cargo.toml - Commented release profile options

4. scripts/docker/build-push.sh - Commented build arg

5. scripts/run/docker/engine-rocksdb.sh - Trailing newline

Security Considerations

Performance Considerations

Test Coverage

Summary of Recommendations

Overall Assessment

Uh oh!

claude bot commented Nov 19, 2025

Code Review

Summary

✅ Positives

🔍 Issues & Concerns

Critical: Missing Key Deletion (engine/packages/gasoline/src/db/kv/mod.rs:2631)

Medium: Potential Off-by-One Issue (engine/packages/gasoline/src/db/kv/mod.rs:2598)

Low: Commented Out Production Code (Cargo.toml:437-440, scripts/docker/build-push.sh:29)

Low: Unnecessary File Change (scripts/run/docker/engine-rocksdb.sh:16)

🧪 Testing Concerns

🔒 Security Considerations

⚡ Performance Considerations

📝 Style & Conventions

🎯 Recommendations Summary

Overall Assessment

Uh oh!

claude bot commented Nov 19, 2025

PR Review: Fix loop forgotten bug due to concurrency

Summary

Code Quality & Best Practices ✅

Main Fix (gasoline/src/db/kv/mod.rs:2579-2634)

Workflow Fix (pegboard/src/workflows/runner2.rs:103-108)

Potential Bugs & Issues

1. Edge Case: iteration == 0 ⚠️

2. Missing Entry Deletion ❓

Performance Considerations

Range Query Optimization ✅

Stream Processing ✅

Security Concerns

No Major Issues ✅

Test Coverage

Missing Tests ⚠️

Additional Changes Review

Cargo.toml

scripts/docker/build-push.sh

scripts/run/docker/engine-rocksdb.sh

Overall Assessment

Uh oh!

claude bot commented Nov 20, 2025

Code Review

Summary

Critical Bug Fix ✅

Secondary Changes

1. Pegboard Runner Init Logic (runner2.rs:103-108)

2. Workflow Reference Update (conn.rs:128)

3. Build Script Changes (scripts/docker/build-push.sh:26-30)

Performance Considerations

Security Considerations

Test Coverage

vercel bot commented Nov 18, 2025 •

edited

Loading

MasterPtato commented Nov 18, 2025 •

edited

Loading

Critical: Missing Key Deletion (`engine/packages/gasoline/src/db/kv/mod.rs:2631`)

Medium: Potential Off-by-One Issue (`engine/packages/gasoline/src/db/kv/mod.rs:2598`)

Low: Commented Out Production Code (`Cargo.toml:437-440`, `scripts/docker/build-push.sh:29`)

Low: Unnecessary File Change (`scripts/run/docker/engine-rocksdb.sh:16`)

1. Pegboard Runner Init Logic (`runner2.rs:103-108`)

2. Workflow Reference Update (`conn.rs:128`)

3. Build Script Changes (`scripts/docker/build-push.sh:26-30`)

Cargo.toml (`Cargo.toml:436-440`)

Script Newline Issue (`scripts/run/docker/engine-rocksdb.sh:15`)

1. Inconsistent Workflow Reference Path (`engine/packages/pegboard-runner/src/conn.rs:131`)

2. Init Signal Handling Change (`engine/packages/pegboard/src/workflows/runner2.rs:106-108`)

🔴 Critical: Missing Delete Operation (`engine/packages/gasoline/src/db/kv/mod.rs:2628`)

⚠️ Edge Case: What if `iteration = 0`?