Skip to content

Conversation

@MasterPtato
Copy link
Contributor

No description provided.

@vercel
Copy link

vercel bot commented Dec 2, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
rivet-site Ready Ready Preview Comment Dec 3, 2025 0:21am
rivetkit-serverless Ready Ready Preview Comment Dec 3, 2025 0:21am
2 Skipped Deployments
Project Deployment Preview Comments Updated (UTC)
rivet-cloud Ignored Ignored Dec 3, 2025 0:21am
rivet-inspector Ignored Ignored Preview Dec 3, 2025 0:21am

Copy link
Contributor Author

MasterPtato commented Dec 2, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link

claude bot commented Dec 2, 2025

PR Review: fix(serverless): fix connection wf lifecycle

Summary

This PR refactors the serverless connection workflow lifecycle to improve state management and error handling. The main changes involve restructuring how the outbound request activity handles connection states and drain signals.

Code Quality & Best Practices ✅

Positive observations:

  • Good use of structured logging with tracing macros (e.g., tracing::debug!(?err, "failed to send drain signal"))
  • Proper error handling with context added using .context()
  • Clear separation of concerns between workflow and activity logic
  • Follows the repository's error handling patterns using anyhow

Suggestions:

  1. engine/packages/pegboard/src/workflows/serverless/connection.rs:162-175 - The infinite loop in outbound_req could potentially run forever if OutboundReqOutput::Continue is always returned. Consider adding safeguards:

    loop {
        match outbound_req_inner(ctx, input, &mut term_signal, &mut drain_sub).await {
            Ok(OutboundReqOutput::Continue) => {} // Could loop forever
            // ...
        }
    }

    The logic appears correct based on the implementation (Continue is only returned when the stream ends client-side), but this warrants careful testing.

  2. engine/packages/api-public/src/runner_configs/upsert.rs:96-99 - Good addition of datacenter validation! However, consider providing more context in the error:

    if !body.datacenters.is_empty() {
        let invalid_dcs: Vec<_> = body.datacenters.keys().collect();
        tracing::warn!(?invalid_dcs, "attempted to upsert with non-existent datacenters");
        return Err(crate::errors::Datacenter::NotFound.build());
    }

Potential Bugs & Issues ⚠️

  1. Critical: State machine logic change (engine/packages/pegboard/src/workflows/serverless/connection.rs:58-60, 101-108)

    The semantics of drain_sent have been inverted. Previously:

    • send_drain_started: true meant "we need to send the drain signal"
    • Now: drain_sent: false means "we need to send the drain signal"

    While the implementation appears correct, this is error-prone. The double negative in line 101 if !drain_sent makes reasoning about the logic harder. Consider renaming to needs_retry or drain_signal_failed for clarity.

  2. Resource management (engine/packages/pegboard/src/workflows/serverless/connection.rs:156-160)

    The term_signal and drain_sub are created once before the loop. If the loop iterates multiple times (on Continue), these subscriptions persist. Verify this is the intended behavior and that subscriptions don't accumulate stale state between iterations.

  3. TypeScript SDK cleanup (engine/sdks/typescript/runner/src/mod.ts:311-319)

    Good addition of KV request cleanup in #handleLost()! However, verify that all other connection-related resources are also cleaned up when this threshold is reached (websockets, timers, etc.).

Performance Considerations 🚀

  1. Reduced error logging verbosity (line 171): Changing from tracing::error! to tracing::warn! is appropriate since retries handle this. Good change.

  2. Unnecessary struct removal: Removing the intermediate OutboundReqInnerOutput struct and using the enum directly is cleaner and reduces allocations.

  3. Datacenter validation (engine/packages/api-public/src/runner_configs/upsert.rs): The new validation happens before processing, which is good for fail-fast behavior.

Security Concerns 🔒

No major security issues identified. The changes maintain existing security patterns:

  • Auth tokens handled correctly
  • No new user input validation issues introduced
  • Error messages don't leak sensitive information

Test Coverage 🧪

Concerns:

  1. The example file changes (examples/counter/scripts/connect.ts) suggest this might be used for manual testing, but there's no indication of automated test coverage for the state machine changes.

  2. Critical state transitions that should have test coverage:

    • Connection lifecycle: ContinueDraining
    • Drain signal send failure and retry logic
    • Proper cleanup when runner becomes lost (TypeScript SDK)
    • Datacenter validation in upsert endpoint

Recommendations:

  • Add integration tests for the connection workflow lifecycle
  • Add unit tests for the new datacenter validation
  • Test the KV request cleanup in #handleLost()

Documentation & Naming 📝

  1. Good additions:

    • Comment at line 53-54 explaining the queue scenario
    • Clear documentation of drain_sent in the enum (line 148-149)
  2. Suggestions:

    • The comment at line 148 says "Whether or not to retry sending the drain signal because it failed or was never sent" - consider simplifying to "true if drain signal was successfully sent"
    • Add a module-level comment explaining the overall connection lifecycle flow

Additional Notes

  • The llms-full.txt and llms.txt changes appear to be documentation updates (likely auto-generated)
  • The pnpm-lock.yaml changes suggest removal of the examples/hono-serverless package
  • Version bumps in AsyncAPI and OpenAPI specs are noted

Verdict

Overall: Approve with minor suggestions

The refactoring improves code clarity and fixes lifecycle issues. The main concerns are:

  1. Test coverage for critical state transitions
  2. The double-negative logic with !drain_sent (naming suggestion)
  3. Validation of resource cleanup in the infinite loop pattern

The changes appear sound but would benefit from comprehensive integration tests given the complexity of the state machine.

@pkg-pr-new
Copy link

pkg-pr-new bot commented Dec 2, 2025

More templates

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@3573

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@3573

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@3573

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@3573

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@3573

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@3573

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@3573

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@3573

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@3573

commit: fd82f21

@graphite-app
Copy link
Contributor

graphite-app bot commented Dec 3, 2025

Merge activity

  • Dec 3, 12:33 AM UTC: NathanFlurry added this pull request to the Graphite merge queue.
  • Dec 3, 12:34 AM UTC: CI is running for this pull request on a draft pull request (#3577) due to your merge queue CI optimization settings.
  • Dec 3, 12:34 AM UTC: Merged by the Graphite merge queue via draft PR: #3577.

@graphite-app graphite-app bot closed this Dec 3, 2025
@graphite-app graphite-app bot deleted the 12-02-fix_serverless_fix_connection_wf_lifecycle branch December 3, 2025 00:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants