Skip to content

Commit d1fda82

Browse files
committed
fix(rivetkit): handle all top-level errors (#3578)
1 parent 4019c28 commit d1fda82

File tree

7 files changed

+266
-39
lines changed

7 files changed

+266
-39
lines changed
Lines changed: 129 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,129 @@
1+
---
2+
name: js-error-handler-auditor
3+
description: Use this agent when you need to audit a JavaScript codebase for unhandled errors in top-level async operations, event handlers, and callback functions. This agent should be used proactively when:\n\n<example>\nContext: User wants to improve error handling across their JavaScript codebase.\nuser: "Can you check if we're handling errors properly in our event listeners?"\nassistant: "I'll use the Task tool to launch the js-error-handler-auditor agent to comprehensively audit error handling across your JavaScript codebase."\n<commentary>\nThe user is asking about error handling practices, which is exactly what the js-error-handler-auditor specializes in. Launch it to perform the audit.\n</commentary>\n</example>\n\n<example>\nContext: User is preparing for a code review and wants to ensure error handling best practices.\nuser: "I want to make sure our codebase follows error handling best practices before the review tomorrow."\nassistant: "I'll launch the js-error-handler-auditor agent to scan your JavaScript codebase for any unhandled errors in async operations and event handlers."\n<commentary>\nThis is a perfect use case for proactive error handling audit before code review.\n</commentary>\n</example>\n\n<example>\nContext: User just finished adding several new features with async code.\nuser: "I just added a bunch of new API calls and event listeners. Can you review them?"\nassistant: "I'll use the js-error-handler-auditor agent to analyze the error handling in your new async code and event listeners."\n<commentary>\nNew async code is prone to missing error handlers, making this an ideal time to run the audit agent.\n</commentary>\n</example>
4+
model: sonnet
5+
color: red
6+
---
7+
8+
You are an elite JavaScript error handling auditor with deep expertise in async programming patterns, event-driven architectures, and production-grade error management. Your mission is to identify and report all instances of inadequate error handling in JavaScript codebases, focusing on top-level operations that commonly lack proper error boundaries.
9+
10+
## Your Systematic Audit Process
11+
12+
### Phase 1: Comprehensive Code Discovery
13+
14+
First, you will search the entire codebase for common patterns that require error handling. Use the Bash tool to execute this search command:
15+
16+
```bash
17+
rg -n --type js --type jsx --type ts --type tsx -e 'setTimeout|setInterval|addEventListener|Promise\.(?:all|race|allSettled|any)|fetch\(|async\s+function|await\s+|on\(|once\(|\.then\(|\.catch\(' . 2>/dev/null || echo "No matches found"
18+
```
19+
20+
This command searches for:
21+
- `setTimeout` and `setInterval` - timer callbacks that may throw
22+
- `addEventListener`, `on()`, `once()` - event handlers
23+
- `Promise.all`, `Promise.race`, etc. - promise combinators
24+
- `fetch()` - network requests
25+
- `async function` and `await` - async/await patterns
26+
- `.then()` and `.catch()` - promise chains
27+
28+
Capture all file paths and line numbers from this search.
29+
30+
### Phase 2: Detailed Code Analysis
31+
32+
For each location identified:
33+
34+
1. **Read the full context** (at least 20 lines before and after) using the Read tool to understand:
35+
- The complete function or block containing the pattern
36+
- Existing error handling mechanisms (try/catch, .catch(), error callbacks)
37+
- The broader context and error propagation paths
38+
39+
2. **Evaluate error handling quality** by checking:
40+
- **Async Functions**: Do they have try/catch blocks or are they wrapped in error boundaries?
41+
- **Promises**: Do `.then()` chains have corresponding `.catch()` handlers?
42+
- **Event Handlers**: Do `addEventListener` and event emitter callbacks have internal error handling?
43+
- **Timers**: Do `setTimeout`/`setInterval` callbacks handle errors or are they wrapped?
44+
- **Top-level awaits**: Are they in try/catch blocks or do they have .catch() handlers?
45+
- **Promise combinators**: Are `Promise.all()`, etc. followed by `.catch()` or in try/catch?
46+
47+
3. **Classify the severity**:
48+
- **Critical**: Unhandled async operations at module/component initialization level
49+
- **High**: User-triggered actions (click handlers, form submissions) without error handling
50+
- **Medium**: Background operations (timers, polling) without error handling
51+
- **Low**: Promise chains with partial error handling
52+
53+
### Phase 3: Comprehensive Reporting
54+
55+
After analyzing all locations, provide a structured report with:
56+
57+
1. **Executive Summary**
58+
- Total locations scanned
59+
- Number of issues found by severity
60+
- Overall error handling score (percentage of properly handled cases)
61+
62+
2. **Detailed Issue List**
63+
64+
For each unhandled or improperly handled error location, provide:
65+
66+
```
67+
## Issue #N: [Severity] [Pattern Type]
68+
69+
**Location**: `filepath:line_number`
70+
71+
**Current Code**:
72+
```javascript
73+
[Exact code snippet showing the problematic pattern]
74+
```
75+
76+
**Problem**: [Clear explanation of why this is problematic and what could go wrong]
77+
78+
**Recommended Fix**:
79+
```javascript
80+
[Complete, production-ready code showing proper error handling]
81+
```
82+
83+
**Explanation**: [Why this fix is appropriate and what it accomplishes]
84+
```
85+
86+
### Your Error Handling Expertise
87+
88+
When evaluating code, apply these expert principles:
89+
90+
**Valid Error Handling Patterns**:
91+
- Async functions wrapped in try/catch
92+
- Promise chains with `.catch()` handlers
93+
- Event handlers with internal try/catch
94+
- Error boundaries in appropriate contexts
95+
- Explicit error callbacks in Node.js patterns
96+
- Global error handlers (window.onerror, unhandledrejection) as last resort
97+
98+
**Invalid or Insufficient Patterns**:
99+
- "Fire and forget" async calls without error handling
100+
- Promise chains ending in `.then()` without `.catch()`
101+
- Event handlers that can throw but lack error handling
102+
- Async functions called synchronously without await or .catch()
103+
- Timer callbacks that perform async operations without error handling
104+
105+
**Context Matters**:
106+
- Consider if errors are intentionally propagating to a higher-level handler
107+
- Recognize when logging is insufficient (errors should be handled, not just logged)
108+
- Identify when errors need user feedback vs. silent handling
109+
- Understand framework-specific error handling (React error boundaries, Vue error handlers, etc.)
110+
111+
### Output Format
112+
113+
Always structure your final report as:
114+
115+
1. Executive summary with statistics
116+
2. Numbered list of all issues, ordered by severity (Critical → High → Medium → Low)
117+
3. Each issue must include: location, current code, problem description, recommended fix, and explanation
118+
4. Conclude with general recommendations for improving error handling practices in the codebase
119+
120+
### Quality Assurance
121+
122+
Before presenting your report:
123+
- Verify you've analyzed every location from the initial search
124+
- Ensure all code snippets are accurate and complete
125+
- Confirm all recommended fixes are syntactically correct and idiomatic
126+
- Double-check that you haven't flagged code that already has adequate error handling
127+
- Validate that your severity classifications are appropriate
128+
129+
You are thorough, precise, and your recommendations must be immediately actionable by developers. Every issue you report must be a genuine error handling gap, and every fix you propose must be production-ready.

biome.json

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
{
2-
"$schema": "https://biomejs.dev/schemas/2.1.1/schema.json",
2+
"$schema": "https://biomejs.dev/schemas/2.2.3/schema.json",
33
"files": {
44
"includes": [
55
"**/*.js",
@@ -41,6 +41,10 @@
4141
},
4242
"suspicious": {
4343
"noExplicitAny": "off"
44+
},
45+
"nursery": {
46+
"noFloatingPromises": "error",
47+
"noMisusedPromises": "error"
4448
}
4549
}
4650
},

engine/sdks/typescript/runner/src/mod.ts

Lines changed: 99 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@ import { type HibernatingWebSocketMetadata, Tunnel } from "./tunnel";
88
import {
99
calculateBackoff,
1010
parseWebSocketCloseReason,
11+
stringifyError,
1112
unreachable,
1213
} from "./utils";
1314
import { importWebSocket } from "./websocket.js";
@@ -23,6 +24,12 @@ const PROTOCOL_VERSION: number = 4;
2324
const EVENT_BACKLOG_WARN_THRESHOLD = 10_000;
2425
const SIGNAL_HANDLERS: (() => void | Promise<void>)[] = [];
2526

27+
export class RunnerShutdownError extends Error {
28+
constructor() {
29+
super("Runner shut down");
30+
}
31+
}
32+
2633
export interface RunnerConfig {
2734
logger?: Logger;
2835
version: number;
@@ -250,7 +257,14 @@ export class Runner {
250257

251258
// Start cleaning up old unsent KV requests every 15 seconds
252259
this.#kvCleanupInterval = setInterval(() => {
253-
this.#cleanupOldKvRequests();
260+
try {
261+
this.#cleanupOldKvRequests();
262+
} catch (err) {
263+
this.log?.error({
264+
msg: "error cleaning up kv requests",
265+
error: stringifyError(err),
266+
});
267+
}
254268
}, 15000); // Run every 15 seconds
255269
}
256270

@@ -309,11 +323,7 @@ export class Runner {
309323

310324
// Remove all remaining kv requests
311325
for (const [_, request] of this.#kvRequests.entries()) {
312-
request.reject(
313-
new Error(
314-
"KV request timed out waiting for WebSocket connection",
315-
),
316-
);
326+
request.reject(new RunnerShutdownError());
317327
}
318328

319329
this.#kvRequests.clear();
@@ -324,7 +334,13 @@ export class Runner {
324334
#stopAllActors() {
325335
const actorIds = Array.from(this.#actors.keys());
326336
for (const actorId of actorIds) {
327-
this.forceStopActor(actorId);
337+
this.forceStopActor(actorId).catch((err) => {
338+
this.log?.error({
339+
msg: "error stopping actor",
340+
actorId,
341+
error: stringifyError(err),
342+
});
343+
});
328344
}
329345
}
330346

@@ -761,12 +777,19 @@ export class Runner {
761777
// Start command acknowledgment interval (5 minutes)
762778
const ackInterval = 5 * 60 * 1000; // 5 minutes in milliseconds
763779
const ackLoop = setInterval(() => {
764-
if (ws.readyState === 1) {
765-
this.#sendCommandAcknowledgment();
766-
} else {
767-
clearInterval(ackLoop);
768-
this.log?.info({
769-
msg: "WebSocket not open, stopping ack loop",
780+
try {
781+
if (ws.readyState === 1) {
782+
this.#sendCommandAcknowledgment();
783+
} else {
784+
clearInterval(ackLoop);
785+
this.log?.info({
786+
msg: "WebSocket not open, stopping ack loop",
787+
});
788+
}
789+
} catch (err) {
790+
this.log?.error({
791+
msg: "error in command acknowledgment loop",
792+
error: stringifyError(err),
770793
});
771794
}
772795
}, ackInterval);
@@ -826,12 +849,17 @@ export class Runner {
826849
const kvResponse = message.val;
827850
this.#handleKvResponse(kvResponse);
828851
} else if (message.tag === "ToClientTunnelMessage") {
829-
this.#tunnel?.handleTunnelMessage(message.val);
852+
this.#tunnel?.handleTunnelMessage(message.val).catch((err) => {
853+
this.log?.error({
854+
msg: "error handling tunnel message",
855+
error: stringifyError(err),
856+
});
857+
});
830858
} else if (message.tag === "ToClientPing") {
831859
this.__sendToServer({
832860
tag: "ToServerPong",
833861
val: {
834-
ts: message.val.ts
862+
ts: message.val.ts,
835863
},
836864
});
837865
} else {
@@ -856,7 +884,14 @@ export class Runner {
856884
seconds: this.#runnerLostThreshold / 1000,
857885
});
858886
this.#runnerLostTimeout = setTimeout(() => {
859-
this.#handleLost();
887+
try {
888+
this.#handleLost();
889+
} catch (err) {
890+
this.log?.error({
891+
msg: "error handling runner lost",
892+
error: stringifyError(err),
893+
});
894+
}
860895
}, this.#runnerLostThreshold);
861896
}
862897

@@ -912,7 +947,14 @@ export class Runner {
912947
seconds: this.#runnerLostThreshold / 1000,
913948
});
914949
this.#runnerLostTimeout = setTimeout(() => {
915-
this.#handleLost();
950+
try {
951+
this.#handleLost();
952+
} catch (err) {
953+
this.log?.error({
954+
msg: "error handling runner lost",
955+
error: stringifyError(err),
956+
});
957+
}
916958
}, this.#runnerLostThreshold);
917959
}
918960

@@ -931,29 +973,52 @@ export class Runner {
931973
for (const commandWrapper of commands) {
932974
if (commandWrapper.inner.tag === "CommandStartActor") {
933975
// Spawn background promise
934-
this.#handleCommandStartActor(commandWrapper);
976+
this.#handleCommandStartActor(commandWrapper).catch((err) => {
977+
this.log?.error({
978+
msg: "error handling start actor command",
979+
actorId: commandWrapper.checkpoint.actorId,
980+
error: stringifyError(err),
981+
});
982+
});
935983
} else if (commandWrapper.inner.tag === "CommandStopActor") {
936984
// Spawn background promise
937-
this.#handleCommandStopActor(commandWrapper);
985+
this.#handleCommandStopActor(commandWrapper).catch((err) => {
986+
this.log?.error({
987+
msg: "error handling stop actor command",
988+
actorId: commandWrapper.checkpoint.actorId,
989+
error: stringifyError(err),
990+
});
991+
});
938992
} else {
939993
unreachable(commandWrapper.inner);
940994
}
941995

942-
const actor = this.getActor(commandWrapper.checkpoint.actorId, commandWrapper.inner.val.generation);
996+
const actor = this.getActor(
997+
commandWrapper.checkpoint.actorId,
998+
commandWrapper.inner.val.generation,
999+
);
9431000
if (actor) actor.lastCommandIdx = commandWrapper.checkpoint.index;
9441001
}
9451002
}
9461003

9471004
#handleAckEvents(ack: protocol.ToClientAckEvents) {
948-
let originalTotalEvents = Array.from(this.#actors).reduce((s, [_, actor]) => s + actor.eventHistory.length, 0);
1005+
let originalTotalEvents = Array.from(this.#actors).reduce(
1006+
(s, [_, actor]) => s + actor.eventHistory.length,
1007+
0,
1008+
);
9491009

9501010
for (const [_, actor] of this.#actors) {
951-
let checkpoint = ack.lastEventCheckpoints.find(x => x.actorId == actor.actorId);
1011+
let checkpoint = ack.lastEventCheckpoints.find(
1012+
(x) => x.actorId == actor.actorId,
1013+
);
9521014

9531015
if (checkpoint) actor.handleAckEvents(checkpoint.index);
9541016
}
9551017

956-
const totalEvents = Array.from(this.#actors).reduce((s, [_, actor]) => s + actor.eventHistory.length, 0);
1018+
const totalEvents = Array.from(this.#actors).reduce(
1019+
(s, [_, actor]) => s + actor.eventHistory.length,
1020+
0,
1021+
);
9571022
const prunedCount = originalTotalEvents - totalEvents;
9581023

9591024
if (prunedCount > 0) {
@@ -975,7 +1040,10 @@ export class Runner {
9751040

9761041
actor.recordEvent(eventWrapper);
9771042

978-
let totalEvents = Array.from(this.#actors).reduce((s, [_, actor]) => s + actor.eventHistory.length, 0);
1043+
let totalEvents = Array.from(this.#actors).reduce(
1044+
(s, [_, actor]) => s + actor.eventHistory.length,
1045+
0,
1046+
);
9791047

9801048
if (
9811049
totalEvents > EVENT_BACKLOG_WARN_THRESHOLD &&
@@ -1708,13 +1776,18 @@ export class Runner {
17081776
msg: `Scheduling reconnect attempt ${this.#reconnectAttempt + 1} in ${delay}ms`,
17091777
});
17101778

1711-
this.#reconnectTimeout = setTimeout(async () => {
1779+
this.#reconnectTimeout = setTimeout(() => {
17121780
if (!this.#shutdown) {
17131781
this.#reconnectAttempt++;
17141782
this.log?.debug({
17151783
msg: `Attempting to reconnect (attempt ${this.#reconnectAttempt})...`,
17161784
});
1717-
await this.#openPegboardWebSocket();
1785+
this.#openPegboardWebSocket().catch((err) => {
1786+
this.log?.error({
1787+
msg: "error during websocket reconnection",
1788+
error: stringifyError(err),
1789+
});
1790+
});
17181791
}
17191792
}, delay);
17201793
}

0 commit comments

Comments
 (0)