Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 21 additions & 0 deletions .cmux/scripts/demo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we symlink scripts -> .mux/scripts in our repo? I assume that should be standard practice for any mux heavy repo.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that makes sense.

Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
#!/usr/bin/env bash
# Description: Demo script to showcase the script execution feature. Accepts no arguments.
set -euo pipefail

# Progress messages to stderr (shown to user, not sent to agent)
echo "Running demo script..." >&2
echo "Current workspace: $(pwd)" >&2
echo "Timestamp: $(date)" >&2

# Structured output to stdout (sent to agent)
cat <<'EOF'
## 🎉 Script Execution Demo

✅ Script executed successfully!

**Output Semantics:**
- `stdout`: Sent to the agent as tool result
- `stderr`: Shown to user only (progress/debug info)

The demo script completed. You can create workspace-specific scripts to automate tasks.
EOF
35 changes: 35 additions & 0 deletions .cmux/scripts/echo
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
#!/usr/bin/env bash
# Description: Echo arguments demo. Accepts any number of arguments (strings) which will be echoed back.
set -euo pipefail

# Check if arguments were provided
if [ $# -eq 0 ]; then
cat <<'EOF'
## ⚠️ No Arguments Provided

Usage: `/s echo <message...>`

Example: `/s echo hello world`
EOF
exit 0
fi

# Structured output to stdout (sent to agent)
cat <<EOF
## 🔊 Echo Script

**You said:** $@

**Arguments received:**
- Count: $# arguments
- First arg: ${1:-none}
- Second arg: ${2:-none}
- All args: $@

**Individual arguments:**
EOF

# Loop through each argument
for i in $(seq 1 $#); do
echo "- Arg $i: ${!i}"
done
7 changes: 7 additions & 0 deletions .cmux/scripts/wait_pr_checks
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
#!/usr/bin/env bash
# Description: Wait for PR checks to pass on GitHub. Use this after pushing changes to origin, to catch CI failures. Accepts no arguments.
set -euo pipefail

BRANCH=$(git branch --show-current)
NUMBER=$(gh pr list --head "$BRANCH" --json number | jq -cr '.[0].number')
./scripts/wait_pr_checks.sh "$NUMBER"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, would be neat if we could seamlessly link scripts and they behaved well under both execution paradigms

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. I'll update the ./scripts/wait_pr_checks.sh script to fetch the branch and number, like in this script, when they're not specified.

47 changes: 47 additions & 0 deletions .github/workflows/codex-comment-watch.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
name: Codex Comment Watch

on:
issue_comment:
types:
- created
pull_request_review_comment:
types:
- created
pull_request_review:
types:
- submitted

permissions:
contents: read
pull-requests: read

concurrency:
group: codex-comment-watch-${{ github.event.issue.number || github.event.pull_request.number || github.run_id }}
cancel-in-progress: true

jobs:
check-codex-comments:
name: Check Codex Comments
runs-on: ${{ github.repository_owner == 'coder' && 'depot-ubuntu-22.04-16' || 'ubuntu-latest' }}
if: >
contains(fromJson('["chatgpt-codex-connector","chatgpt-codex-connector[bot]"]'), github.event.sender.login)
&& (github.event_name != 'issue_comment' || github.event.issue.pull_request != null)
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
fetch-depth: 0 # Required for git describe to find tags

- name: Determine PR number
id: determine-pr
run: |
if [[ "${{ github.event_name }}" == "issue_comment" ]]; then
echo "value=${{ github.event.issue.number }}" >> "$GITHUB_OUTPUT"
else
echo "value=${{ github.event.pull_request.number }}" >> "$GITHUB_OUTPUT"
fi

- name: Check for unresolved Codex comments
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: ./scripts/check_codex_comments.sh ${{ steps.determine-pr.outputs.value }}
1 change: 1 addition & 0 deletions docs/SUMMARY.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
- [SSH](./ssh.md)
- [Forking](./fork.md)
- [Init Hooks](./init-hooks.md)
- [Workspace Scripts](./scripts.md)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Scripts" is sufficient for name

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Since I couldn't think of a better name, I called it 'scripts' for now.
I'm open to suggestions for a new or improved name.

Alternatively, we could rename the page to 'Extensions' and include a section for scripts.

- [VS Code Extension](./vscode-extension.md)
- [Models](./models.md)
- [Keyboard Shortcuts](./keybinds.md)
Expand Down
187 changes: 187 additions & 0 deletions docs/scripts.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,187 @@
# Workspace Scripts

Execute custom scripts from your workspace using slash commands or let the AI Agent run them as tools.

## Overview

Scripts are stored in `.mux/scripts/` within each workspace. They serve two purposes:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should consider using our standard paradigm of coalescing ~/.mux with project .mux. Should also consider suggesting symlink here.


1. **Human Use**: Executable via `/script <name>` or `/s <name>` in chat.
2. **Agent Use**: Automatically exposed to the AI as tools (`script_<name>`), allowing the agent to run complex workflows you define.

Scripts run in the workspace directory with full access to project secrets and environment variables.

**Key Point**: Scripts are workspace-specific. Each workspace has its own custom toolkit defined in `.mux/scripts/`.

## Creating Scripts

1. **Create the scripts directory**:

```bash
mkdir -p .mux/scripts
```

2. **Add an executable script**:

```bash
#!/usr/bin/env bash
# Description: Deploy to staging. Accepts one optional argument: 'dry-run' to simulate.

if [ "${1:-}" == "dry-run" ]; then
echo "Simulating deployment..."
else
echo "Deploying to staging..."
fi
```

**Crucial**: The `# Description:` line is what the AI reads to understand the tool. Be descriptive about what the script does and what arguments it accepts.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting, but it's a little bit unclear how it would work for non-bash scripts, which should definitely be supported.

One idea is calling --help on all scripts upon load, but it would be cumbersome to implement for users. Another idea is automatically including the first N lines of the script, but this is "magical" and would fail for binary scripts.

Some more ideas:

  • A manifest.jsonc in the scripts dir containing a mapping of scripts to descriptions
  • A standard <script>.help file that we look at for description

We could also implement comment parsing for common languages.

Nothing clean comes to mind unfortunately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E.g. i believe we have a .ts script for updating models. On iPad rn so can't easily check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should support both binaries and non-bash scripts.

For Python and TypeScript, this approach should work well, as we can read these files as text. Since our files include a shebang at the top, it's also feasible to add a description right below the shebang. So the current approach should also work for the .ts script for updating models.

For binaries, we can introduce a new heuristic in the future: obtain descriptions using the --help flag or $CMD help, and incorporate that information into the script's description.


3. **Make it executable**:

```bash
chmod +x .mux/scripts/deploy
```

## Agent Integration (AI Tools)

Every executable script in `.mux/scripts/` is automatically registered as a tool for the AI Agent.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious for rationale of this vs. using a single script tool with the scripts registered in the description. Historically, tool call performance has suffered as the agent is provided with too many tools. This may be different now.

Another issue with a tool for each script is a lot of duplicate token usage for tool description. For example, each script has the same input and output parameters (argv, stdout, exit_code).


- **Tool Name**: `script_<name>` (e.g., `deploy` -> `script_deploy`, `run-tests` -> `script_run_tests`)
- **Tool Description**: Taken from the script's header comment (`# Description: ...`).
- **Arguments**: The AI can pass an array of string arguments to the script.

### Optimization for AI

To make your scripts effective AI tools:

1. **Clear Descriptions**: Explicitly state what the script does and what arguments it expects.

```bash
# Description: Fetch logs. Requires one argument: the environment name (dev|prod).
```

2. **Robustness**: Use `set -euo pipefail` to ensure the script fails loudly if something goes wrong, allowing the AI to catch the error.
3. **Clear Output**: Write structured output to stdout so the agent can understand results and take action.

## Usage

### Basic Execution

Type `/s` or `/script` in chat to see available scripts with auto-completion:

```
/s deploy
```

### With Arguments

Pass arguments to scripts:

```
/s deploy --dry-run
/script test --verbose --coverage
```

Arguments are passed directly to the script as `$1`, `$2`, etc.

## Execution Context

Scripts run with:

- **Working directory**: The workspace directory.
- **Environment**: Full workspace environment + project secrets + special cmux variables.
- **Timeout**: 5 minutes by default.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would fail for wait_pr_checks generally - suggest not having a timeout or making it longer. User should be able to interrupt hanging script anyways. we should also eventually loop this into our background bash system so that the agent can poll output and do useful work while waiting + cancel an apparently hanging script.

- **Streams**: stdout/stderr are captured.
- **Human**: Visible in the chat card.
- **Agent**: Returned as the tool execution result.

### Standard Streams

Scripts follow Unix conventions for output:

- **stdout**: Sent to the agent as the tool result. Use this for structured output the agent should act on.
- **stderr**: Shown to the user in the UI but **not** sent to the agent. Use this for progress messages, logs, or debugging info that doesn't need AI attention.

This design means scripts work identically whether run inside mux or directly from the command line.

#### Example: Test Runner

```bash
#!/usr/bin/env bash
# Description: Run tests and report failures for the agent to fix

set -euo pipefail

# Progress to stderr (user sees it, agent doesn't)
echo "Running test suite..." >&2

if npm test > test.log 2>&1; then
# Success message to stdout (agent sees it)
echo "✅ All tests passed"
else
# Structured failure info to stdout (agent sees and can act on it)
cat << EOF
❌ Tests failed. Here is the log:
\`\`\`
$(cat test.log)
\`\`\`
Please analyze this error and propose a fix.
EOF
exit 1
fi
```

**Result**:

1. User sees "Running test suite..." progress message.
2. On failure, agent receives the structured error with test log and instructions.
3. Agent can immediately analyze and propose fixes.

## Example Scripts

### Deployment Script

```bash
#!/usr/bin/env bash
# Description: Deploy application. Accepts one arg: environment (default: staging).
set -euo pipefail

ENV=${1:-staging}
echo "Deploying to $ENV..."
# ... deployment logic ...
echo "Deployment complete!"
```

### Web Fetch Utility

```bash
#!/usr/bin/env bash
# Description: Fetch a URL. Accepts exactly one argument: the URL.
set -euo pipefail

if [ $# -ne 1 ]; then
echo "Usage: $0 <url>"
exit 1
fi
curl -sL "$1"
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I added a web_fetch tool directly we should remove this example to avoid confusion. Unfortunately web_fetch is not well represented as a bash script but could be as a typescript script since we want to filter out a lot of the noise in the HTML and extract just the textual content.


## Script Discovery

- Scripts are discovered automatically from `.mux/scripts/` in the current workspace.
- Discovery is cached for performance but refreshes intelligently.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refresh model should be more clearly explained. E.g. should the user expect any edits to the scripts to be immediately reflected or do they have to wait for some timer?

Also, it's not totally clear whether we load scripts from the local project dir or the workspace dir. This determines whether we can easily plug into inotify (projects are always local and we can bypass runtime interface)

- **Sanitization**: Script names are sanitized for tool use (e.g., `my-script.sh` -> `script_my_script_sh`).

## Troubleshooting

**Script not appearing in suggestions or tools?**

- Ensure file is executable: `chmod +x .mux/scripts/scriptname`
- Verify file is in `.mux/scripts/` directory.
- Check for valid description header.

**Agent using script incorrectly?**

- Improve the `# Description:` header. Explicitly tell the agent what arguments to pass.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could define extraction as the first non-shebang comment. IMO it should include multi-line comments.

18 changes: 18 additions & 0 deletions src/browser/App.stories.tsx
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,12 @@ function setupMockAPI(options: {
success: true,
data: { success: true, output: "", exitCode: 0, wall_duration_ms: 0 },
}),
listScripts: () => Promise.resolve({ success: true, data: [] }),
executeScript: () =>
Promise.resolve({
success: true,
data: { success: true, output: "", exitCode: 0, wall_duration_ms: 0 },
}),
},
projects: {
list: () => Promise.resolve(Array.from(mockProjects.entries())),
Expand Down Expand Up @@ -1255,6 +1261,12 @@ main
data: { success: true, output: "", exitCode: 0, wall_duration_ms: 0 },
});
},
listScripts: () => Promise.resolve({ success: true, data: [] }),
executeScript: () =>
Promise.resolve({
success: true,
data: { success: true, output: "", exitCode: 0, wall_duration_ms: 0 },
}),
},
},
});
Expand Down Expand Up @@ -1463,6 +1475,12 @@ These tables should render cleanly without any disruptive copy or download actio
success: true,
data: { success: true, output: "", exitCode: 0, wall_duration_ms: 0 },
}),
listScripts: () => Promise.resolve({ success: true, data: [] }),
executeScript: () =>
Promise.resolve({
success: true,
data: { success: true, output: "", exitCode: 0, wall_duration_ms: 0 },
}),
},
},
});
Expand Down
Loading
Loading