Conversation

@ThomasK33 (Member) commented Nov 7, 2025

Adds an Ollama provider for running local LLMs (e.g., ollama:llama3.2, ollama:qwen2.5-coder) with full tool calling and streaming support.

Setup

  1. Install Ollama from ollama.com
  2. Pull a model: ollama pull gpt-oss:20b
  3. That's it! Works out-of-the-box with no configuration needed.

Optional: Configure custom URL in ~/.cmux/providers.jsonc:

{
  "ollama": {
    "baseUrl": "http://your-server:11434/api"
  }
}

Key Changes

  • Model string parsing handles Ollama format (ollama:model-id:tag)
  • Integration with ollama-ai-provider-v2 from Vercel AI SDK
  • No configuration required - defaults to http://localhost:11434/api
  • 4 integration tests with CI support (gated by TEST_OLLAMA=1)
  • Tokenizer support for common Ollama models

Tests

✅ 4 new integration tests (102s)
✅ 964 unit tests pass
✅ All CI checks pass


Generated with cmux

Integrates ollama-ai-provider-v2 to enable running AI models locally
through Ollama without requiring API keys.

Changes:
- Add ollama-ai-provider-v2 dependency
- Implement Ollama provider in aiService.ts with lazy loading
- Add OllamaProviderOptions type for future extensibility
- Support Ollama model display formatting (e.g., llama3.2:7b -> Llama 3.2 (7B)); see the sketch after this list
- Update providers.jsonc template with Ollama configuration example
- Add comprehensive Ollama documentation to models.md
- Add unit tests for Ollama model name formatting
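
A rough sketch of the display formatting mentioned above, using a hypothetical helper name (the actual implementation may handle more edge cases):

// Illustrative only - formats "llama3.2:7b" as "Llama 3.2 (7B)"
function formatOllamaModelName(modelId: string): string {
  const [name, tag] = modelId.split(":");
  // Capitalize the family name and separate trailing version digits: "llama3.2" -> "Llama 3.2"
  const pretty = name
    .replace(/^([a-z])/, (c) => c.toUpperCase())
    .replace(/([a-zA-Z])(\d)/, "$1 $2");
  return tag ? `${pretty} (${tag.toUpperCase()})` : pretty;
}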

Ollama is a local service that doesn't require API keys. Users can run
any model from the Ollama Library (https://ollama.com/library) locally.

Example configuration in ~/.cmux/providers.jsonc:
{
  "ollama": {
    "baseUrl": "http://localhost:11434"
  }
}

Example model usage: ollama:llama3.2:7b
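
A minimal sketch of how the lazy-loaded provider wiring might look, assuming the createOllama() factory from ollama-ai-provider-v2; option names and the exact shape in aiService.ts may differ:

// Illustrative sketch - loads the Ollama provider only when an ollama: model is requested
async function createOllamaModel(modelId: string, baseUrl?: string) {
  // Dynamic import keeps the dependency out of the startup path
  const { createOllama } = await import("ollama-ai-provider-v2");
  // With no baseUrl configured, createOllama() falls back to its built-in default
  const ollama = createOllama(baseUrl ? { baseURL: baseUrl } : {});
  return ollama(modelId); // e.g. ollama("llama3.2:7b")
}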

_Generated with `cmux`_
Adds comprehensive integration tests for Ollama provider to verify tool
calling and file operations work correctly with local models.

Changes:
- Add tests/ipcMain/ollama.test.ts with 4 test cases:
  * Basic message sending and response
  * Tool calling with bash tool (gpt-oss:20b)
  * File operations with file_read tool
  * Error handling when Ollama is not running
- Update setupWorkspace() to handle Ollama (no API key required)
- Update setupProviders() type signature for optional baseUrl
- Add Ollama installation and model pulling to CI workflow
- Configure CI to run Ollama tests with gpt-oss:20b model

The tests verify that Ollama can:
- Send messages and receive streaming responses
- Execute bash commands via tool calling
- Read files using the file_read tool
- Handle connection errors gracefully

CI Setup:
- Installs Ollama via official install script
- Pulls gpt-oss:20b model for tests
- Waits for Ollama service to be ready before running tests
- Sets OLLAMA_BASE_URL environment variable for tests

_Generated with `cmux`_
Cache Ollama models between CI runs to speed up integration tests.
The gpt-oss:20b model is large (~13 GB), so caching saves significant time
on subsequent test runs.

Cache key: ${{ runner.os }}-ollama-gpt-oss-20b-v1

_Generated with `cmux`_
- Make cache keys more generic and future-proof
- Cache Ollama binary separately for instant cached runs
- Update model examples to popular models (gpt-oss, qwen3-coder)

Changes:
- Split Ollama caching into binary + models for better performance
- Only install Ollama if binary is not cached (saves time)
- Update docs to reference gpt-oss:20b, gpt-oss:120b, qwen3-coder:30b

_Generated with `cmux`_
- Fixed model string parsing to handle colons in model IDs (e.g., ollama:gpt-oss:20b)
  Split only on first colon instead of all colons
- Added Ollama compatibility mode (strict) for better API compatibility
- Fixed baseURL configuration to include /api suffix consistently
  Updated test setup, config template, docs, and CI
- Fixed test assertions to use extractTextFromEvents() helper
  Tests were incorrectly calling .join() on event objects instead of extracting delta text
- Removed test concurrency to prevent race conditions
  Sequential execution resolves stream-end event timing issues
- Updated file operations test to use README.md instead of package.json
  More reliable for test workspace environment

All 4 Ollama integration tests now pass consistently (102s total runtime)
@ammar-agent ammar-agent changed the title 🤖 feat: Add first-class Ollama support (auto-detect, auto-start, use Vercel AI SDK) 🤖 feat: add Ollama local model support with Vercel AI SDK integration Nov 8, 2025
@ammario ammario changed the title 🤖 feat: add Ollama local model support with Vercel AI SDK integration 🤖 feat: add Ollama local model support Nov 8, 2025
- Remove unused modelString import from ollama.test.ts
- Use consistent indexOf() pattern for provider extraction in streamMessage()
  Ensures model IDs with colons are handled uniformly throughout codebase

The 'before' variable was previously used for debug logging but is no longer needed.

Key improvements:
- Combined binary, library, and model caching into single cache entry
  Previously: separate caches for binary and models
  Now: /usr/local/bin/ollama + /usr/local/lib/ollama + /usr/share/ollama

- Fixed model cache path from ~/.ollama/models to /usr/share/ollama
  Models are stored in system ollama user's home, not runner's home

- Separated installation from server startup
  Install step only runs on cache miss and includes model pull
  Startup step always runs but completes in <5s with cached models

- Optimized readiness checks
  Install: 10s timeout, 0.5s polling (only on cache miss)
  Startup: 5s timeout, 0.2s polling (every run, with cache hit)

- Added cache key based on workflow file hash
  Cache invalidates when workflow changes, ensuring fresh install if needed

Expected timing:
- First run (cache miss): ~60s (download + install + model pull)
- Subsequent runs (cache hit): <5s (just server startup)
- Cache size: ~13GB (gpt-oss:20b model)

Testing: Verified locally that Ollama starts in <1s with cached models

Fixes context limit display for Ollama models like ollama:gpt-oss:20b.

Problem:
- User model string: ollama:gpt-oss:20b
- Previous lookup: gpt-oss:20b (stripped provider)
- models.json key: ollama/gpt-oss:20b-cloud (LiteLLM convention)
- Result: Lookup failed, showed "Unknown model limits"

Solution:
Implemented multi-pattern fallback lookup that tries:
1. Direct model name (claude-opus-4-1)
2. Provider-prefixed (ollama/gpt-oss:20b)
3. Cloud variant (ollama/gpt-oss:20b-cloud) ← matches!
4. Base model (ollama/gpt-oss) as fallback
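
A rough sketch of this fallback chain; the helper and field names below are hypothetical, not the exact code:

// Illustrative only - tries progressively broader keys against models.json
function lookupModelLimits(
  provider: string,
  modelId: string,
  modelsJson: Record<string, { max_input_tokens?: number }>
) {
  const baseId = modelId.split(":")[0]; // "gpt-oss:20b" -> "gpt-oss"
  const candidates = [
    modelId,                        // 1. direct name
    `${provider}/${modelId}`,       // 2. provider-prefixed
    `${provider}/${modelId}-cloud`, // 3. cloud variant (LiteLLM convention)
    `${provider}/${baseId}`,        // 4. base model fallback
  ];
  for (const key of candidates) {
    if (key in modelsJson) return modelsJson[key];
  }
  return undefined;
}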

Benefits:
- Works automatically for all Ollama models in models.json
- Zero configuration required
- Backward compatible with existing lookups
- No API calls needed (works offline)

Testing:
- Added 15+ unit tests covering all lookup patterns
- Verified ollama:gpt-oss:20b → 131k context limit
- All 979 unit tests pass

Models that now work:
- ollama:gpt-oss:20b → ollama/gpt-oss:20b-cloud (131k)
- ollama:gpt-oss:120b → ollama/gpt-oss:120b-cloud (131k)
- ollama:llama3.1 → ollama/llama3.1 (8k)
- ollama:deepseek-v3.1:671b → ollama/deepseek-v3.1:671b-cloud
- Plus all existing Anthropic/OpenAI models
@ammario ammario marked this pull request as ready for review November 8, 2025 16:52
@chatgpt-codex-connector (bot) left a comment:

💡 Codex Review

Here are some automated review suggestions for this pull request.

- Download only the Ollama binary (no system service installation)
- Use 'ollama start' instead of 'ollama serve'
- Cache binary and models separately for better cache efficiency
- Models now naturally go to ~/.ollama (no sudo/copying needed)
- Removed complex model copying logic from cache miss path
- Simplified workflow - Ollama server starts in setup action

Benefits:
- Cache works correctly (models in user home, not system location)
- Faster warm cache (<1s vs ~60s)
- No sudo operations needed
- Matches proven pydantic/ollama-action approach
Previously, the setup-ollama action pulled models sequentially during setup.
Now tests pull models idempotently in the beforeAll hook, enabling:

- Better parallelism across test jobs
- Idempotent model pulls (multiple tests can check/pull safely)
- Shared model cache across parallel test runners
- Ollama handles deduplication when multiple pulls happen simultaneously

Changes:
- Remove model input and pull logic from setup-ollama action
- Add ensureOllamaModel() helper to check if model exists and pull if needed (sketched below)
- Call ensureOllamaModel() in beforeAll hook before tests run
- Bump beforeAll timeout to 150s to accommodate potential model pull
- Simplify cache key to 'ollama-models-v2' (model-agnostic)

_Generated with `cmux`_
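
A sketch of what ensureOllamaModel() might look like, assuming the Ollama CLI is on PATH; the real helper may differ in details such as async handling and error reporting:

import { execFileSync } from "node:child_process";

// Illustrative only - pull the model if `ollama list` does not already show it
function ensureOllamaModel(model: string): void {
  const installed = execFileSync("ollama", ["list"], { encoding: "utf8" });
  if (installed.includes(model)) return; // already present, nothing to do

  // Piped stdio keeps pull progress bars out of CI logs; stderr is still captured for errors
  execFileSync("ollama", ["pull", model], { stdio: ["ignore", "pipe", "pipe"] });
}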
@ammar-agent (Collaborator) commented:

✅ Refactoring Complete: Test-Initiated Model Pulls

Successfully refactored Ollama setup to improve parallelism by moving model pulls from the setup action into the test suite itself.

Changes in ab90e9b

Setup Action (.github/actions/setup-ollama/action.yml)

  • ❌ Removed model input parameter
  • ❌ Removed sequential model pull logic
  • ✅ Simplified cache key to ollama-models-v2 (model-agnostic)
  • ✅ Binary cache: still works (~1.4 GB, 3s restore)
  • ✅ Model cache: now shared across all models

Integration Tests (tests/ipcMain/ollama.test.ts)

  • ✅ Added ensureOllamaModel() helper function
    • Checks if model exists via ollama list
    • Pulls only if needed (idempotent)
    • Multiple tests can call safely - Ollama handles deduplication
  • ✅ Called in beforeAll hook before tests run
  • ✅ Bumped timeout to 150s to accommodate potential model pull

Benefits

  1. Better parallelism - Multiple test jobs can now share the cached model directory
  2. Idempotent pulls - Tests check if model exists before pulling
  3. Ollama-native deduplication - If multiple processes try to pull, Ollama handles it
  4. Simplified setup action - No longer needs to know about specific models
  5. Shared cache - All models go into one cache key, improving cache efficiency

Test Results (Run 19199669727)

  • Binary cache hit - 1.4 GB restored in ~3s
  • Model pull during test - 13 GB gpt-oss:20b pulled successfully
  • All Ollama tests passed - 4/4 tests in 134s
  • All CI checks passed - Static checks, unit tests, integration tests, E2E tests

Performance

First run (cold cache):

  • Binary cache: hit ⚡
  • Model cache: miss (expected with new cache key)
  • Model pull: ~35s during test
  • Total setup: ~40s (including test overhead)

Subsequent runs (warm cache):

  • Will hit both binary and model cache
  • ensureOllamaModel() detects model exists: ~1s
  • Expected setup: <5s

Generated with cmux

The keyboard shortcut is faster and more convenient than navigating
through the Command Palette.

_Generated with `cmux`_
@ammar-agent (Collaborator) commented:

✅ Updated docs to recommend Ctrl+/ / Cmd+/ keyboard shortcut for model switching (more convenient than Command Palette).

Commit: 82b51a2

Generated with cmux

Consolidates two instances of provider name extraction logic into a
single helper function. Both createModel() and streamMessage() were
duplicating the logic to parse provider names from model strings.

The helper properly handles Ollama model IDs with colons (e.g.,
"ollama:gpt-oss:20b" -> "ollama") by splitting only on the first colon.

_Generated with `cmux`_
@ammar-agent (Collaborator) commented:

✅ Refactored provider name extraction to eliminate code duplication.

Created parseProviderName() helper function that consolidates two instances of the same logic:

  • createModel() - was using modelString.slice(0, colonIndex)
  • streamMessage() - was using colonIndex !== -1 ? modelString.slice(0, colonIndex) : modelString

The helper properly handles Ollama model IDs with colons (e.g., ollama:gpt-oss:20b → ollama).

Commit: e90b881

Generated with cmux

@ammar-agent (Collaborator) commented:

✅ Further improved refactoring - parseModelString now returns provider + model tuple.

Changed from parseProviderName(), which returned just the provider, to parseModelString(), which returns a [providerName, modelId] tuple. This eliminates the remaining duplication:

  • ❌ Removed: modelString.slice(colonIndex + 1) in createModel()
  • ❌ Removed: modelString.indexOf(":") redundant checks

Both call sites now use clean destructuring:

const [providerName, modelId] = parseModelString(modelString);
const [providerName] = parseModelString(modelString); // when only provider needed

Commit: 6ce28ca

Generated with cmux

Further eliminates duplication by having parseModelString return both
the provider name and model ID as a tuple [providerName, modelId].

This removes the remaining duplicated logic:
- modelString.slice(colonIndex + 1) in createModel()
- modelString.indexOf(":") check logic

Both call sites now use destructuring to get the parts they need.

_Generated with `cmux`_
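
A minimal sketch of the resulting helper, assuming this shape; the actual signature in the codebase may differ:

// Illustrative only - split on the FIRST colon so "ollama:gpt-oss:20b" keeps its tag
function parseModelString(modelString: string): [string, string] {
  const colonIndex = modelString.indexOf(":");
  if (colonIndex === -1) return [modelString, ""];
  return [
    modelString.slice(0, colonIndex),  // provider, e.g. "ollama"
    modelString.slice(colonIndex + 1), // model ID, e.g. "gpt-oss:20b"
  ];
}

// const [providerName, modelId] = parseModelString("ollama:gpt-oss:20b");
// providerName === "ollama", modelId === "gpt-oss:20b"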
Add blank lines before bullet lists per prettier rules.

_Generated with `cmux`_
@ammario ammario self-assigned this Nov 9, 2025
@ammario (Member) commented Nov 9, 2025

Note: I'm setting up an OVH CI runner for the integration tests. That should generally speed things up, but is particularly necessary for this change as there's no reasonable way to avoid ~40s for the Ollama model pull.

Separates Ollama integration tests from main integration test suite:

**New Job: ollama-test**
- Dedicated job for Ollama-specific tests
- Sets up Ollama binary and model cache
- Runs only tests/ipcMain/ollama.test.ts
- Uploads coverage with 'ollama-tests' flag

**Updated Job: integration-test**
- Removed Ollama setup steps
- Excludes ollama.test.ts via --testPathIgnorePatterns
- Removed OLLAMA_BASE_URL env var
- Faster execution without Ollama dependencies

Benefits:
- Better parallelism (Ollama tests run independently)
- Clearer separation of concerns
- Main integration tests complete faster
- Ollama-specific caching isolated to one job

_Generated with `cmux`_
Simplifies CI by using an explicit environment variable instead of
path filtering:

**Test Changes:**
- Ollama tests now require both TEST_INTEGRATION=1 and TEST_OLLAMA=1
- Uses `describeOllama` that checks `process.env.TEST_OLLAMA === '1'`
- Auto-skips when TEST_OLLAMA is not set (no manual filtering needed)

**CI Changes:**
- `integration-test` job: runs all tests, Ollama tests skip automatically
- `ollama-test` job: sets TEST_OLLAMA=1 to enable Ollama tests
- Removed `--testPathIgnorePatterns` (no longer needed)
- Cleaner and more explicit test gating

Benefits:
- Simpler CI configuration (no path filtering)
- Consistent pattern with TEST_INTEGRATION
- Easy to run Ollama tests locally: TEST_INTEGRATION=1 TEST_OLLAMA=1 bun x jest tests/ipcMain/ollama.test.ts

_Generated with `cmux`_
@ammar-agent (Collaborator) commented:

✅ Simplified Ollama test gating with TEST_OLLAMA environment variable.

Changes:

  • Ollama tests now require explicit TEST_OLLAMA=1 to run (similar to TEST_INTEGRATION)
  • Removed Jest path filtering (--testPathIgnorePatterns)
  • Tests automatically skip when TEST_OLLAMA is not set

CI Jobs:

  • integration-test: Runs all tests except Ollama (auto-skipped)
  • ollama-test: Sets TEST_OLLAMA=1 to enable Ollama tests

Local Testing:

# Run Ollama tests locally
TEST_INTEGRATION=1 TEST_OLLAMA=1 bun x jest tests/ipcMain/ollama.test.ts

Much cleaner than path-based filtering! 🎉

Commits: de39340 + 94cab2c

Generated with cmux

@ammar-agent (Collaborator) commented:

🎉 PR Ready to Merge!

All checks passed with the new split Ollama test job working perfectly!

Final Architecture

CI Jobs:

  1. Integration Tests (3m43s) - Runs all integration tests except Ollama
  2. Ollama Integration Tests (3m46s) - Runs only Ollama tests in parallel

Both jobs run in parallel, achieving the parallelism goal! ✨

Test Gating Pattern

// tests/ipcMain/ollama.test.ts
const shouldRunOllamaTests = shouldRunIntegrationTests() && process.env.TEST_OLLAMA === '1';
const describeOllama = shouldRunOllamaTests ? describe : describe.skip;

Summary of All Changes

  1. Test-initiated model pulls (ab90e9b) - Models pulled idempotently in beforeAll hook
  2. Documentation (82b51a2) - Recommend Ctrl+/ shortcut
  3. Code quality (e90b881 + f671d7a) - parseModelString helper eliminates duplication
  4. Split Ollama job (de39340 + 94cab2c) - Separate job with TEST_OLLAMA env var

Status: ✅ MERGEABLE with CLEAN merge state

Generated with cmux

Quiets verbose output from Ollama integration tests:

**CI Changes:**
- Added `--silent` flag to jest command (suppresses per-test output)

**Test Changes:**
- Removed `console.log` statements from `ensureOllamaModel()`
- Changed stdio from 'inherit' to 'pipe' to capture output silently
- Still capture stderr for error reporting if pull fails
- Add explanatory comments about silent mode

This dramatically reduces CI log verbosity while maintaining error visibility.

_Generated with `cmux`_
@ammar-agent (Collaborator) commented:

✅ Fixed Ollama test log spam!

Before:

  • Verbose console.log statements on every test
  • Progress bars from ollama pull (inherited stdio)
  • Tokenizer warnings repeated for each test

After:

  • Clean test output with --silent flag
  • No console.log spam (removed explicit logs)
  • Captured stderr for error reporting only
  • Still visible: Test pass/fail results and coverage

The CI logs are now much cleaner while maintaining visibility of actual test results! 🎉

Commit: c5305ee

Generated with cmux

Reduces log spam in CI by mocking console.log and console.warn during
Ollama integration tests. The test output was showing 426+ console
statements including:
- Tokenizer warnings for unknown Ollama models
- Service logs from initStateManager and aiService
- Tool configuration logs

Changes:
- Add console.log/console.warn spies in beforeAll (CI only)
- Restore console in afterAll
- Only active when process.env.CI is set

This makes CI logs readable while preserving local dev debugging.

_Generated with `cmux`_
Fixes root cause of Ollama test log spam (426+ console warnings).

**Root Cause:**
The tokenizer warned EVERY time it encountered `ollama:gpt-oss:20b`
because it's not in the model registry. With 4 tests making multiple
API calls each, this created 426+ duplicate warnings.

**Solution:**
Track warned models in a Set and only warn once per unknown model.

**Changes:**
- Add `warnedModels` Set to track which models we've warned about
- Check Set before warning to deduplicate
- Revert console mocking from test (no longer needed)

**Impact:**
- Ollama tests: 426+ warnings → 1 warning
- Production: Users won't see warning spam for Ollama models
- Cleaner approach than test-specific mocking

_Generated with `cmux`_
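
A sketch of the deduplication guard, with an illustrative warning message (the actual tokenizer code may word it differently):

// Illustrative only - warn at most once per unknown model string
const warnedModels = new Set<string>();

function warnUnknownTokenizer(modelString: string): void {
  if (warnedModels.has(modelString)) return;
  warnedModels.add(modelString);
  console.warn(`Unknown model "${modelString}"; falling back to approximate token counting.`);
}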
The Ollama provider works out of the box with no configuration needed; it
defaults to http://localhost:11434/api when no config is provided.

Updated docs and code comments to make this clear:
- Setup now shows 2 steps instead of 3 (no config needed)
- Added "Custom Configuration" section for remote/custom URLs
- Updated provider config example to show ollama is optional
- Fixed code comment in config.ts to note optional nature

This matches actual code behavior where providerConfig defaults to {}
and createOllama() uses its built-in defaults.
@ammario ammario merged commit 3202b13 into main Nov 9, 2025
16 checks passed
@ammario ammario deleted the ollama branch November 9, 2025 23:43