
Conversation

@ngxson (Owner) commented Dec 3, 2025


Add 4 test cases for model router:
- test_router_unload_model: explicit model unloading
- test_router_models_max_evicts_lru: LRU eviction with --models-max
- test_router_no_models_autoload: --no-models-autoload flag behavior
- test_router_api_key_required: API key authentication

Tests load models asynchronously with status polling (see the sketch
below) and skip gracefully when too few models are available for
eviction testing.
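
For reference, the polling pattern is presumably along these lines (a minimal sketch: the `wait_for_model_status` name, the `/models` response shape, and the `status` field are assumptions, not the PR's actual helpers):

```python
import time
import requests

def wait_for_model_status(base_url: str, model: str, want: set, timeout: float = 30.0):
    # Poll the router until the model reaches one of the wanted states,
    # mirroring the "Timed out waiting for ... to reach {'loaded'}"
    # assertion visible in the test output further down.
    deadline = time.time() + timeout
    last = None
    while time.time() < deadline:
        res = requests.get(f"{base_url}/models")
        for item in res.json().get("data", []):
            if item.get("id") == model:
                last = item.get("status")
        if last in want:
            return
        time.sleep(0.5)
    raise AssertionError(f"Timed out waiting for {model} to reach {want}, last status: {last}")
```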

utils.py changes:
- Add models_max, models_dir, no_models_autoload attributes to ServerProcess
- Handle JSONDecodeError for non-JSON error responses (fallback to text)
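
The JSONDecodeError fallback is presumably along these lines (a sketch; the actual utils.py code may structure this differently):

```python
import json
import requests

def error_body(res: requests.Response):
    # Router error responses are not always JSON (the API-key check, for
    # example, may return plain text), so fall back to the raw body when
    # decoding fails.
    try:
        return res.json()
    except json.JSONDecodeError:
        return res.text
```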
coderabbitai bot commented Dec 3, 2025

Important: Review skipped (draft detected).

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@ServeurpersoCom force-pushed the test/router-load-unload branch from 1eabe34 to 3905449 on December 3, 2025 at 10:17.
@ServeurpersoCom commented:

------------------------------------------------------------------- Captured stdout teardown -------------------------------------------------------------------
Stopping server with pid=2103598
=================================================================== short test summary info ====================================================================
FAILED unit/test_router.py::test_router_models_max_evicts_lru - AssertionError: Timed out waiting for ggml-org/tinygemma3-GGUF:Q8_0 to reach {'loaded'}, last status: unloaded
============================================================ 1 failed, 5 passed in 81.58s (0:01:21) ============================================================
(root|~/llama.cpp.pascal) git diff
diff --git a/tools/server/public/index.html.gz b/tools/server/public/index.html.gz
index b911b6e76..7c3667f9a 100644
Binary files a/tools/server/public/index.html.gz and b/tools/server/public/index.html.gz differ
diff --git a/tools/server/tests/unit/test_router.py b/tools/server/tests/unit/test_router.py
index 87710d511..fb304386c 100644
--- a/tools/server/tests/unit/test_router.py
+++ b/tools/server/tests/unit/test_router.py
@@ -3,6 +3,11 @@ from utils import *

 server: ServerProcess

+@pytest.fixture(scope="module", autouse=True)
+def preload_models():
+    # Preload all models before running router tests
+    ServerPreset.load_all()
+
 @pytest.fixture(autouse=True)
 def create_server():
     global server
@@ -17,7 +22,6 @@ def create_server():
     ]
 )
 def test_router_chat_completion_stream(model: str, success: bool):
-    # TODO: make sure the model is in cache (ie. ServerProcess.load_all()) before starting the router server
     global server
     server.start()
     content = ""
diff --git a/tools/server/tests/utils.py b/tools/server/tests/utils.py
index 5417bb278..f794523b6 100644
--- a/tools/server/tests/utils.py
+++ b/tools/server/tests/utils.py
@@ -450,6 +450,7 @@ class ServerPreset:
     @staticmethod
     def tinyllama2() -> ServerProcess:
         server = ServerProcess()
+        server.offline = True # will be downloaded by load_all()
         server.model_hf_repo = "ggml-org/test-model-stories260K"
         server.model_hf_file = None
         server.model_alias = "tinyllama-2"
(root|~/llama.cpp.pascal)

This one fails too. Follow-up fix:

Fix eviction test: load 2 models first, verify their state, then load a
3rd to trigger eviction. The previous logic loaded all 3 at once,
causing the first model to be evicted before verification could occur.

Add a module fixture to preload models via ServerPreset.load_all(), and
mark test presets as offline so they use cached models.
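
The corrected ordering might look like this (a sketch reusing the hypothetical wait_for_model_status helper from above; the model ids are placeholders, and triggering a load via a chat-completion request is an assumption about how the router loads models on demand):

```python
import requests

MODEL_A, MODEL_B, MODEL_C = "model-a", "model-b", "model-c"  # placeholder ids

def trigger_load(base_url: str, model: str):
    # Assumption: with autoload enabled, requesting a completion for a model
    # makes the router load it (and refreshes its LRU slot).
    requests.post(f"{base_url}/v1/chat/completions", json={
        "model": model,
        "messages": [{"role": "user", "content": "hi"}],
        "max_tokens": 1,
    })

def eviction_flow(base_url: str):
    # Router started with --models-max 2: load two models and confirm both
    # are resident BEFORE introducing a third, so the eviction is observable.
    trigger_load(base_url, MODEL_A)
    trigger_load(base_url, MODEL_B)
    wait_for_model_status(base_url, MODEL_A, {"loaded"})
    wait_for_model_status(base_url, MODEL_B, {"loaded"})

    # A third load exceeds --models-max and must evict the LRU model, MODEL_A.
    trigger_load(base_url, MODEL_C)
    wait_for_model_status(base_url, MODEL_C, {"loaded"})
    wait_for_model_status(base_url, MODEL_A, {"unloaded"})
```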
