
Conversation

@ngxson (Owner) commented Dec 3, 2025


Add 4 test cases for model router:
- test_router_unload_model: explicit model unloading
- test_router_models_max_evicts_lru: LRU eviction with --models-max
- test_router_no_models_autoload: --no-models-autoload flag behavior
- test_router_api_key_required: API key authentication

Tests load models asynchronously with status polling (see the sketch
below) and skip gracefully when too few models are available for
eviction testing.
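
For reference, the polling pattern is presumably along these lines (a minimal sketch: the `wait_for_model_status` name, the `/models` response shape, and the `status` field are assumptions, not the PR's actual helpers):

```python
import time
import requests

def wait_for_model_status(base_url: str, model: str, want: set, timeout: float = 30.0):
    # Poll the router until the model reaches one of the wanted states,
    # mirroring the "Timed out waiting for ... to reach {'loaded'}"
    # assertion visible in the test output further down.
    deadline = time.time() + timeout
    last = None
    while time.time() < deadline:
        res = requests.get(f"{base_url}/models")
        for item in res.json().get("data", []):
            if item.get("id") == model:
                last = item.get("status")
        if last in want:
            return
        time.sleep(0.5)
    raise AssertionError(f"Timed out waiting for {model} to reach {want}, last status: {last}")
```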

utils.py changes:
- Add models_max, models_dir, no_models_autoload attributes to ServerProcess
- Handle JSONDecodeError for non-JSON error responses (fallback to text)
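
The JSONDecodeError fallback is presumably along these lines (a sketch; the actual utils.py code may structure this differently):

```python
import json
import requests

def error_body(res: requests.Response):
    # Router error responses are not always JSON (the API-key check, for
    # example, may return plain text), so fall back to the raw body when
    # decoding fails.
    try:
        return res.json()
    except json.JSONDecodeError:
        return res.text
```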
coderabbitai bot commented Dec 3, 2025

Important: Review skipped (draft detected).

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@ServeurpersoCom force-pushed the test/router-load-unload branch from 1eabe34 to 3905449 on December 3, 2025 at 10:17.
@ServeurpersoCom commented:

------------------------------------------------------------------- Captured stdout teardown -------------------------------------------------------------------
Stopping server with pid=2103598
=================================================================== short test summary info ====================================================================
FAILED unit/test_router.py::test_router_models_max_evicts_lru - AssertionError: Timed out waiting for ggml-org/tinygemma3-GGUF:Q8_0 to reach {'loaded'}, last status: unloaded
============================================================ 1 failed, 5 passed in 81.58s (0:01:21) ============================================================
(root|~/llama.cpp.pascal) git diff
diff --git a/tools/server/public/index.html.gz b/tools/server/public/index.html.gz
index b911b6e76..7c3667f9a 100644
Binary files a/tools/server/public/index.html.gz and b/tools/server/public/index.html.gz differ
diff --git a/tools/server/tests/unit/test_router.py b/tools/server/tests/unit/test_router.py
index 87710d511..fb304386c 100644
--- a/tools/server/tests/unit/test_router.py
+++ b/tools/server/tests/unit/test_router.py
@@ -3,6 +3,11 @@ from utils import *

 server: ServerProcess

+@pytest.fixture(scope="module", autouse=True)
+def preload_models():
+    # Preload all models before running router tests
+    ServerPreset.load_all()
+
 @pytest.fixture(autouse=True)
 def create_server():
     global server
@@ -17,7 +22,6 @@ def create_server():
     ]
 )
 def test_router_chat_completion_stream(model: str, success: bool):
-    # TODO: make sure the model is in cache (ie. ServerProcess.load_all()) before starting the router server
     global server
     server.start()
     content = ""
diff --git a/tools/server/tests/utils.py b/tools/server/tests/utils.py
index 5417bb278..f794523b6 100644
--- a/tools/server/tests/utils.py
+++ b/tools/server/tests/utils.py
@@ -450,6 +450,7 @@ class ServerPreset:
     @staticmethod
     def tinyllama2() -> ServerProcess:
         server = ServerProcess()
+        server.offline = True # will be downloaded by load_all()
         server.model_hf_repo = "ggml-org/test-model-stories260K"
         server.model_hf_file = None
         server.model_alias = "tinyllama-2"
(root|~/llama.cpp.pascal)

This one fails too. Follow-up fix:

Fix eviction test: load 2 models first, verify their state, then load a
3rd to trigger eviction. The previous logic loaded all 3 at once,
causing the first model to be evicted before verification could occur.

Add a module fixture to preload models via ServerPreset.load_all(), and
mark test presets as offline so they use cached models.
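
The corrected ordering might look like this (a sketch reusing the hypothetical wait_for_model_status helper from above; the model ids are placeholders, and triggering a load via a chat-completion request is an assumption about how the router loads models on demand):

```python
import requests

MODEL_A, MODEL_B, MODEL_C = "model-a", "model-b", "model-c"  # placeholder ids

def trigger_load(base_url: str, model: str):
    # Assumption: with autoload enabled, requesting a completion for a model
    # makes the router load it (and refreshes its LRU slot).
    requests.post(f"{base_url}/v1/chat/completions", json={
        "model": model,
        "messages": [{"role": "user", "content": "hi"}],
        "max_tokens": 1,
    })

def eviction_flow(base_url: str):
    # Router started with --models-max 2: load two models and confirm both
    # are resident BEFORE introducing a third, so the eviction is observable.
    trigger_load(base_url, MODEL_A)
    trigger_load(base_url, MODEL_B)
    wait_for_model_status(base_url, MODEL_A, {"loaded"})
    wait_for_model_status(base_url, MODEL_B, {"loaded"})

    # A third load exceeds --models-max and must evict the LRU model, MODEL_A.
    trigger_load(base_url, MODEL_C)
    wait_for_model_status(base_url, MODEL_C, {"loaded"})
    wait_for_model_status(base_url, MODEL_A, {"unloaded"})
```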
