158 commits
fc5901a
server: add model management and proxy
ngxson Nov 19, 2025
399f536
fix compile error
ngxson Nov 19, 2025
abc0ca4
does this fix windows?
ngxson Nov 19, 2025
54b3545
fix windows build
ngxson Nov 19, 2025
5423d42
use subprocess.h, better logging
ngxson Nov 19, 2025
0ef3b61
add test
ngxson Nov 19, 2025
7c6eb17
fix windows
ngxson Nov 20, 2025
919d3f8
Merge branch 'master' into xsn/server_model_management_v1_2
ngxson Nov 20, 2025
55d33a8
feat: Model/Router server architecture WIP
allozaur Nov 20, 2025
b9ebdf6
more stable
ngxson Nov 20, 2025
6610724
fix unsafe pointer
ngxson Nov 20, 2025
d0ea9e0
also allow terminate loading model
ngxson Nov 20, 2025
5805ca7
add is_active()
ngxson Nov 20, 2025
8a88576
refactor: Architecture improvements
allozaur Nov 20, 2025
c35dee3
Merge remote-tracking branch 'ngxson/xsn/server_model_management_v1_2…
allozaur Nov 20, 2025
2161408
tmp apply upstream fix
ngxson Nov 20, 2025
5369aaa
address most problems
ngxson Nov 20, 2025
6929c9f
address thread safety issue
ngxson Nov 20, 2025
be25bcc
address review comment
ngxson Nov 20, 2025
cd5c699
add docs (first version)
ngxson Nov 20, 2025
a2e912c
address review comment
ngxson Nov 20, 2025
4bf82a1
feat: Improved UX for model information, modality interactions etc
allozaur Nov 20, 2025
cc88f6a
chore: update webui build output
allozaur Nov 20, 2025
45bf2a4
Merge remote-tracking branch 'ngxson/xsn/server_model_management_v1_2…
allozaur Nov 21, 2025
049f40d
refactor: Use only the message data `model` property for displaying m…
allozaur Nov 21, 2025
c26c340
chore: update webui build output
allozaur Nov 21, 2025
032b9ff
add --models-dir param
ngxson Nov 21, 2025
8b1d967
feat: New Model Selection UX WIP
allozaur Nov 21, 2025
6b7c0a5
chore: update webui build output
allozaur Nov 21, 2025
69503aa
feat: Add auto-mic setting
allozaur Nov 21, 2025
92585c7
feat: Attachments UX improvements
allozaur Nov 21, 2025
62ee883
implement LRU
ngxson Nov 21, 2025
7cd9290
remove default model path
ngxson Nov 21, 2025
7241558
better --models-dir
ngxson Nov 21, 2025
b0540e8
add env for args
ngxson Nov 21, 2025
525e274
address review comments
ngxson Nov 21, 2025
457fbda
fix compile
ngxson Nov 21, 2025
c274f13
refactor: Chat Form Submit component
allozaur Nov 22, 2025
f2ca54b
Merge branch 'master' into xsn/server_model_management_v1_2
ngxson Nov 22, 2025
d32bbfe
ad endpoint docs
ngxson Nov 22, 2025
4af1b6c
Merge remote-tracking branch 'webui/allozaur/server_model_management_…
ngxson Nov 22, 2025
076eec6
feat: Add copy to clipboard to model name in model info dialog
allozaur Nov 22, 2025
db8ed5d
feat: Model unavailable UI state for model selector
allozaur Nov 22, 2025
dc913ec
feat: Chat Form Actions UI logic improvements
allozaur Nov 22, 2025
a39ef24
feat: Auto-select model from last assistant response
allozaur Nov 22, 2025
036cc93
chore: update webui build output
allozaur Nov 22, 2025
6282537
Merge remote-tracking branch 'ngxson/xsn/server_model_management_v1_2…
allozaur Nov 22, 2025
f25bfab
expose args and exit_code in API
ngxson Nov 23, 2025
7ef6312
add note
ngxson Nov 23, 2025
f927e21
support extra_args on loading model
ngxson Nov 23, 2025
74685f4
allow reusing args if auto_load
ngxson Nov 23, 2025
f95f9c5
typo docs
ngxson Nov 23, 2025
2e355c7
oai-compat /models endpoint
ngxson Nov 23, 2025
5ad594e
cleaner
ngxson Nov 23, 2025
d65be91
address review comments
ngxson Nov 23, 2025
1f0cb3a
feat: Use `model` property for displaying the `repo/model-name` namin…
allozaur Nov 23, 2025
b7ba13b
refactor: Attachments data
allozaur Nov 23, 2025
48dbef1
chore: update webui build output
allozaur Nov 23, 2025
1c214e9
refactor: Enum imports
allozaur Nov 23, 2025
ef5f9d0
feat: Improve Model Selector responsiveness
allozaur Nov 23, 2025
49c8062
chore: update webui build output
allozaur Nov 23, 2025
d5a6671
refactor: Cleanup
allozaur Nov 23, 2025
f8ff39c
refactor: Cleanup
allozaur Nov 23, 2025
41764b8
refactor: Formatters
allozaur Nov 23, 2025
219fd19
chore: update webui build output
allozaur Nov 23, 2025
e92ce07
refactor: Copy To Clipboard Icon component
allozaur Nov 23, 2025
fb5445e
chore: update webui build output
allozaur Nov 23, 2025
39fb1c2
refactor: Cleanup
allozaur Nov 23, 2025
188d323
chore: update webui build output
allozaur Nov 23, 2025
16747de
refactor: UI badges
allozaur Nov 23, 2025
e808f2b
chore: update webui build output
allozaur Nov 23, 2025
76557cd
Merge remote-tracking branch 'ngxson/xsn/server_model_management_v1_2…
allozaur Nov 23, 2025
13fe860
refactor: Cleanup
allozaur Nov 24, 2025
b2590a7
refactor: Cleanup
allozaur Nov 24, 2025
5ef3f99
chore: update webui build output
allozaur Nov 24, 2025
6ed192b
add --models-allow-extra-args for security
ngxson Nov 24, 2025
2c6b58f
nits
ngxson Nov 24, 2025
539cbf0
add stdin_file
ngxson Nov 24, 2025
399b39f
Merge branch 'master' into xsn/server_model_management_v1_2
ngxson Nov 24, 2025
e514b86
fix merge
ngxson Nov 24, 2025
11c26ec
Merge remote-tracking branch 'ngxson/xsn/server_model_management_v1_2…
allozaur Nov 24, 2025
7db3d87
fix: Retrieve lost setting after resolving merge conflict
allozaur Nov 24, 2025
ccd6c27
refactor: DatabaseStore -> DatabaseService
allozaur Nov 25, 2025
fed6c82
refactor: Database, Conversations & Chat services + stores architectu…
allozaur Nov 25, 2025
f9c911d
refactor: Remove redundant settings
allozaur Nov 25, 2025
501badc
refactor: Multi-model business logic WIP
allozaur Nov 25, 2025
4c24ead
chore: update webui build output
allozaur Nov 25, 2025
b9a3129
feat: Switching models logic for ChatForm or when regenerating messge…
allozaur Nov 25, 2025
0132449
chore: update webui build output
allozaur Nov 25, 2025
82975a1
fix: Add `untrack` inside chat processing info data logic to prevent …
allozaur Nov 25, 2025
33356f3
fix: Regenerate
allozaur Nov 25, 2025
c680083
feat: Remove redundant settigns + rearrange
allozaur Nov 25, 2025
5207527
fix: Audio attachments
allozaur Nov 25, 2025
22507fe
refactor: Icons
allozaur Nov 25, 2025
81b8e1a
chore: update webui build output
allozaur Nov 25, 2025
2a280b6
feat: Model management and selection features WIP
allozaur Nov 26, 2025
19e5385
chore: update webui build output
allozaur Nov 26, 2025
b1cf8bb
refactor: Improve server properties management
allozaur Nov 26, 2025
23a91cd
refactor: Icons
allozaur Nov 26, 2025
d0d7a88
chore: update webui build output
allozaur Nov 26, 2025
284557c
feat: Improve model loading/unloading status updates
allozaur Nov 26, 2025
9431f35
chore: update webui build output
allozaur Nov 26, 2025
ddf98bd
refactor: Improve API header management via utility functions
allozaur Nov 26, 2025
e40f35f
remove support for extra args
ngxson Nov 26, 2025
e2731c3
set hf_repo/docker_repo as model alias when posible
ngxson Nov 26, 2025
becc602
Merge branch 'master' into xsn/server_model_management_v1_2
ngxson Nov 26, 2025
42483f4
refactor: Remove ConversationsService
allozaur Nov 26, 2025
456828b
refactor: Chat requests abort handling
allozaur Nov 26, 2025
d6ee3d1
refactor: Server store
allozaur Nov 26, 2025
1493ee0
tmp webui build
ngxson Nov 26, 2025
13e7988
refactor: Model modality handling
allozaur Nov 26, 2025
2a5922b
chore: update webui build output
allozaur Nov 26, 2025
6b95118
refactor: Processing state reactivity
allozaur Nov 27, 2025
69065dd
fix: UI
allozaur Nov 27, 2025
6a3d6e7
refactor: Services/Stores syntax + logic improvements
allozaur Nov 27, 2025
78ead49
Merge remote-tracking branch 'ngxson/xsn/server_model_management_v1_2…
allozaur Nov 27, 2025
d733537
refactor: Architecture cleanup
allozaur Nov 27, 2025
9086bc3
feat: Improve statistic badges
allozaur Nov 27, 2025
db47952
feat: Condition available models based on modality + better model loa…
allozaur Nov 27, 2025
bc57726
docs: Architecture documentation
allozaur Nov 27, 2025
bdaf44a
Merge branch 'master' into xsn/server_model_management_v1_2
ngxson Nov 28, 2025
491fe2d
feat: Update logic for PDF as Image
allozaur Nov 28, 2025
7be833d
add TODO for http client
ngxson Nov 28, 2025
eed1bd9
refactor: Enhance model info and attachment handling
allozaur Nov 28, 2025
3470b12
chore: update webui build output
allozaur Nov 28, 2025
5fadd0f
refactor: Components naming
allozaur Nov 28, 2025
04ef4a0
chore: update webui build output
allozaur Nov 28, 2025
1cf5daa
refactor: Cleanup
allozaur Nov 28, 2025
68b653e
refactor: DRY `getAttachmentDisplayItems` function + fix UI
allozaur Nov 28, 2025
171a092
chore: update webui build output
allozaur Nov 28, 2025
dd30810
fix: Modality detection improvement for text-based PDF attachments
allozaur Nov 28, 2025
1adf173
refactor: Cleanup
allozaur Nov 28, 2025
2f97dbf
docs: Add info comment
allozaur Nov 28, 2025
c76de5e
refactor: Cleanup
allozaur Nov 28, 2025
4d16459
re
allozaur Nov 28, 2025
f50ce7b
refactor: Cleanup
allozaur Nov 28, 2025
d49d97c
refactor: Cleanup
allozaur Nov 28, 2025
648d2de
feat: Attachment logic & UI improvements
allozaur Nov 29, 2025
27b1522
refactor: Constants
allozaur Nov 29, 2025
2464e06
feat: Improve UI sidebar background color
allozaur Nov 29, 2025
ce9c9af
chore: update webui build output
allozaur Nov 29, 2025
493ef08
refactor: Utils imports + move types to `app.d.ts`
allozaur Nov 29, 2025
2d556bb
test: Fix Storybook mocks
allozaur Nov 29, 2025
a568e74
chore: update webui build output
allozaur Nov 29, 2025
33b9cc4
Merge branch 'master' into allozaur/server_model_management_v1_2
allozaur Nov 29, 2025
4f39da8
test: Update Chat Form UI tests
allozaur Nov 29, 2025
949b5fd
refactor: Tooltip Provider from core layout
allozaur Nov 29, 2025
ae8a1e8
refactor: Tests to separate location
allozaur Nov 29, 2025
6fd720e
Merge remote-tracking branch 'origin/allozaur/server_model_management…
allozaur Nov 29, 2025
c1dfccd
Merge branch 'master' into xsn/server_model_management_v1_2
ngxson Nov 29, 2025
a82dbbf
decouple server_models from server_routes
ngxson Nov 29, 2025
360a5ed
test: Move demo test to tests/server
allozaur Nov 29, 2025
acd3c58
refactor: Remove redundant method
allozaur Nov 29, 2025
e8b9d74
chore: update webui build output
allozaur Nov 29, 2025
23cb411
also route anthropic endpoints
ngxson Nov 29, 2025
802e77e
Merge remote-tracking branch 'webui/allozaur/server_model_management_…
ngxson Nov 29, 2025
7b28b5e
fix duplicated arg
ngxson Nov 30, 2025
4a1c05c
fix invalid ptr to shutdown_handler
ngxson Nov 30, 2025
1 change: 1 addition & 0 deletions README.md
@@ -613,3 +613,4 @@ $ echo "source ~/.llama-completion.bash" >> ~/.bashrc
- [linenoise.cpp](./tools/run/linenoise.cpp/linenoise.cpp) - C++ library that provides readline-like line editing capabilities, used by `llama-run` - BSD 2-Clause License
- [curl](https://curl.se/) - Client-side URL transfer library, used by various tools/examples - [CURL License](https://curl.se/docs/copyright.html)
- [miniaudio.h](https://github.com/mackron/miniaudio) - Single-header audio format decoder, used by multimodal subsystem - Public domain
- [subprocess.h](https://github.com/sheredom/subprocess.h) - Single-header process launching solution for C and C++ - Public domain
49 changes: 36 additions & 13 deletions common/arg.cpp
@@ -212,13 +212,13 @@ struct handle_model_result {
static handle_model_result common_params_handle_model(
struct common_params_model & model,
const std::string & bearer_token,
const std::string & model_path_default,
bool offline) {
handle_model_result result;
// handle pre-fill default model path and url based on hf_repo and hf_file
{
if (!model.docker_repo.empty()) { // Handle Docker URLs by resolving them to local paths
model.path = common_docker_resolve_model(model.docker_repo);
model.name = model.docker_repo; // set name for consistency
} else if (!model.hf_repo.empty()) {
// short-hand to avoid specifying --hf-file -> default it to --model
if (model.hf_file.empty()) {
@@ -227,7 +227,8 @@ static handle_model_result common_params_handle_model(
if (auto_detected.repo.empty() || auto_detected.ggufFile.empty()) {
exit(1); // built without CURL, error message already printed
}
model.hf_repo = auto_detected.repo;
model.name = model.hf_repo; // repo name with tag
model.hf_repo = auto_detected.repo; // repo name without tag
model.hf_file = auto_detected.ggufFile;
if (!auto_detected.mmprojFile.empty()) {
result.found_mmproj = true;
@@ -257,8 +258,6 @@ static handle_model_result common_params_handle_model(
model.path = fs_get_cache_file(string_split<std::string>(f, '/').back());
}

} else if (model.path.empty()) {
model.path = model_path_default;
}
}

@@ -405,7 +404,7 @@ static bool common_params_parse_ex(int argc, char ** argv, common_params_context

// handle model and download
{
auto res = common_params_handle_model(params.model, params.hf_token, DEFAULT_MODEL_PATH, params.offline);
auto res = common_params_handle_model(params.model, params.hf_token, params.offline);
if (params.no_mmproj) {
params.mmproj = {};
} else if (res.found_mmproj && params.mmproj.path.empty() && params.mmproj.url.empty()) {
@@ -415,12 +414,18 @@ static bool common_params_parse_ex(int argc, char ** argv, common_params_context
// only download mmproj if the current example is using it
for (auto & ex : mmproj_examples) {
if (ctx_arg.ex == ex) {
common_params_handle_model(params.mmproj, params.hf_token, "", params.offline);
common_params_handle_model(params.mmproj, params.hf_token, params.offline);
break;
}
}
common_params_handle_model(params.speculative.model, params.hf_token, "", params.offline);
common_params_handle_model(params.vocoder.model, params.hf_token, "", params.offline);
common_params_handle_model(params.speculative.model, params.hf_token, params.offline);
common_params_handle_model(params.vocoder.model, params.hf_token, params.offline);
}

// model is required (except for server)
// TODO @ngxson : maybe show a list of available models in CLI in this case
if (params.model.path.empty() && ctx_arg.ex != LLAMA_EXAMPLE_SERVER) {
throw std::invalid_argument("error: --model is required\n");
}

if (params.escape) {
@@ -2090,11 +2095,8 @@ common_params_context common_params_parser_init(common_params & params, llama_ex
add_opt(common_arg(
{"-m", "--model"}, "FNAME",
ex == LLAMA_EXAMPLE_EXPORT_LORA
? std::string("model path from which to load base model")
: string_format(
"model path (default: `models/$filename` with filename from `--hf-file` "
"or `--model-url` if set, otherwise %s)", DEFAULT_MODEL_PATH
),
? "model path from which to load base model"
: "model path to load",
[](common_params & params, const std::string & value) {
params.model.path = value;
}
@@ -2492,6 +2494,27 @@ common_params_context common_params_parser_init(common_params & params, llama_ex
}
}
).set_examples({LLAMA_EXAMPLE_SERVER}));
add_opt(common_arg(
{"--models-dir"}, "PATH",
"directory containing models for the router server (default: disabled)",
[](common_params & params, const std::string & value) {
params.models_dir = value;
}
).set_examples({LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_ARG_MODELS_DIR"));
add_opt(common_arg(
{"--models-max"}, "N",
string_format("for router server, maximum number of models to load simultaneously (default: %d, 0 = unlimited)", params.models_max),
[](common_params & params, int value) {
params.models_max = value;
}
).set_examples({LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_ARG_MODELS_MAX"));
add_opt(common_arg(
{"--no-models-autoload"},
"disables automatic loading of models (default: enabled)",
[](common_params & params) {
params.models_autoload = false;
}
).set_examples({LLAMA_EXAMPLE_SERVER}).set_env("LLAMA_ARG_NO_MODELS_AUTOLOAD"));
add_opt(common_arg(
{"--jinja"},
string_format("use jinja template for chat (default: %s)\n", params.use_jinja ? "enabled" : "disabled"),
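The help text for `--models-max` above says only that it caps how many models are loaded at once, with `0` meaning unlimited, and the commit list mentions "implement LRU". `server-models.cpp` is not part of this excerpt, so the following is only a minimal sketch of how such a cap could be enforced with least-recently-used eviction. The class `lru_model_set` and its methods are hypothetical names, not the PR's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical illustration only: tracks which model names are "loaded" and
// evicts the least recently used one once the limit is exceeded.
// models_max == 0 means "unlimited", matching the --models-max help text.
class lru_model_set {
public:
    explicit lru_model_set(int models_max) : models_max_(models_max) {
        assert(models_max >= 0);
    }

    // Marks `name` as most recently used, adding it if it is not tracked yet.
    // Returns the names that had to be evicted to respect the limit.
    std::vector<std::string> touch(const std::string & name) {
        std::vector<std::string> evicted;
        order_.remove(name);      // drop the previous position, if any
        order_.push_front(name);  // front = most recently used
        while (models_max_ > 0 && order_.size() > static_cast<size_t>(models_max_)) {
            evicted.push_back(order_.back());
            order_.pop_back();
        }
        return evicted;
    }

    bool is_loaded(const std::string & name) const {
        for (const auto & n : order_) {
            if (n == name) {
                return true;
            }
        }
        return false;
    }

    size_t size() const { return order_.size(); }

private:
    int models_max_;
    std::list<std::string> order_;
};
```

With a limit of 2, touching `a`, `b`, `a`, `c` in that order evicts `b`, since `a` was used more recently. A real router would also need to stop the evicted instance and guard the structure with a lock; both are left out here.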
14 changes: 11 additions & 3 deletions common/common.cpp
@@ -912,7 +912,7 @@ std::string fs_get_cache_file(const std::string & filename) {
return cache_directory + filename;
}

std::vector<common_file_info> fs_list_files(const std::string & path) {
std::vector<common_file_info> fs_list(const std::string & path, bool include_directories) {
std::vector<common_file_info> files;
if (path.empty()) return files;

@@ -927,14 +927,22 @@ std::vector<common_file_info> fs_list_files(const std::string & path) {
const auto & p = entry.path();
if (std::filesystem::is_regular_file(p)) {
common_file_info info;
info.path = p.string();
info.name = p.filename().string();
info.path = p.string();
info.name = p.filename().string();
info.is_dir = false;
try {
info.size = static_cast<size_t>(std::filesystem::file_size(p));
} catch (const std::filesystem::filesystem_error &) {
info.size = 0;
}
files.push_back(std::move(info));
} else if (include_directories && std::filesystem::is_directory(p)) {
common_file_info info;
info.path = p.string();
info.name = p.filename().string();
info.size = 0; // Directories have no size
info.is_dir = true;
files.push_back(std::move(info));
}
} catch (const std::filesystem::filesystem_error &) {
// skip entries we cannot inspect
11 changes: 8 additions & 3 deletions common/common.h
@@ -26,8 +26,6 @@
fprintf(stderr, "%s: built with %s for %s\n", __func__, LLAMA_COMPILER, LLAMA_BUILD_TARGET); \
} while(0)

#define DEFAULT_MODEL_PATH "models/7B/ggml-model-f16.gguf"

struct common_time_meas {
common_time_meas(int64_t & t_acc, bool disable = false);
~common_time_meas();
@@ -223,6 +221,7 @@ struct common_params_model {
std::string hf_repo = ""; // HF repo // NOLINT
std::string hf_file = ""; // HF file // NOLINT
std::string docker_repo = ""; // Docker repo // NOLINT
std::string name = ""; // in format <user>/<model>[:<tag>] (tag is optional) // NOLINT
};

struct common_params_speculative {
@@ -478,6 +477,11 @@ struct common_params {
bool endpoint_props = false; // only control POST requests, not GET
bool endpoint_metrics = false;

// router server configs
std::string models_dir = ""; // directory containing models for the router server
int models_max = 4; // maximum number of models to load simultaneously
bool models_autoload = true; // automatically load models when requested via the router server

bool log_json = false;

std::string slot_save_path;
@@ -641,8 +645,9 @@ struct common_file_info {
std::string path;
std::string name;
size_t size = 0; // in bytes
bool is_dir = false;
};
std::vector<common_file_info> fs_list_files(const std::string & path);
std::vector<common_file_info> fs_list(const std::string & path, bool include_directories);

//
// Model utils
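The `name` field added to `common_params_model` in this file is documented as `<user>/<model>[:<tag>]`, with the tag optional. This excerpt does not show how, or whether, the PR splits that string back into its parts. The sketch below is one possible way to do it; `model_name_parts` and `parse_model_name` are hypothetical helpers invented for illustration.

```cpp
#include <optional>
#include <string>

struct model_name_parts {
    std::string user;
    std::string model;
    std::string tag; // empty when the name carries no ":<tag>" suffix
};

// Hypothetical helper (not part of llama.cpp): splits "<user>/<model>[:<tag>]".
// Returns std::nullopt when the user or model part is missing.
std::optional<model_name_parts> parse_model_name(const std::string & name) {
    const std::string::size_type slash = name.find('/');
    if (slash == std::string::npos || slash == 0) {
        return std::nullopt;
    }
    model_name_parts parts;
    parts.user = name.substr(0, slash);

    const std::string rest = name.substr(slash + 1);
    const std::string::size_type colon = rest.find(':');
    parts.model = rest.substr(0, colon); // npos means "take the whole string"
    if (colon != std::string::npos) {
        parts.tag = rest.substr(colon + 1);
    }
    if (parts.model.empty()) {
        return std::nullopt;
    }
    return parts;
}
```

For example, `parse_model_name("user/model:Q4_K_M")` yields user `user`, model `model`, and tag `Q4_K_M`, while `parse_model_name("model")` yields `std::nullopt` because there is no user part.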
2 changes: 1 addition & 1 deletion common/download.cpp
@@ -1047,7 +1047,7 @@ std::string common_docker_resolve_model(const std::string &) {
std::vector<common_cached_model_info> common_list_cached_models() {
std::vector<common_cached_model_info> models;
const std::string cache_dir = fs_get_cache_directory();
const std::vector<common_file_info> files = fs_list_files(cache_dir);
const std::vector<common_file_info> files = fs_list(cache_dir, false);
for (const auto & file : files) {
if (string_starts_with(file.name, "manifest=") && string_ends_with(file.name, ".json")) {
common_cached_model_info model_info;
4 changes: 3 additions & 1 deletion common/download.h
@@ -14,8 +14,10 @@ struct common_cached_model_info {
std::string model;
std::string tag;
size_t size = 0; // GGUF size in bytes
// return string representation like "user/model:tag"
// if tag is "latest", it will be omitted
std::string to_string() const {
return user + "/" + model + ":" + tag;
return user + "/" + model + (tag == "latest" ? "" : ":" + tag);
}
};
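To make the effect of the `to_string()` change concrete, the sketch below copies the new return expression into a standalone struct. `cached_model_name` is a local stand-in that keeps only the three string fields the expression uses; the sample values are made up.

```cpp
#include <string>

// Local stand-in for common_cached_model_info, for illustration only.
struct cached_model_name {
    std::string user;
    std::string model;
    std::string tag;

    // Same expression as the new line in the diff:
    // "user/model:tag", with the tag omitted when it is "latest".
    std::string to_string() const {
        return user + "/" + model + (tag == "latest" ? "" : ":" + tag);
    }
};
```

With these definitions, `{"user", "model", "latest"}` formats as `user/model`, while `{"user", "model", "Q4_K_M"}` formats as `user/model:Q4_K_M`. Before the change, the first case would have produced `user/model:latest`.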

2 changes: 2 additions & 0 deletions scripts/sync_vendor.py
@@ -17,6 +17,8 @@
"https://github.com/mackron/miniaudio/raw/669ed3e844524fcd883231b13095baee9f6de304/miniaudio.h": "vendor/miniaudio/miniaudio.h",

"https://raw.githubusercontent.com/yhirose/cpp-httplib/refs/tags/v0.28.0/httplib.h": "vendor/cpp-httplib/httplib.h",

"https://raw.githubusercontent.com/sheredom/subprocess.h/b49c56e9fe214488493021017bf3954b91c7c1f5/subprocess.h": "vendor/sheredom/subprocess.h",
}

for url, filename in vendor.items():
2 changes: 1 addition & 1 deletion tests/test-quantize-stats.cpp
@@ -23,7 +23,7 @@
#endif

struct quantize_stats_params {
std::string model = DEFAULT_MODEL_PATH;
std::string model = "models/7B/ggml-model-f16.gguf";
bool verbose = false;
bool per_layer_stats = false;
bool print_histogram = false;
2 changes: 2 additions & 0 deletions tools/server/CMakeLists.txt
@@ -15,6 +15,8 @@ set(TARGET_SRCS
server.cpp
server-http.cpp
server-http.h
server-models.cpp
server-models.h
server-task.cpp
server-task.h
server-queue.cpp