[model] Add support for Plamo3 #17304
Conversation
Any non-gated models available?
There are no non-gated models available at the moment.
Sorry, the checks failed, so I’m reverting it to draft for now. |
I’ve reopened this PR. Thank you in advance. |
When you have time, I’d appreciate a quick look or any feedback on this PR. |
This PR adds support for the PLaMo-3 series (2B, 8B, and 31B base models).
PLaMo-3 uses a hybrid architecture that mixes Sliding Window Attention (SWA) layers with standard full-attention layers, together with a custom FFN layout. This PR wires those pieces into llama.cpp so that the official checkpoints can be converted to GGUF and run with the usual backends.
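
As a rough sketch of what the hybrid wiring involves (not this PR's actual code): in a model that mixes SWA and full-attention layers, the graph builder needs a per-layer decision so it can apply the sliding-window mask only where appropriate. The enum, the helper name `plamo3_layer_attn_type`, and the "full attention every n layers" pattern below are illustrative assumptions; the real layer layout would come from the GGUF metadata produced by conversion.

```cpp
// Hypothetical sketch of per-layer attention-type selection in a hybrid
// SWA/full-attention model. The interval-based pattern and all names here
// are illustrative assumptions, not the PR's actual implementation.
#include <cstdint>
#include <cstdio>

enum class plamo3_attn_type { full, swa };

// Returns the attention type for layer `il`, assuming every
// `full_attn_interval`-th layer uses full attention and the rest use SWA.
static plamo3_attn_type plamo3_layer_attn_type(uint32_t il,
                                               uint32_t full_attn_interval) {
    const bool is_full = (il + 1) % full_attn_interval == 0;
    return is_full ? plamo3_attn_type::full : plamo3_attn_type::swa;
}

int main() {
    // Example: with an interval of 4, layers 3 and 7 use full attention
    // and the remaining layers use SWA.
    for (uint32_t il = 0; il < 8; ++il) {
        const bool full =
            plamo3_layer_attn_type(il, 4) == plamo3_attn_type::full;
        std::printf("layer %u: %s\n", il, full ? "full" : "swa");
    }
    return 0;
}
```

With a per-layer decision like this, the attention builder can select the sliding-window mask on SWA layers and the regular causal mask elsewhere, which is the general shape of hybrid-attention support in llama.cpp.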