Feature Request: Support for LlamaBidirectionalModel architecture #17478

@hdnh2006

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Hello! NVIDIA recently released an embedding model, llama-embed-nemotron-8b, that looks amazing.

I tried to convert it to GGUF using the usual convert_hf_to_gguf.py script and I am getting this message:

python3 convert_hf_to_gguf.py models/nvidia/llama-embed-nemotron-8b/ --outfile llama-embed-f16.gguf
INFO:hf-to-gguf:Loading model: llama-embed-nemotron-8b
INFO:hf-to-gguf:Model architecture: LlamaBidirectionalModel
ERROR:hf-to-gguf:Model LlamaBidirectionalModel is not supported
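
For context, new architectures are usually added to convert_hf_to_gguf.py by registering a converter class under the Hugging Face architecture name. Below is only a rough sketch of what such a stub might look like, assuming the ModelBase.register / LlamaModel names used in recent versions of the script (they may differ in older checkouts), and ignoring the runtime-side changes that bidirectional (non-causal) attention and pooling would also require:

# Hypothetical stub meant to live inside convert_hf_to_gguf.py; the class and
# decorator names follow the pattern used by the existing converters there and
# are assumptions, not a verified implementation.
@ModelBase.register("LlamaBidirectionalModel")
class LlamaBidirectionalModel(LlamaModel):
    # Reuse the existing Llama tensor mapping and GGUF architecture; the
    # non-causal attention mask and pooling would still need support in
    # llama.cpp itself for embeddings to come out correctly.
    model_arch = gguf.MODEL_ARCH.LLAMA

If a conversion along these lines succeeded, the resulting GGUF could then be tried with the llama-embedding example to check the output embeddings.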

Is there any plan to support this architecture in the coming weeks?

Thanks in advance.

Motivation

This model seems to be the best FOSS embedding model available, so it would be fantastic to have it supported by llama.cpp.

Possible Implementation

No response

Metadata

    Labels

    enhancement (New feature or request), help wanted (Needs help from the community), model (Model specific)
