
Commit 4c3a186: add dod
1 parent: baa8d26

File tree

12 files changed: 1237 additions, 297 deletions


.gitignore

Lines changed: 2 additions & 0 deletions
@@ -1 +1,3 @@
 */__pycache__
+**/.DS_Store
+.cache

README.md

Lines changed: 25 additions & 0 deletions
# Llama_CPP OpenAI API Server Project Overview

## Introduction
The `llama_cpp_openai` module provides a lightweight implementation of an OpenAI API server on top of Llama CPP models. This implementation is designed particularly for use with Microsoft AutoGen and includes support for function calls. The project is built around the `llama_cpp_python` module and aims to ease the integration of AI models into applications that use OpenAI clients or the OpenAI API.

## Project Structure
The project is organized into several key directories and files:

- **llama_cpp_openai**: Contains the core implementation of the API server.
  - `__init__.py`: Initialization file for the module.
  - `_api_server.py`: Defines the OpenAI-compatible API server, using FastAPI for handling requests.
  - `_llama_cpp_functions_chat_handler.py`: Implements the `llama-2-functionary` chat handler that supports function calling.

- **examples**: Provides example scripts demonstrating the usage of the API server.
  - `README.md`: Overview and description of the example scripts.
  - `autogen_basic.py`: Basic integration of AutoGen with Llama_CPP using the OpenAI API server.
  - `autogen_functions.py`: Sets up an AutoGen chatbot with function-call capabilities.
  - `basic.py`: Demonstrates setting up and starting an API server using the Llama library.

## Key Features
- **FastAPI Integration**: Uses FastAPI for efficient, easy-to-use API endpoints.
- **Llama Library Usage**: Leverages the Llama library for handling AI model interactions.
- **Function Call Support**: Includes capabilities for function calls in chatbot environments.
- **Examples for Quick Start**: Provides example scripts for easy understanding and implementation.
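
A minimal client sketch, run from a second terminal while one of the example servers (e.g. `examples/basic.py`) is up, shows the "OpenAI clients" integration in practice. It assumes the `openai` Python package (v1 API); the `model` and `api_key` values are placeholders, following the `dontcare` convention used in the example scripts.

```python
from openai import OpenAI

# Point the standard OpenAI client at the local Llama_CPP server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="dontcare")

# Placeholder model name, as in the example scripts.
response = client.chat.completions.create(
    model="dontcare",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```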

examples/README.md

Lines changed: 29 additions & 0 deletions
# AutoGen with Llama_CPP and OpenAI API

## Overview
This module demonstrates the integration of Microsoft AutoGen with the Llama_CPP library and the OpenAI API server. It includes scripts for setting up an AutoGen chatbot environment and starting a local OpenAI API server using the Llama library, which interfaces with OpenAI-like models.

## Files Description
1. **basic.py**:
   - A simple script to set up and start an OpenAI API server on top of Llama_CPP.
   - Mimics the OpenAI API format for compatibility with existing OpenAI clients.
   - Requires specifying a model and configuration for the `Llama` class.

2. **autogen_basic.py**:
   - Demonstrates the basic integration of Microsoft AutoGen with Llama_CPP using the OpenAI API server.
   - Initializes a Llama instance with a GGUF model and starts a local OpenAI API server.
   - Runs a basic AutoGen agent.

3. **autogen_functions.py**:
   - Sets up an AutoGen chatbot environment with function-call capabilities.
   - Utilizes a local OpenAI API server on top of Llama_CPP.
   - Ideal for models supporting function calls, such as Trelis/Mistral-7B-Instruct-v0.1-function-calling-v2.
   - Includes initialization of a Llama instance with a specific GPT-based model.

## Setup and Usage
1. In the project folder, run `poetry install`.
2. Run the desired script:
   - For setting up a simple API server: `poetry run python basic.py`
   - For basic integration with AutoGen: `poetry run python autogen_basic.py`
   - For a chatbot environment with function calls: `poetry run python autogen_functions.py`

examples/autogen_basic.py

Lines changed: 59 additions & 0 deletions
from llama_cpp_openai._api_server import start_openai_api_server
from llama_cpp import Llama
from autogen.agentchat import AssistantAgent, UserProxyAgent


"""
This script demonstrates the integration of Microsoft AutoGen with Llama_CPP using the OpenAI API server.

The script performs the following steps:
1. It initializes a Llama instance with the given path to the GGUF model and the model chat format.
2. It starts a local OpenAI API server using the Llama instance, hosted on localhost at port 8000.
3. It sets up the AutoGen configuration to use the local API server for the chatbot.
4. Two types of agents are created:
   - UserProxyAgent: Represents a human user in the chat, with a system message indicating a human participant.
   - AssistantAgent: A general-purpose chatbot configured with the local Llama model.
5. The UserProxyAgent initiates a chat with the AssistantAgent, sending an initial message asking for a poem about poultry.

This script is an example of how to integrate Llama_CPP with a local OpenAI API server and an AutoGen chatbot interface.
"""

llm = Llama(
    # path to gguf model
    "/path/to/mistral-7b-instruct-v0.1.Q5_K_M.gguf",

    # model chat format (see Llama docs for more info)
    chat_format="mistrallite",
)

start_openai_api_server(
    llm=llm,
    host="localhost",
    port=8000,
)

# AutoGen configuration to use the local API server
llm_config = {
    "config_list": [{
        "base_url": "http://localhost:8000/v1",
        "model": "dontcare",
        "api_key": "dontcare",
    }],
}

user_proxy = UserProxyAgent(
    name="Human",
    system_message="A human.",
    human_input_mode="ALWAYS",
)

assistant = AssistantAgent(
    name="chatbot",
    system_message="General purpose life helping chatbot.",
    llm_config=llm_config,
)

user_proxy.initiate_chat(
    assistant,
    message="Write a poem about poultry."
)

examples/autogen_functions.py

Lines changed: 110 additions & 0 deletions
from llama_cpp_openai._api_server import start_openai_api_server
from llama_cpp import Llama
from autogen.agentchat import AssistantAgent, UserProxyAgent

"""
This script sets up an AutoGen chatbot environment with function-call capabilities, using a local OpenAI API server on top of Llama_CPP and a model supporting function calls.

Process:
1. Initialize a Llama instance with a specific GPT-based model (e.g., Trelis/Mistral-7B-Instruct-v0.1 with function-calling capabilities), setting the model chat format to 'llama-2-functionary'.
2. Start the local OpenAI API server on localhost at port 8000 using the Llama instance.
3. Configure the AutoGen setup to use the local API server, including custom function tools for 'weather' and 'traffic' information retrieval.
4. Create a UserProxyAgent representing a human user in the chat, with a system message indicating a human participant.
5. Create an AssistantAgent as a general-purpose chatbot, configured with the local Llama model and the ability to call custom functions.
6. The UserProxyAgent registers custom lambda functions for 'weather' and 'traffic' that simulate responses based on location and/or date.
7. Initiate a chat between the UserProxyAgent and the AssistantAgent with an initial message querying the weather in Tokyo.

This script exemplifies the integration of function-calling capabilities in a chatbot environment, showcasing how custom functionality can be embedded within an AutoGen agent setup using a local OpenAI API server.
"""

llm = Llama(
    # path to a gguf model supporting function calls
    # (e.g. HuggingFace's Trelis/Mistral-7B-Instruct-v0.1-function-calling-v2)
    "/Users/blav/.cache/lm-studio/models/Trelis/Mistral-7B-Instruct-v0.1-function-calling-v2/Mistral-7B-Instruct-v0.1-function-calling-v2.gguf",

    # model chat format (see Llama docs for more info)
    chat_format="llama-2-functionary",
)

start_openai_api_server(
    llm=llm,
    host="localhost",
    port=8000,
)

# AutoGen configuration to use the local API server
llm_config = {
    "cache_seed": None,
    "config_list": [{
        "base_url": "http://localhost:8000/v1",
        "model": "dontcare",
        "api_key": "dontcare",
    }],
    "tools": [{
        "type": "function",
        "function": {
            "name": "weather",
            "description": "Get weather information for a location.",
            "parameters": {
                "type": "object",
                "title": "weather",
                "properties": {
                    "location": {
                        "title": "location",
                        "type": "string"
                    },
                },
                "required": [
                    "location",
                ]
            }
        }
    }, {
        "type": "function",
        "function": {
            "name": "traffic",
            "description": "Get traffic information for a location and date.",
            "parameters": {
                "type": "object",
                "title": "traffic",
                "properties": {
                    "location": {
                        "title": "location",
                        "type": "string"
                    },
                    "date": {
                        "title": "date",
                        "type": "string"
                    },
                },
                "required": [
                    "location",
                    "date",
                ]
            }
        }
    }],
}

user_proxy = UserProxyAgent(
    name="Human",
    system_message="A human.",
    human_input_mode="ALWAYS",
)

assistant = AssistantAgent(
    name="chatbot",
    system_message="General purpose life helping chatbot.",
    llm_config=llm_config,
)

# stub implementations that the assistant's tool calls are dispatched to
user_proxy.register_function(
    function_map={
        "weather": lambda location: f"The weather in {location} is nice.",
        "traffic": lambda location, date: f"Traffic in {location} on {date} is busy.",
    }
)

user_proxy.initiate_chat(
    assistant,
    message="how's the weather in Tokyo today?",
)
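
The same tool schema can also be exercised against the raw chat-completions endpoint, without AutoGen. The sketch below is hedged: it assumes the local server honors the standard OpenAI `tools` parameter (plausible given the `llama-2-functionary` chat handler, but not shown in this commit) and reuses the `weather` schema from `llm_config` above, trimmed to its essentials. Run it from a second terminal while the server is up.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dontcare")

# Same "weather" tool schema as in llm_config above.
weather_tool = {
    "type": "function",
    "function": {
        "name": "weather",
        "description": "Get weather information for a location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}

response = client.chat.completions.create(
    model="dontcare",
    messages=[{"role": "user", "content": "How's the weather in Tokyo today?"}],
    tools=[weather_tool],  # assumes the local server supports the tools parameter
)

# If the model decides to call the tool, the call shows up here instead of text content.
print(response.choices[0].message.tool_calls)
```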

examples/basic.py

Lines changed: 50 additions & 0 deletions
from llama_cpp_openai._api_server import start_openai_api_server
from llama_cpp import Llama

"""
This script demonstrates how to set up and start an API server using the Llama library,
which interfaces with an OpenAI-like model for generating text completions and embeddings.
The server mimics the OpenAI API format, allowing for easy integration with systems
already using OpenAI's API endpoints.

First, it initializes a Llama instance with a specified model and configuration.
The `Llama` class requires the path to the GGUF model, the chat format, and a flag
indicating the use of embeddings.

The `start_openai_api_server` function then launches an API server on localhost
at the specified port. This server provides endpoints for chat completions and embeddings,
similar to the OpenAI API.

The server runs on a separate thread, allowing the main program to continue running
or perform other tasks. The script concludes by joining the thread, which ensures
that the script keeps running as long as the server is active.

Endpoints:
- Chat Completions: http://localhost:8000/v1/chat/completions
- Embeddings: http://localhost:8000/v1/embeddings

These endpoints can be used in the same manner as the corresponding OpenAI API endpoints.
"""

llm = Llama(
    # path to gguf model
    "/Users/blav/.cache/lm-studio/models/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/mistral-7b-instruct-v0.1.Q5_K_M.gguf",

    # model chat format (see Llama docs for more info)
    chat_format="mistrallite",

    # needed by the embeddings endpoint
    embedding=True,
)

thread, _ = start_openai_api_server(
    llm=llm,
    host="localhost",
    port=8000,
)

# Once the server is up, requests can be sent to
# http://localhost:8000/v1/chat/completions and http://localhost:8000/v1/embeddings
# using the same format as the OpenAI API.
thread.join()
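
Since `embedding=True` is set above, the embeddings endpoint can be checked the same way. A minimal sketch, run from a second terminal while the server is up, assuming the `openai` package (v1 API) and the placeholder credentials used throughout the examples:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dontcare")

# Request an embedding vector, mirroring the OpenAI API request shape.
result = client.embeddings.create(model="dontcare", input="Hello, llama!")
print(len(result.data[0].embedding))  # dimensionality of the returned vector
```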
