
Commit 4c3a186: add dod
1 parent: baa8d26

File tree

12 files changed: 1237 additions, 297 deletions


.gitignore

Lines changed: 2 additions & 0 deletions
@@ -1 +1,3 @@
 */__pycache__
+**/.DS_Store
+.cache

README.md

Lines changed: 25 additions & 0 deletions
# Llama_CPP OpenAI API Server Project Overview

## Introduction
The `llama_cpp_openai` module provides a lightweight implementation of an OpenAI API server on top of Llama CPP models. This implementation is designed particularly for use with Microsoft AutoGen and includes support for function calls. The project is built around the `llama_cpp_python` module and aims to ease the integration of AI models into applications that use OpenAI clients or the OpenAI API.

## Project Structure
The project is organized into several key directories and files:

- **llama_cpp_openai**: Contains the core implementation of the API server.
  - `__init__.py`: Initialization file for the module.
  - `_api_server.py`: Defines the OpenAI-compatible API server, using FastAPI for handling requests.
  - `_llama_cpp_functions_chat_handler.py`: Implements the `llama-2-functionary` chat handler that supports function calling.

- **examples**: Provides example scripts demonstrating the usage of the API server.
  - `README.md`: Overview and description of the example scripts.
  - `autogen_basic.py`: Basic integration of AutoGen with Llama_CPP using the OpenAI API server.
  - `autogen_functions.py`: Sets up an AutoGen chatbot with function-call capabilities.
  - `basic.py`: Demonstrates setting up and starting an API server using the Llama library.

## Key Features
- **FastAPI Integration**: Uses FastAPI for efficient, easy-to-use API endpoints.
- **Llama Library Usage**: Leverages the Llama library for handling AI model interactions.
- **Function Call Support**: Includes capabilities for function calls in chatbot environments.
- **Examples for Quick Start**: Provides example scripts for easy understanding and implementation.
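
A minimal client sketch, run from a second terminal while one of the example servers (e.g. `examples/basic.py`) is up, shows the "OpenAI clients" integration in practice. It assumes the `openai` Python package (v1 API); the `model` and `api_key` values are placeholders, following the `dontcare` convention used in the example scripts.

```python
from openai import OpenAI

# Point the standard OpenAI client at the local Llama_CPP server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="dontcare")

# Placeholder model name, as in the example scripts.
response = client.chat.completions.create(
    model="dontcare",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```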

examples/README.md

Lines changed: 29 additions & 0 deletions
# AutoGen with Llama_CPP and OpenAI API

## Overview
This module demonstrates the integration of Microsoft AutoGen with the Llama_CPP library and the OpenAI API server. It includes scripts for setting up an AutoGen chatbot environment and starting a local OpenAI API server using the Llama library, which interfaces with OpenAI-like models.

## Files Description
1. **basic.py**:
   - A simple script to set up and start an OpenAI API server on top of Llama_CPP.
   - Mimics the OpenAI API format for compatibility with existing OpenAI clients.
   - Requires specifying a model and configuration for the `Llama` class.

2. **autogen_basic.py**:
   - Demonstrates the basic integration of Microsoft AutoGen with Llama_CPP using the OpenAI API server.
   - Initializes a Llama instance with a GGUF model and starts a local OpenAI API server.
   - Runs a basic AutoGen agent.

3. **autogen_functions.py**:
   - Sets up an AutoGen chatbot environment with function-call capabilities.
   - Utilizes a local OpenAI API server on top of Llama_CPP.
   - Ideal for models supporting function calls, such as Trelis/Mistral-7B-Instruct-v0.1-function-calling-v2.
   - Includes initialization of a Llama instance with a specific GPT-based model.

## Setup and Usage
1. In the project folder, run `poetry install`.
2. Run the desired script:
   - For setting up a simple API server: `poetry run python basic.py`
   - For basic integration with AutoGen: `poetry run python autogen_basic.py`
   - For a chatbot environment with function calls: `poetry run python autogen_functions.py`

examples/autogen_basic.py

Lines changed: 59 additions & 0 deletions
from llama_cpp_openai._api_server import start_openai_api_server
from llama_cpp import Llama
from autogen.agentchat import AssistantAgent, UserProxyAgent


"""
This script demonstrates the integration of Microsoft AutoGen with Llama_CPP using the OpenAI API server.

The script performs the following steps:
1. It initializes a Llama instance with the given path to the GGUF model and the model chat format.
2. It starts a local OpenAI API server using the Llama instance, hosted on localhost at port 8000.
3. It sets up the AutoGen configuration to use the local API server for the chatbot.
4. Two types of agents are created:
   - UserProxyAgent: Represents a human user in the chat, with a system message indicating a human participant.
   - AssistantAgent: A general-purpose chatbot configured with the local Llama model.
5. The UserProxyAgent initiates a chat with the AssistantAgent, sending an initial message asking for a poem about poultry.

This script is an example of how to integrate Llama_CPP with a local OpenAI API server and an AutoGen chatbot interface.
"""

llm = Llama(
    # path to gguf model
    "/path/to/mistral-7b-instruct-v0.1.Q5_K_M.gguf",

    # model chat format (see Llama docs for more info)
    chat_format="mistrallite",
)

start_openai_api_server(
    llm=llm,
    host="localhost",
    port=8000,
)

# AutoGen configuration to use the local API server
llm_config = {
    "config_list": [{
        "base_url": "http://localhost:8000/v1",
        "model": "dontcare",
        "api_key": "dontcare",
    }],
}

user_proxy = UserProxyAgent(
    name="Human",
    system_message="A human.",
    human_input_mode="ALWAYS",
)

assistant = AssistantAgent(
    name="chatbot",
    system_message="General purpose life helping chatbot.",
    llm_config=llm_config,
)

user_proxy.initiate_chat(
    assistant,
    message="Write a poem about poultry."
)

examples/autogen_functions.py

Lines changed: 110 additions & 0 deletions
from llama_cpp_openai._api_server import start_openai_api_server
from llama_cpp import Llama
from autogen.agentchat import AssistantAgent, UserProxyAgent

"""
This script sets up an AutoGen chatbot environment with function-call capabilities, using a local OpenAI API server on top of Llama_CPP and a model supporting function calls.

Process:
1. Initialize a Llama instance with a specific GPT-based model (e.g., Trelis/Mistral-7B-Instruct-v0.1 with function-calling capabilities), setting the model chat format to 'llama-2-functionary'.
2. Start the local OpenAI API server on localhost at port 8000 using the Llama instance.
3. Configure the AutoGen setup to use the local API server, including custom function tools for 'weather' and 'traffic' information retrieval.
4. Create a UserProxyAgent representing a human user in the chat, with a system message indicating a human participant.
5. Create an AssistantAgent as a general-purpose chatbot, configured with the local Llama model and the ability to call custom functions.
6. The UserProxyAgent registers custom lambda functions for 'weather' and 'traffic' that simulate responses based on location and/or date.
7. Initiate a chat between the UserProxyAgent and the AssistantAgent with an initial message querying the weather in Tokyo.

This script exemplifies the integration of function-calling capabilities in a chatbot environment, showcasing how custom functionality can be embedded within an AutoGen agent setup using a local OpenAI API server.
"""

llm = Llama(
    # path to a gguf model supporting function calls
    # (e.g. HuggingFace's Trelis/Mistral-7B-Instruct-v0.1-function-calling-v2)
    "/Users/blav/.cache/lm-studio/models/Trelis/Mistral-7B-Instruct-v0.1-function-calling-v2/Mistral-7B-Instruct-v0.1-function-calling-v2.gguf",

    # model chat format (see Llama docs for more info)
    chat_format="llama-2-functionary",
)

start_openai_api_server(
    llm=llm,
    host="localhost",
    port=8000,
)

# AutoGen configuration to use the local API server
llm_config = {
    "cache_seed": None,
    "config_list": [{
        "base_url": "http://localhost:8000/v1",
        "model": "dontcare",
        "api_key": "dontcare",
    }],
    "tools": [{
        "type": "function",
        "function": {
            "name": "weather",
            "description": "Get weather information for a location.",
            "parameters": {
                "type": "object",
                "title": "weather",
                "properties": {
                    "location": {
                        "title": "location",
                        "type": "string"
                    },
                },
                "required": [
                    "location",
                ]
            }
        }
    }, {
        "type": "function",
        "function": {
            "name": "traffic",
            "description": "Get traffic information for a location and date.",
            "parameters": {
                "type": "object",
                "title": "traffic",
                "properties": {
                    "location": {
                        "title": "location",
                        "type": "string"
                    },
                    "date": {
                        "title": "date",
                        "type": "string"
                    },
                },
                "required": [
                    "location",
                    "date",
                ]
            }
        }
    }],
}

user_proxy = UserProxyAgent(
    name="Human",
    system_message="A human.",
    human_input_mode="ALWAYS",
)

assistant = AssistantAgent(
    name="chatbot",
    system_message="General purpose life helping chatbot.",
    llm_config=llm_config,
)

# stub implementations that the assistant's tool calls are dispatched to
user_proxy.register_function(
    function_map={
        "weather": lambda location: f"The weather in {location} is nice.",
        "traffic": lambda location, date: f"Traffic in {location} on {date} is busy.",
    }
)

user_proxy.initiate_chat(
    assistant,
    message="how's the weather in Tokyo today?",
)
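
The same tool schema can also be exercised against the raw chat-completions endpoint, without AutoGen. The sketch below is hedged: it assumes the local server honors the standard OpenAI `tools` parameter (plausible given the `llama-2-functionary` chat handler, but not shown in this commit) and reuses the `weather` schema from `llm_config` above, trimmed to its essentials. Run it from a second terminal while the server is up.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dontcare")

# Same "weather" tool schema as in llm_config above.
weather_tool = {
    "type": "function",
    "function": {
        "name": "weather",
        "description": "Get weather information for a location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}

response = client.chat.completions.create(
    model="dontcare",
    messages=[{"role": "user", "content": "How's the weather in Tokyo today?"}],
    tools=[weather_tool],  # assumes the local server supports the tools parameter
)

# If the model decides to call the tool, the call shows up here instead of text content.
print(response.choices[0].message.tool_calls)
```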

examples/basic.py

Lines changed: 50 additions & 0 deletions
from llama_cpp_openai._api_server import start_openai_api_server
from llama_cpp import Llama

"""
This script demonstrates how to set up and start an API server using the Llama library,
which interfaces with an OpenAI-like model for generating text completions and embeddings.
The server mimics the OpenAI API format, allowing for easy integration with systems
already using OpenAI's API endpoints.

First, it initializes a Llama instance with a specified model and configuration.
The `Llama` class requires the path to the GGUF model, the chat format, and a flag
indicating the use of embeddings.

The `start_openai_api_server` function then launches an API server on localhost
at the specified port. This server provides endpoints for chat completions and embeddings,
similar to the OpenAI API.

The server runs on a separate thread, allowing the main program to continue running
or perform other tasks. The script concludes by joining the thread, which ensures
that the script keeps running as long as the server is active.

Endpoints:
- Chat Completions: http://localhost:8000/v1/chat/completions
- Embeddings: http://localhost:8000/v1/embeddings

These endpoints can be used in the same manner as the corresponding OpenAI API endpoints.
"""

llm = Llama(
    # path to gguf model
    "/Users/blav/.cache/lm-studio/models/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/mistral-7b-instruct-v0.1.Q5_K_M.gguf",

    # model chat format (see Llama docs for more info)
    chat_format="mistrallite",

    # needed by the embeddings endpoint
    embedding=True,
)

thread, _ = start_openai_api_server(
    llm=llm,
    host="localhost",
    port=8000,
)

# Once the server is up, requests can be sent to
# http://localhost:8000/v1/chat/completions and http://localhost:8000/v1/embeddings
# using the same format as the OpenAI API.
thread.join()
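
Since `embedding=True` is set above, the embeddings endpoint can be checked the same way. A minimal sketch, run from a second terminal while the server is up, assuming the `openai` package (v1 API) and the placeholder credentials used throughout the examples:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dontcare")

# Request an embedding vector, mirroring the OpenAI API request shape.
result = client.embeddings.create(model="dontcare", input="Hello, llama!")
print(len(result.data[0].embedding))  # dimensionality of the returned vector
```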
