5 | 5 | "id": "9ca87bfb", |
6 | 6 | "metadata": {}, |
7 | 7 | "source": [ |
| 8 | + "# 🛍️ | Cora-For-Zava: Your First Evaluation Flow\n", |
8 | 9 | "\n", |
| 10 | + "Welcome! This notebook sets up the Azure AI Evaluation SDK and walks you through your first evaluation with quality and safety evaluators.\n", |
9 | 11 | "\n", |
10 | | - "# 🔍 | Lab 01: Run Your First Evaluation With The SDK \n", |
| 12 | + "## 🛒 Our Zava Scenario\n", |
11 | 13 | "\n", |
12 | | - "This notebook sets up the Azure AI Evaluation SDK and walks you through the first _evaluate()_ call with quality and safety evaluators. Use this to get a sense for how evaluations work, and what built-in evaluators are provided to you. **Bonus** - We'll see how the Azure AI Foundry portal renders results\n", |
| 14 | + "**Cora** is a customer service chatbot for **Zava** - a fictitious retailer of home improvement goods for DIY enthusiasts. Before deploying Cora to help customers, you need to ensure it provides accurate, safe, and helpful responses. Evaluation is the foundation of trust in AI applications, making it a critical part of the Generative AI Ops (GenAIOps) lifecycle. Without rigorous evaluation, Cora could produce content that is fabricated, irrelevant, harmful, or vulnerable to adversarial attacks.\n", |
13 | 15 | "\n", |
| 16 | + "## 🎯 What You'll Build\n", |
14 | 17 | "\n", |
15 | | - "Evaluation is the foundation of trust in AI applications, making it a critical part of the Generative AI Ops (GenAIOps) lifecycle. Without rigorous evaluation at each step, the AI solution can produce content that is fabricated (ungrounded in reality), irrelevant, harmful - or vulnerable to adversarial attacks. \n", |
| 18 | + "By the end of this notebook, you'll have:\n", |
| 19 | + "- ✅ Run your first evaluation using the Azure AI Evaluation SDK\n", |
| 20 | + "- ✅ Configured and used built-in evaluators for quality and safety\n", |
| 21 | + "- ✅ Evaluated a test dataset with sample responses\n", |
| 22 | + "- ✅ Saved evaluation results to a file\n", |
| 23 | + "- ✅ Viewed evaluation results in Azure AI Foundry portal\n", |
16 | 24 | "\n", |
17 | | - "The three stages of GenAIOps Evaluation can be represented by:\n", |
| 25 | + "## 💡 What You'll Learn\n", |
18 | 26 | "\n", |
19 | | - "1. **Base Model Selection** - Before building your application, you need to select the right base model for your use case. Use evaluators to compare base models for fit using criteria like accuracy, quality, safety and task performance.\n", |
20 | | - "1. **Pre-Production Evaluation** - Once you have selected a base model, you need to customize it to build the AI application (e.g., RAG with data, agentic AI etc.). This pre-production phase is where you iterate rapidly on the prototype, using evaluations to assess robustness, validate edge cases, measure key metrics, and simulate real-world interactins for testing coverage.\n", |
21 | | - "1. **Post-Production Monitoring** - Helps ensure the AI application maintains desired quality, safety and performance goals in real-world environments - with capabilities that include performance tracking and fast incident response.\n", |
| 27 | + "- What the `evaluate()` function does and how to use it\n", |
| 28 | + "- How to configure and run evaluations with built-in evaluators\n", |
| 29 | + "- How to interpret evaluation metrics\n", |
| 30 | + "- How to view results in the Azure AI Foundry portal\n", |
22 | 31 | "\n", |
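| | + "Curious what that looks like in code? Here is a minimal sketch of the kind of `evaluate()` call you'll build in this notebook. It assumes the `azure-ai-evaluation` package; `model_config` and the file names are placeholders, and the exact setup comes later in the lab.\n",
| | + "\n",
| | + "```python\n",
| | + "from azure.ai.evaluation import evaluate, RelevanceEvaluator\n",
| | + "\n",
| | + "# Quality evaluators are LLM judges, so they need a model configuration (assumed to be defined earlier)\n",
| | + "relevance = RelevanceEvaluator(model_config=model_config)\n",
| | + "\n",
| | + "result = evaluate(\n",
| | + "    data='test_data.jsonl',               # test dataset: one JSON record (query, response, ...) per line\n",
| | + "    evaluators={'relevance': relevance},  # map a name to each evaluator you want to run\n",
| | + "    output_path='./eval_results.json',    # save the scores to a file\n",
| | + "    # azure_ai_project=...                # optionally log the run to the Azure AI Foundry portal\n",
| | + ")\n",
| | + "```\n",
| | + "\n",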
23 | | - "This is where **evaluators** become critical. Evaluators are specialized tool that help you assess the quality, safety and reliability of your AI application responses. The Azure AI Foundry platform offers a comprehensive suite of built-in evaluators that cover a broad category of use cases including: Retrieval Augmented Generation (RAG), agentic AI, safety & security, and textual similarity - along with general purpose evaluators.\n" |
| 32 | + "## 📊 The Three Stages of GenAIOps Evaluation\n", |
| 33 | + "\n", |
| 34 | + "1. **Base Model Selection** - Compare models for accuracy, quality, safety and task performance\n", |
| 35 | + "2. **Pre-Production Evaluation** - Iterate on prototypes, assess robustness, validate edge cases\n", |
| 36 | + "3. **Post-Production Monitoring** - Track performance and ensure quality in real-world environments\n", |
| 37 | + "\n", |
| 38 | + "> **Note**: This notebook focuses on pre-production evaluation using a small test dataset.\n", |
| 39 | + "\n", |
| 40 | + "Ready to run your first evaluation? Let's get started! 🚀\n" |
24 | 41 | ] |
25 | 42 | }, |
26 | 43 | { |