imran-binhasan
diff --git a/examples/partners/eval_driven_system_design/receipt_inspection.ipynb b/examples/partners/eval_driven_system_design/receipt_inspection.ipynb
Lines changed: 34 additions & 0 deletions
diff --git a/images/partner_development_flywheel.png b/images/partner_development_flywheel.png
-126 KB
diff --git a/images/partner_model_improvement_waterfall.png b/images/partner_model_improvement_waterfall.png
-317 KB
diff --git a/images/partner_process_flowchart.png b/images/partner_process_flowchart.png
7.02 KB
diff --git a/images/partner_project_lifecycle.png b/images/partner_project_lifecycle.png
161 KB
diff --git a/registry.yaml b/registry.yaml
Lines changed: 5 additions & 0 deletions
@@ -112,6 +112,15 @@
    "source": [
     "## Project Lifecycle\n",
     "\n",
+    "Not every project will proceed in the same way, but projects generally have some \n",
+    "important components in common.\n",
+    "\n",
+    "![Project Lifecycle](../../../images/partner_project_lifecycle.png)\n",
+    "\n",
+    "The solid arrows show the primary progressions or steps, while the dotted line \n",
+    "represents the ongoing nature of problem understanding - uncovering more about\n",
+    "the customer domain will influence every step of the process. We will examine \n",
+    "several of these iterative cycles of refinement in detail below. \n",
     "Not every project will proceed in the same way, but projects generally have some common\n",
     "important components.\n",
     "\n",
@@ -133,6 +142,11 @@
     "It's very rare that a real-world project will start with all the data necessary to get\n",
     "to a satisfactory solution, much less to establish confidence.\n",
     "\n",
+    "In our case, we're going to assume that we have a decent sample of system *inputs*, \n",
+    "in the form of receipt images, but start without any fully annotated data. We find \n",
+    "this is a not-unusual situation when automating an existing process. Instead, \n",
+    "we'll walk through the process of building that out as we go along by collaborating with\n",
+    "domain experts, and make our evals progressively more comprehensive.\n",
     "In our case, we're going to assume that we have a decent sample of system *inputs*\n",
     "(here, photographs of receipts), but start without any fully annotated data. We'll walk\n",
     "through the process of incrementally expanding our test and training sets as we go along\n",
@@ -498,6 +512,21 @@
     "### Action Decision\n",
     "\n",
     "Next, we need to close the loop and get to an actual decision based on receipts. This\n",
+    "looks pretty similar, so we'll present the code without comment.\n",
+    "\n",
+    "Ordinarily one would start with the most capable model - `o3`, at this time - for a \n",
+    "first pass, and then, once correctness is established, experiment with different models\n",
+    "to analyze the business impact of any tradeoffs, and potentially consider whether \n",
+    "those tradeoffs are remediable with iteration. A client may be willing to take a certain accuracy \n",
+    "hit for lower latency or cost, or it may be more effective to change the architecture\n",
+    "to hit cost, latency, and accuracy goals. We'll get into how to make these tradeoffs\n",
+    "explicitly and objectively later on. \n",
+    "\n",
+    "For this cookbook, `o3` might be too good. We'll use `o4-mini` for our first pass, so \n",
+    "that we get a few reasoning errors we can use to illustrate the means of addressing\n",
+    "them when they occur.\n",
+    "\n",
+    "Next, we need to close the loop and get to an actual decision based on receipts. This\n",
     "looks pretty similar, so we'll present the code without comment."
    ]
   },
@@ -887,6 +916,10 @@
    "metadata": {},
    "source": [
     "After you run that eval you'll be able to view it in the UI, and should see something\n",
+    "like the below. \n",
+    "\n",
+    "(Note, if you have a Zero-Data-Retention agreement, this data is not stored\n",
+    "by OpenAI, so will not be available in this interface.)\n",
     "like:\n",
     "\n",
     "![Summary UI](../../../images/partner_summary_ui.png)\n",
@@ -1617,6 +1650,7 @@
     "ARE NOT TRAVEL-RELATED, THEN IT MUST BE AUDITED.\n",
     "```\n",
     "\n",
+    "4. We added three examples, JSON input/output pairs wrapped in XML tags.\n",
     "3. We added three examples, JSON input/output pairs wrapped in XML tags.\n",
     "\n",
     "With our prompt revisions, we'll regenerate the data to evaluate and re-run the same\n",
 
@@ -9,8 +9,13 @@
   date: 2025-06-01
   authors:
     - shikhar-cyber
+    - moredatarequired
+    - tooluser
+    - eddiesiegel
   tags:
     - evals
+    - API Flywheel
+    - completions
     - responses
     - functions
     - tracing