Moved fine-tuning to the end

frgud · frgud · commit fcd8e442d94f · 2025-10-07T13:43:31.000-04:00
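
Note on the templating cell moved by this commit: it builds the training text with `PROMPT_TEMPLATE.format(...)`. Python's `str.format` treats doubled braces as escaped literal braces and substitutes only single-brace fields. The sketch below illustrates that behavior with a shortened, hypothetical template. `ESCAPED`, `PLACEHOLDER`, and `render` are illustrative names and are not part of the notebook.

```python
# Hypothetical, shortened templates for illustration only (not the notebook's template).
ESCAPED = "user: {{question}}"      # doubled braces: emitted as a literal "{question}"
PLACEHOLDER = "user: {question}"    # single braces: replaced by the keyword argument


def render(template: str, **fields: str) -> str:
    """Fill a template with str.format, the same call the notebook cell uses."""
    return template.format(**fields)


escaped_out = render(ESCAPED, question="What is cystometry?")
filled_out = render(PLACEHOLDER, question="What is cystometry?")

print(escaped_out)  # user: {question}
print(filled_out)   # user: What is cystometry?
```

If the rendered `text` field still contains `{question}` after templating, the placeholder was escaped rather than substituted.
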
diff --git a/workshops/fine-tuning-with-sagemakerai-and-bedrock/task_05_fmops/05.00_fmops_examples.ipynb b/workshops/fine-tuning-with-sagemakerai-and-bedrock/task_05_fmops/05.00_fmops_examples.ipynb
@@ -187,96 +187,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### 4. Templating a Prompt"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "In this workshop we are going to fine-tune DeepSeek-R1-Distill-Llama-8B to become a medical expert. To accomplish this, we will execute a fine-tuning job using Managed MLflow on SageMaker AI. We get our data from the [FreedomIntelligence/medical-o1-reasoning-SFT](https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT) dataset, available on HuggingFace.\n",
-    "\n",
-    "We perform the full fine-tuning step in the next lab. In this lab, we show a small example of what fine-tuning looks like for a single record of the dataset."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "FINE_TUNING_DATA_SAMPLE = {\n",
-    "    \"Question\": \"A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?\", \n",
-    "    \"Complex_CoT\": \"Okay, let's think about this step by step. There's a 61-year-old woman here who's been dealing with involuntary urine leakages whenever she's doing something that ups her abdominal pressure like coughing or sneezing. This sounds a lot like stress urinary incontinence to me. Now, it's interesting that she doesn't have any issues at night; she isn't experiencing leakage while sleeping. This likely means her bladder's ability to hold urine is fine when she isn't under physical stress. Hmm, that's a clue that we're dealing with something related to pressure rather than a bladder muscle problem.\\n\\nThe fact that she underwent a Q-tip test is intriguing too. This test is usually done to assess urethral mobility. In stress incontinence, a Q-tip might move significantly, showing urethral hypermobility. This kind of movement often means there's a weakness in the support structures that should help keep the urethra closed during increases in abdominal pressure. So, that's aligning well with stress incontinence.\\n\\nNow, let's think about what would happen during cystometry. Since stress incontinence isn't usually about sudden bladder contractions, I wouldn't expect to see involuntary detrusor contractions during this test. Her bladder isn't spasming or anything; it's more about the support structure failing under stress. Plus, she likely empties her bladder completely because stress incontinence doesn't typically involve incomplete emptying. So, her residual volume should be pretty normal.\\n\\nAll in all, it seems like if they do a cystometry on her, it will likely show a normal residual volume and no involuntary contractions. Yup, I think that makes sense given her symptoms and the typical presentations of stress urinary incontinence.\",\n",
-    "    \"Response\": \"Cystometry in this case of stress urinary incontinence would most likely reveal a normal post-void residual volume, as stress incontinence typically does not involve issues with bladder emptying. Additionally, since stress urinary incontinence is primarily related to physical exertion and not an overactive bladder, you would not expect to see any involuntary detrusor contractions during the test.\"\n",
-    "}\n",
-    "\n",
-    "\n",
-    "PROMPT_TEMPLATE = \"\"\"\n",
-    "<|begin_of_text|>\n",
-    "    <|start_header_id|>system<|end_header_id|>\n",
-    "    You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. \n",
-    "    Below is an instruction that describes a task, paired with an input that provides further context. \n",
-    "    Write a response that appropriately completes the request.\n",
-    "    Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.\n",
-    "    <|eot_id|>\n",
-    "        <|start_header_id|>user<|end_header_id|>\n",
-    "        {{question}}\n",
-    "    <|eot_id|>\n",
-    "    <|start_header_id|>assistant<|end_header_id|>\n",
-    "    {{complex_cot}}\n",
-    "    {{answer}}\n",
-    "<|eot_id|>\n",
-    "\"\"\"\n",
-    "\n",
-    "# Template dataset to add prompt to each sample\n",
-    "def template_dataset(sample):\n",
-    "    try:\n",
-    "        sample[\"text\"] = PROMPT_TEMPLATE.format(question=sample[\"Question\"],\n",
-    "                                                complex_cot=sample[\"Complex_CoT\"],\n",
-    "                                                answer=sample[\"Response\"])\n",
-    "        return sample\n",
-    "    except KeyError as e:\n",
-    "        print(f\"KeyError in template_dataset: {str(e)}\")\n",
-    "        # Provide default values for missing fields\n",
-    "        missing_key = str(e).strip(\"'\")\n",
-    "        if missing_key == \"Question\":\n",
-    "            sample[\"text\"] = PROMPT_TEMPLATE.format(\n",
-    "                question=\"[Missing question]\",\n",
-    "                complex_cot=sample.get(\"Complex_CoT\", \"[Missing CoT]\"),\n",
-    "                answer=sample.get(\"Response\", \"[Missing response]\")\n",
-    "            )\n",
-    "        elif missing_key == \"Complex_CoT\":\n",
-    "            sample[\"text\"] = PROMPT_TEMPLATE.format(\n",
-    "                question=sample[\"Question\"],\n",
-    "                complex_cot=\"[Missing CoT]\",\n",
-    "                answer=sample.get(\"Response\", \"[Missing response]\")\n",
-    "            )\n",
-    "        elif missing_key == \"Response\":\n",
-    "            sample[\"text\"] = PROMPT_TEMPLATE.format(\n",
-    "                question=sample[\"Question\"],\n",
-    "                complex_cot=sample.get(\"Complex_CoT\", \"[Missing CoT]\"),\n",
-    "                answer=\"[Missing response]\"\n",
-    "            )\n",
-    "        return sample\n",
-    "\n",
-    "PROCESSED_SAMPLE = template_dataset(FINE_TUNING_DATA_SAMPLE)\n",
-    "print(PROCESSED_SAMPLE)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "##### Fine-Tuning Output\n",
-    "The above output shows the templated prompt output to be used for fine-tuning. This pre-processing happens for every record in the fine-tuning dataset before fine-tuning actually takes place. This can be time-consuming for large fine-tuning datasets. We will show in the next lab how to orchestrate this with MLflow."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### 5. Model Deployment\n",
+    "### 4. Model Deployment\n",
     "There are several approaches to deploying a model to a SageMaker AI managed endpoint. In this section, we explore the most direct option which downloads a model directly from HuggingFace to the managed endpoint via SageMaker JumpStart. We are still using DeepSeek-R1-Distill-Llama-8B, but we have not fine-tuned it. The purpose of this section is to illustrate the components required to customize a model deployment on SageMaker before fine-tuning it."
    ]
   },
@@ -433,8 +344,7 @@
     "    deserializer=JSONDeserializer()\n",
     ")\n",
     "predictor.predict({\n",
-    "    # \"inputs\": \"Hi, what can you help me with?\",\n",
-    "    \"inputs\": FINE_TUNING_DATA_SAMPLE[\"Question\"],\n",
+    "    \"inputs\": \"Hi, what can you help me with?\",\n",
     "    \"parameters\": {\n",
     "        \"max_new_tokens\": 512,\n",
     "        \"top_p\": 0.9,\n",
@@ -525,7 +435,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### 6. Qualitative Model Evaluation\n",
+    "### 5. Qualitative Model Evaluation\n",
     "Let's test the default DeepSeek-R1-Distill-Llama-8B using MLFlow's LLM-as-a-Judge capability. We'll use [Anthropic's Claude 3 Haiku](https://www.anthropic.com/news/claude-3-haiku) model on [Amazon Bedrock](https://aws.amazon.com/bedrock/) as the judge. We'll also wrap our model endpoint invocation in a method making it easier to call in the evaluation. \n",
     "\n",
     "This particular endpoint is the [cross-region inference endpoint](https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference.html) name for Claude 3 Haiku.\n",
@@ -993,6 +903,95 @@
     "        print(error_msg)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 6. Templating a Prompt"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In the next lab, we fine-tune DeepSeek-R1-Distill-Llama-8B to act as a medical expert by running a fine-tuning job using Managed MLflow on SageMaker AI. The data comes from the [FreedomIntelligence/medical-o1-reasoning-SFT](https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT) dataset, available on HuggingFace.\n",
+    "\n",
+    "In this lab, we show a small example of how a single record of the dataset is templated into a fine-tuning prompt."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "FINE_TUNING_DATA_SAMPLE = {\n",
+    "    \"Question\": \"A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?\", \n",
+    "    \"Complex_CoT\": \"Okay, let's think about this step by step. There's a 61-year-old woman here who's been dealing with involuntary urine leakages whenever she's doing something that ups her abdominal pressure like coughing or sneezing. This sounds a lot like stress urinary incontinence to me. Now, it's interesting that she doesn't have any issues at night; she isn't experiencing leakage while sleeping. This likely means her bladder's ability to hold urine is fine when she isn't under physical stress. Hmm, that's a clue that we're dealing with something related to pressure rather than a bladder muscle problem.\\n\\nThe fact that she underwent a Q-tip test is intriguing too. This test is usually done to assess urethral mobility. In stress incontinence, a Q-tip might move significantly, showing urethral hypermobility. This kind of movement often means there's a weakness in the support structures that should help keep the urethra closed during increases in abdominal pressure. So, that's aligning well with stress incontinence.\\n\\nNow, let's think about what would happen during cystometry. Since stress incontinence isn't usually about sudden bladder contractions, I wouldn't expect to see involuntary detrusor contractions during this test. Her bladder isn't spasming or anything; it's more about the support structure failing under stress. Plus, she likely empties her bladder completely because stress incontinence doesn't typically involve incomplete emptying. So, her residual volume should be pretty normal.\\n\\nAll in all, it seems like if they do a cystometry on her, it will likely show a normal residual volume and no involuntary contractions. Yup, I think that makes sense given her symptoms and the typical presentations of stress urinary incontinence.\",\n",
+    "    \"Response\": \"Cystometry in this case of stress urinary incontinence would most likely reveal a normal post-void residual volume, as stress incontinence typically does not involve issues with bladder emptying. Additionally, since stress urinary incontinence is primarily related to physical exertion and not an overactive bladder, you would not expect to see any involuntary detrusor contractions during the test.\"\n",
+    "}\n",
+    "\n",
+    "\n",
+    "PROMPT_TEMPLATE = \"\"\"\n",
+    "<|begin_of_text|>\n",
+    "    <|start_header_id|>system<|end_header_id|>\n",
+    "    You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. \n",
+    "    Below is an instruction that describes a task, paired with an input that provides further context. \n",
+    "    Write a response that appropriately completes the request.\n",
+    "    Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.\n",
+    "    <|eot_id|>\n",
+    "        <|start_header_id|>user<|end_header_id|>\n",
+    "        {question}\n",
+    "    <|eot_id|>\n",
+    "    <|start_header_id|>assistant<|end_header_id|>\n",
+    "    {complex_cot}\n",
+    "    {answer}\n",
+    "<|eot_id|>\n",
+    "\"\"\"\n",
+    "\n",
+    "# Apply the prompt template to one sample and store the result in sample[\"text\"]\n",
+    "def template_dataset(sample):\n",
+    "    try:\n",
+    "        sample[\"text\"] = PROMPT_TEMPLATE.format(question=sample[\"Question\"],\n",
+    "                                                complex_cot=sample[\"Complex_CoT\"],\n",
+    "                                                answer=sample[\"Response\"])\n",
+    "        return sample\n",
+    "    except KeyError as e:\n",
+    "        print(f\"KeyError in template_dataset: {str(e)}\")\n",
+    "        # Provide default values for missing fields\n",
+    "        missing_key = str(e).strip(\"'\")\n",
+    "        if missing_key == \"Question\":\n",
+    "            sample[\"text\"] = PROMPT_TEMPLATE.format(\n",
+    "                question=\"[Missing question]\",\n",
+    "                complex_cot=sample.get(\"Complex_CoT\", \"[Missing CoT]\"),\n",
+    "                answer=sample.get(\"Response\", \"[Missing response]\")\n",
+    "            )\n",
+    "        elif missing_key == \"Complex_CoT\":\n",
+    "            sample[\"text\"] = PROMPT_TEMPLATE.format(\n",
+    "                question=sample[\"Question\"],\n",
+    "                complex_cot=\"[Missing CoT]\",\n",
+    "                answer=sample.get(\"Response\", \"[Missing response]\")\n",
+    "            )\n",
+    "        elif missing_key == \"Response\":\n",
+    "            sample[\"text\"] = PROMPT_TEMPLATE.format(\n",
+    "                question=sample[\"Question\"],\n",
+    "                complex_cot=sample.get(\"Complex_CoT\", \"[Missing CoT]\"),\n",
+    "                answer=\"[Missing response]\"\n",
+    "            )\n",
+    "        return sample\n",
+    "\n",
+    "PROCESSED_SAMPLE = template_dataset(FINE_TUNING_DATA_SAMPLE)\n",
+    "print(PROCESSED_SAMPLE)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "##### Fine-Tuning Output\n",
+    "The output above shows the templated prompt that is used for fine-tuning. This pre-processing runs on every record in the dataset before fine-tuning starts, which can be time-consuming for large datasets. The next lab shows how to orchestrate it with MLflow."
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},