Merged
2 changes: 1 addition & 1 deletion .github/workflows/automerge.yml
@@ -6,7 +6,7 @@ name: Python application
on:
push:

-branches: [ "main", "279-add-a-jupyter-notebook-for-llm-training" ]
+branches: [ "main", "281-fix-misspelling-in-llm-from-scratch-jupyter" ]


permissions:
70 changes: 35 additions & 35 deletions 2025_11_23_demo_train_an_llm_with_cerebros.ipynb
@@ -48,7 +48,7 @@
},
{
"cell_type": "code",
-"execution_count": 24,
+"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
@@ -91,7 +91,7 @@
"id": "AcECFSs7WVsi",
"outputId": "9fd59935-35d4-4a08-9c8a-fb01fd3e4f03"
},
-"execution_count": 25,
+"execution_count": null,
"outputs": [
{
"output_type": "stream",
@@ -121,7 +121,7 @@
"id": "mCpJGfD2WfLj",
"outputId": "e0fe8c05-6154-41cd-f489-08cfd2ad0fa8"
},
-"execution_count": 26,
+"execution_count": null,
"outputs": [
{
"output_type": "stream",
@@ -159,7 +159,7 @@
"id": "nwElyEdpW90P",
"outputId": "170e2158-b7a9-49f0-ce63-22c4c7410f33"
},
-"execution_count": 27,
+"execution_count": null,
"outputs": [
{
"output_type": "stream",
@@ -403,7 +403,7 @@
"id": "ubtKyfBQzFEW",
"outputId": "6cbe44e6-3ce7-4227-982a-88d0d36d2205"
},
-"execution_count": 1,
+"execution_count": null,
"outputs": [
{
"output_type": "stream",
@@ -431,7 +431,7 @@
"id": "NemXTsYgfE0s",
"outputId": "ca92342f-1f82-42ee-8562-980b1c8dd849"
},
-"execution_count": 2,
+"execution_count": null,
"outputs": [
{
"output_type": "stream",
@@ -455,7 +455,7 @@
"id": "D3K4dSVQhrIc",
"outputId": "5a45fa94-1bb3-46ce-c362-27f456221fd6"
},
-"execution_count": 3,
+"execution_count": null,
"outputs": [
{
"output_type": "stream",
@@ -509,7 +509,7 @@
"id": "WKCdCv96X4YX",
"outputId": "875f6626-4f4b-426c-c697-da9f186e440a"
},
-"execution_count": 4,
+"execution_count": null,
"outputs": [
{
"output_type": "stream",
@@ -557,7 +557,7 @@
"metadata": {
"id": "vywbZQxAZC9R"
},
-"execution_count": 5,
+"execution_count": null,
"outputs": []
},
{
@@ -608,7 +608,7 @@
},
"outputId": "6c85d1ae-52f4-4ddf-d768-ea5781b1b7da"
},
-"execution_count": 6,
+"execution_count": null,
"outputs": [
{
"output_type": "stream",
@@ -681,13 +681,13 @@
"metadata": {
"id": "Wbowkxnbc4Zd"
},
-"execution_count": 7,
+"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
-"# Phase I-b (Extended Training) Hyperparameters\n",
+"# Stage I-b (Extended Training) Hyperparameters\n",
"\n",
"These parameters are for fine-tuning the best model from Stage I-a.\n",
"\n",
@@ -716,7 +716,7 @@
"metadata": {
"id": "-znwaddIdiKU"
},
-"execution_count": 8,
+"execution_count": null,
"outputs": []
},
{
@@ -741,7 +741,7 @@
"metadata": {
"id": "JHjCz9qXd5Gq"
},
-"execution_count": 9,
+"execution_count": null,
"outputs": []
},
{
@@ -774,7 +774,7 @@
},
"outputId": "d46f8e34-3d7d-4fb4-dddc-bf1c45bae7ee"
},
-"execution_count": 10,
+"execution_count": null,
"outputs": [
{
"output_type": "stream",
@@ -930,7 +930,7 @@
"metadata": {
"id": "EDyuTMLufYvs"
},
-"execution_count": 11,
+"execution_count": null,
"outputs": []
},
{
@@ -958,7 +958,7 @@
"metadata": {
"id": "SMSdkFRPkg7D"
},
-"execution_count": 12,
+"execution_count": null,
"outputs": []
},
{
@@ -973,7 +973,7 @@
"id": "Oqw-T7bOo1GD",
"outputId": "2e8f24fc-24c2-4a06-babb-550b676b7751"
},
-"execution_count": 13,
+"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
@@ -1001,7 +1001,7 @@
"id": "Hv_52izIjOQ7",
"outputId": "e2972924-0190-4f16-9317-c00100486203"
},
-"execution_count": 14,
+"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
@@ -1149,7 +1149,7 @@
},
"outputId": "e76e091c-6e7f-4820-ef79-15143f1e6b64"
},
-"execution_count": 15,
+"execution_count": null,
"outputs": [
{
"output_type": "stream",
@@ -1372,7 +1372,7 @@
"metadata": {
"id": "_8uTBW_to7iQ"
},
-"execution_count": 16,
+"execution_count": null,
"outputs": []
},
{
@@ -1474,7 +1474,7 @@
"metadata": {
"id": "XV2q_5WEwBJ0"
},
-"execution_count": 17,
+"execution_count": null,
"outputs": []
},
{
@@ -1510,7 +1510,7 @@
},
"outputId": "d56dd1ec-2f7b-4a3c-ecc6-75e595910367"
},
-"execution_count": 18,
+"execution_count": null,
"outputs": [
{
"output_type": "stream",
@@ -4269,7 +4269,7 @@
},
"outputId": "d253eeeb-831e-48ce-f256-c8f10540064a"
},
-"execution_count": 19,
+"execution_count": null,
"outputs": [
{
"output_type": "stream",
@@ -4348,7 +4348,7 @@
"metadata": {
"id": "f8XigcJcykLn"
},
-"execution_count": 20,
+"execution_count": null,
"outputs": []
},
{
@@ -4534,7 +4534,7 @@
},
"outputId": "e05a9fb1-706e-4f26-e668-825f7df940c2"
},
-"execution_count": 21,
+"execution_count": null,
"outputs": [
{
"output_type": "stream",
@@ -4968,7 +4968,7 @@
{
"cell_type": "markdown",
"source": [
-"# Syage I-b: Extended Training\n",
+"# Stage I-b: Extended Training\n",
"\n",
"- Now, we take the best model from Stage I-a and continue training it on a larger dataset.\n",
"- This uses a streaming `tf.data.Dataset` generator to allow handling of much larger data sets without using more RAM.\n",
@@ -5119,7 +5119,7 @@
"metadata": {
"id": "MHWWE0xIzLRD"
},
-"execution_count": 22,
+"execution_count": null,
"outputs": []
},
{
@@ -5135,7 +5135,7 @@
"id": "HxwyQzSppQwp",
"outputId": "89a48aa5-c364-4057-98c4-fc4a291f448e"
},
-"execution_count": 23,
+"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
@@ -5190,7 +5190,7 @@
"cell_type": "markdown",
"source": [
"\n",
-"## Model Compilation for Phase I-b\n",
+"## Model Compilation for Stage I-b\n",
"\n",
"- We recompile the model with the same base optimizer (AdamW), however this time with a custom learning rate scheduler (WarmupCosineDecayRestarts), and for disambiguation, relevant metrics for this training phase. We also add an EarlyStopping callback which is mainly being used to restore the weights from the best epoch, if that turns out to not be the last epoch.\n",
"\n",
@@ -5328,7 +5328,7 @@
"metadata": {
"id": "GGkEVa2dzOtf"
},
-"execution_count": 24,
+"execution_count": null,
"outputs": []
},
{
@@ -5376,7 +5376,7 @@
},
"outputId": "0daf05b2-7072-4818-8b47-a05558b33470"
},
-"execution_count": 25,
+"execution_count": null,
"outputs": [
{
"output_type": "stream",
@@ -5567,7 +5567,7 @@
},
"outputId": "8071bc5a-8520-4d13-82e1-cbd941297b4b"
},
-"execution_count": 26,
+"execution_count": null,
"outputs": [
{
"output_type": "stream",
@@ -6453,7 +6453,7 @@
},
"outputId": "37a1153f-09a0-4274-9ca2-e280112e65e6"
},
-"execution_count": 27,
+"execution_count": null,
"outputs": [
{
"output_type": "stream",
@@ -6503,7 +6503,7 @@
},
"outputId": "389fe0bf-c935-4f49-dd4f-8eea8672c634"
},
-"execution_count": 28,
+"execution_count": null,
"outputs": [
{
"output_type": "stream",
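Note on the Stage I-b compilation cell touched in the diff above: its markdown mentions a custom `WarmupCosineDecayRestarts` learning-rate scheduler, but the implementation is not visible in this diff. The sketch below is only an illustration of what a warmup plus cosine-decay-with-restarts schedule generally computes. The function name `warmup_cosine_restarts_lr`, every parameter, and all default values are hypothetical and are not taken from the notebook.

```python
import math


def warmup_cosine_restarts_lr(step, base_lr=1e-3, warmup_steps=100,
                              first_cycle_steps=1000, cycle_mult=2.0,
                              min_lr=0.0):
    """Hypothetical schedule: linear warmup, then cosine decay with restarts.

    This is an assumption about the general technique, not the notebook's
    WarmupCosineDecayRestarts class.
    """
    if step < warmup_steps:
        # Linear ramp up to base_lr over the warmup period.
        return base_lr * (step + 1) / warmup_steps

    # Locate the current cosine cycle; each cycle is cycle_mult times longer.
    t = step - warmup_steps
    cycle_len = first_cycle_steps
    while t >= cycle_len:
        t -= cycle_len
        cycle_len = max(1, int(cycle_len * cycle_mult))

    # Cosine decay from base_lr toward min_lr within the current cycle.
    cosine = 0.5 * (1.0 + math.cos(math.pi * t / cycle_len))
    return min_lr + (base_lr - min_lr) * cosine


if __name__ == "__main__":
    # Print the rate at the start, end of warmup, mid-cycle, and first restart.
    for s in (0, 99, 100, 600, 1099, 1100):
        print(s, warmup_cosine_restarts_lr(s))
```

With these assumed defaults, the rate climbs for 100 steps, decays along a half cosine for 1000 steps, then jumps back to `base_lr` and decays again over a cycle twice as long. In Keras the same logic would typically live in a `LearningRateSchedule` subclass passed to the optimizer.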