Commit e19a372

Fix llm as a judge cookbook images (#1517)
1 parent fe2bbf1 commit e19a372

File tree

5 files changed: +3 −3 lines changed


examples/Custom-LLM-as-a-Judge.ipynb

Lines changed: 3 additions & 3 deletions
@@ -499,7 +499,7 @@
     "It looks like the numeric rater scored almost 94% in total. That's not bad, but if 6% of your evals are incorrectly judged, that could make it very hard to trust them. Let's dig into the Braintrust\n",
     "UI to get some insight into what's going on.\n",
     "\n",
-    "![Partial credit](../images/Custom-LLM-as-a-Judge/Partial-Credit.gif)\n",
+    "![Partial credit](../images/Custom-LLM-as-a-Judge-Partial-Credit.gif)\n",
     "\n",
     "It looks like a number of the incorrect answers were scored with numbers between 1 and 10. However, we do not currently have any insight into why the model gave these scores. Let's see if we can\n",
     "fix that next.\n"
@@ -670,11 +670,11 @@
     "It doesn't look like adding reasoning helped the score (in fact, it's half a percent worse). However, if we look at one of the failures, we'll get some insight into\n",
     "what the model was thinking. Here is an example of a hallucinated answer:\n",
     "\n",
-    "![Output](../images/Custom-LLM-as-a-Judge/Output.png)\n",
+    "![Output](../images/Custom-LLM-as-a-Judge-Output.png)\n",
     "\n",
     "And the score along with its reasoning:\n",
     "\n",
-    "![Reasoning](../images/Custom-LLM-as-a-Judge/Reasoning.png)\n"
+    "![Reasoning](../images/Custom-LLM-as-a-Judge-Reasoning.png)\n"
     ]
    },
    {
File renamed without changes.
−191 KB: Binary file not shown.
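
The notebook prose in this diff describes a numeric rater that scores answers from 1 to 10, and a later variant that also emits its reasoning. As a rough illustration only (not the cookbook's actual code), such a judge might look like the sketch below; the prompt wording, the judge() helper, and the model choice are all assumptions, with the openai Python client used for the call:

# Minimal sketch of a numeric LLM judge that returns a 1-10 score plus its
# reasoning. Prompt text, model, and helper name are illustrative assumptions,
# not the notebook's exact implementation.
import json
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are comparing a submitted answer to an expert answer.
Question: {question}
Expert answer: {expected}
Submitted answer: {output}

First explain your reasoning, then rate the submission from 1 (completely
wrong) to 10 (fully correct). Respond as JSON: {{"reasoning": "...", "score": N}}."""

def judge(question: str, expected: str, output: str) -> dict:
    # Ask the model for a JSON object containing both the reasoning and the score,
    # so failures can be inspected the way the notebook does in the Braintrust UI.
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model; the notebook may use a different one
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, expected=expected, output=output)}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

Surfacing the reasoning field alongside the score is what lets you diagnose cases like the hallucinated answer shown in the second hunk, even when it doesn't move the aggregate score.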
