Commit 196cd27

fix: don't force pass seed to llm service and default alignment threshold to 0
# Conflicts: # changelog.md
1 parent 1299ec1 commit 196cd27

2 files changed: 12 additions and 3 deletions
changelog.md

Lines changed: 5 additions & 0 deletions
@@ -2,6 +2,11 @@
 
 ## Unreleased
 
+### Fixed
+
+- Don't pass seed to openai API calls (only as extra body)
+- Default to alignment threshold = 0 (better recall) for LLM annotated markup alignment with the original text
+
 ### Changed
 
 - :explosion: EDS-NLP now requires Python 3.10 or later.

edsnlp/pipes/llm/llm_markup_extractor/llm_markup_extractor.py

Lines changed: 7 additions & 3 deletions
@@ -196,6 +196,8 @@ def prompt(doc_text, examples):
         The markup format to use when formatting the few-shot examples and
         parsing the model's output. Either "xml" (default) or "md" (Markdown).
         Make sure the prompt template matches the chosen format.
+    alignment_threshold : float
+        The threshold used to align the model's output with the original text.
     prompt : Union[str, Callable[[str, List[Tuple[str, str]]], List[Dict[str, str]]]]
         The prompt is the main way to control the model's behavior.
         It can be either:
@@ -262,6 +264,7 @@ def __init__(
             str, Callable[[str, List[Tuple[str, str]]], List[Dict[str, str]]]
         ],
         markup_mode: Literal["xml", "md"] = "xml",
+        alignment_threshold: float = 0.0,
         examples: Iterable[Doc] = (),
         max_few_shot_examples: int = -1,
         use_retriever: Optional[bool] = None,
@@ -301,7 +304,9 @@ def __init__(
         self.api_kwargs = api_kwargs or {}
         self.max_concurrent_requests = max_concurrent_requests
         self.on_error = on_error
-        self.seed = seed
+        self.alignment_threshold = alignment_threshold
+        if seed is not None:
+            api_kwargs["seed"] = seed
         self.retriever = None
         if self.max_few_shot_examples > 0 and use_retriever is not False:
             self.build_few_shot_retriever_(self.examples)
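
The constructor hunk above stops storing `seed` on the instance and folds it into the keyword arguments forwarded to the API. The diff assigns into the `api_kwargs` argument rather than `self.api_kwargs`; if `api_kwargs` can be `None` (as the `api_kwargs or {}` fallback suggests), subscript assignment on `None` would raise `TypeError`, and otherwise the caller's dict is mutated. A minimal sketch of a defensive merge, assuming a hypothetical helper named `merge_seed` that is not part of EDS-NLP:

```python
from typing import Any, Dict, Optional


def merge_seed(
    api_kwargs: Optional[Dict[str, Any]], seed: Optional[int]
) -> Dict[str, Any]:
    """Return a new kwargs dict, adding ``seed`` only when one was given.

    Hypothetical helper for illustration; not EDS-NLP's actual code.
    """
    # Copy so the caller's dict is never mutated, and tolerate None.
    merged = dict(api_kwargs or {})
    if seed is not None:
        merged["seed"] = seed
    return merged


caller_kwargs = {"temperature": 0.0}
print(merge_seed(None, 42))            # {'seed': 42}
print(merge_seed(caller_kwargs, None))  # {'temperature': 0.0}
print(merge_seed(caller_kwargs, 7))     # {'temperature': 0.0, 'seed': 7}
print(caller_kwargs)                    # {'temperature': 0.0} (unchanged)
```

Copying first means the same `api_kwargs` dict can be shared across several components without one component's seed leaking into another.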
@@ -335,6 +340,7 @@ def apply_markup_to_doc_(self, doclike: Any, markup_answer: str):
         aligned = align(
             {"text": res_text, "entities": ents},
             {"text": stripped_text, "entities": []},
+            threshold=self.alignment_threshold,
         )
         res_ents = [
             (f["begin"], f["end"], e["label"], e["attributes"])
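
The diff does not show how `align` uses `threshold`, so the following is only a toy illustration of why a lower threshold favors recall. It assumes a hypothetical `keep_matches` function that keeps a candidate span when its `difflib` similarity to the original text reaches the threshold. It is not EDS-NLP's `align`, and the similarity measure is an assumption chosen for the example.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def keep_matches(
    pairs: List[Tuple[str, str]], threshold: float
) -> List[Tuple[str, str]]:
    """Keep (model_span, original_span) pairs whose similarity >= threshold.

    Toy stand-in for a thresholded alignment step; purely illustrative.
    """
    kept = []
    for model_span, original_span in pairs:
        score = SequenceMatcher(None, model_span, original_span).ratio()
        if score >= threshold:
            kept.append((model_span, original_span))
    return kept


pairs = [
    ("diabetes", "diabetes"),          # exact match
    ("hypertention", "hypertension"),  # near match (typo in model output)
    ("asthma", "COPD"),                # no characters in common
]
print(len(keep_matches(pairs, threshold=0.0)))  # 3: nothing is filtered out
print(len(keep_matches(pairs, threshold=1.0)))  # 1: only the exact match
```

With a threshold of 0 every candidate survives, which matches the changelog's "better recall" rationale at the cost of possibly keeping poor alignments.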
@@ -504,7 +510,6 @@ def _llm_request_sync(self, messages) -> str:
         response = self.client.chat.completions.create(
             model=self.model,
             messages=messages,
-            seed=self.seed,
             **self.api_kwargs,
         )
         return response.choices[0].message.content
@@ -514,7 +519,6 @@ async def _coro():
             response = await self.async_client.chat.completions.create(
                 model=self.model,
                 messages=messages,
-                seed=self.seed,
                 **self.api_kwargs,
             )
             return response.choices[0].message.content
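
Both call sites now rely solely on `**self.api_kwargs`. Two plain-Python consequences of the old form are worth spelling out: a caller who also put `seed` in `api_kwargs` would hit a duplicate-keyword `TypeError`, and an unset seed was still passed as an explicit `seed=None`. A sketch with a stand-in `fake_create` function (hypothetical; it only echoes its keyword arguments and is not the OpenAI client):

```python
def fake_create(**kwargs):
    """Stand-in for a chat-completions call: returns the kwargs it received."""
    return kwargs


api_kwargs = {"seed": 7, "temperature": 0.0}

# New style: every optional parameter travels through api_kwargs.
sent = fake_create(model="m", messages=[], **api_kwargs)
print(sorted(sent))  # ['messages', 'model', 'seed', 'temperature']

# Old style: an explicit seed= collides with a seed already in api_kwargs.
try:
    fake_create(model="m", messages=[], seed=None, **api_kwargs)
    collided = False
except TypeError:
    collided = True
print(collided)  # True

# Old style with no seed configured still forwards a seed=None entry.
forwarded_none = "seed" in fake_create(model="m", messages=[], seed=None)
print(forwarded_none)  # True
```

How a real backend treats an explicit `seed=None` depends on the client and server, so that part is an assumption about motivation rather than something the commit states. The changelog's "only as extra body" wording suggests that callers who need the seed in the request body can pass it through `api_kwargs`, for example as `{"extra_body": {"seed": 7}}` with the OpenAI Python client.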
