Add BHCToAVS model for patient-friendly summaries #730
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| pyhealth.models.bhc_to_avs | ||
| ========================== | ||
|
|
||
| BHCToAVS | ||
| ------------------------------ | ||
|
|
||
| .. autoclass:: pyhealth.models.bhc_to_avs.BHCToAVS | ||
| :members: | ||
| :inherited-members: | ||
| :show-inheritance: | ||
| :undoc-members: |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,21 @@ | ||
| from pyhealth.models.bhc_to_avs import BHCToAVS | ||
|
|
||
| # Initialize the model | ||
| model = BHCToAVS() | ||
|
|
||
| # Example Brief Hospital Course (BHC) text with common clinical abbreviations generated synthetically via ChatGPT 5.1 | ||
| bhc = ( | ||
| "Pt admitted with acute onset severe epigastric pain and hypotension. " | ||
| "Labs notable for elevated lactate, WBC 18K, mild AST/ALT elevation, and Cr 1.4 (baseline 0.9). " | ||
| "CT A/P w/ contrast demonstrated peripancreatic fat stranding c/w acute pancreatitis; " | ||
| "no necrosis or peripancreatic fluid collection. " | ||
| "Pt received aggressive IVFs, electrolyte repletion, IV analgesia, and NPO status initially. " | ||
| "Serial abd exams remained benign with no rebound or guarding. " | ||
| "BP stabilized, lactate downtrended, and pt tolerated ADAT to low-fat diet without recurrence of sx. " | ||
| "Discharged in stable condition w/ instructions for GI f/u and outpatient CMP in 1 week." | ||
| ) | ||
|
|
||
| # Generate a patient-friendly After-Visit Summary | ||
| print(model.predict(bhc)) | ||
|
|
||
| # Expected output: A simplified, patient-friendly summary explaining the hospital stay without medical jargon. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,98 @@ | ||
| # Author: Charan Williams | ||
| # NetID: charanw2 | ||
| # Description: Converts clinical brief hospital course (BHC) data to after visit summaries using a fine-tuned Mistral 7B model. | ||
|
|
||
| from typing import Dict, Any | ||
Copilot
AI
Dec 27, 2025
The _PROMPT variable is defined twice (lines 12-18 and lines 27-31), with the second definition overwriting the first. This creates dead code and potential confusion. Only one prompt definition should be kept, or they should be renamed to reflect their different purposes (e.g., _TRAINING_PROMPT and _INFERENCE_PROMPT).
Copilot
AI
Dec 27, 2025
The dataclass decorator on a class inheriting from BaseModel (which inherits from nn.Module) may not properly initialize the parent class. The dataclass-generated __init__ does not call the parent initializer, so the class should define a __post_init__ method that calls super().__init__() to ensure nn.Module is properly initialized. Without this, features like the _dummy_param used for device detection may not work correctly.
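The failure mode can be reproduced without PyTorch. In the sketch below, `ModuleLike` is a stand-in for nn.Module (an assumption made so the example runs anywhere), and the field name and default value are hypothetical. It shows that the dataclass-generated `__init__` skips the parent initializer unless `__post_init__` calls it.

```python
from dataclasses import dataclass


class ModuleLike:
    """Stand-in for torch.nn.Module: its state is created only in __init__."""

    def __init__(self) -> None:
        self._parameters = {}

    def parameters(self) -> list:
        return list(self._parameters.values())


@dataclass(eq=False)
class WithoutPostInit(ModuleLike):
    model_id: str = "hypothetical/model-id"


@dataclass(eq=False)
class WithPostInit(ModuleLike):
    model_id: str = "hypothetical/model-id"

    def __post_init__(self) -> None:
        # The dataclass-generated __init__ never calls the parent initializer,
        # so it is called explicitly here.
        super().__init__()


try:
    WithoutPostInit().parameters()
except AttributeError as exc:
    print(f"parent state missing: {exc}")

print(WithPostInit().parameters())
```

`eq=False` is used because the default `eq=True` sets `__hash__` to `None`, which makes instances unhashable; that is likely to be a problem for module classes that end up in sets or dictionary keys.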
Copilot
AI
Dec 27, 2025
Missing documentation for the BHCToAVS class itself. The class lacks a docstring explaining its purpose, parameters, and usage. Only the individual fields and methods have documentation.
Copilot
AI
Dec 27, 2025
device_map="auto" is passed twice: once in the AutoModelForCausalLM.from_pretrained call (line 48) and again in the pipeline constructor (line 65). The second one is redundant because the model has already been placed on devices, and it may cause conflicts or unexpected behavior.
| device_map="auto", |
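A minimal sketch of the single-placement pattern follows, assuming a plain causal LM checkpoint loaded with the Hugging Face transformers library. The `build_pipeline` function and its `model_id` argument are hypothetical names; the PR's actual loading code is not shown in this excerpt and may differ (for example, if it loads an adapter).

```python
def build_pipeline(model_id: str):
    """Load the model once with device_map="auto", then wrap it in a pipeline."""
    # Imported lazily so this sketch can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    # No device_map here: the model object already carries its device placement.
    return pipeline("text-generation", model=model, tokenizer=tokenizer)
```

Calling `build_pipeline` downloads the checkpoint, so it is defined here but not invoked.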
Copilot
AI
Dec 27, 2025
Missing input validation for the bhc_text parameter. The method should validate that bhc_text is not None and is a non-empty string before processing to provide clearer error messages to users.
| # Validate input to provide clear error messages and avoid unexpected failures. | |
| if bhc_text is None: | |
| raise ValueError("bhc_text must not be None.") | |
| if not isinstance(bhc_text, str): | |
| raise TypeError(f"bhc_text must be a string, got {type(bhc_text).__name__}.") | |
| if not bhc_text.strip(): | |
| raise ValueError("bhc_text must be a non-empty string.") |
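The suggested checks can be exercised on their own by wrapping them in a small function. `validate_bhc_text` is a hypothetical helper name used for illustration; in the PR the checks would sit at the top of the predict method.

```python
def validate_bhc_text(bhc_text: object) -> str:
    """Return bhc_text unchanged if it is a non-empty string, else raise."""
    if bhc_text is None:
        raise ValueError("bhc_text must not be None.")
    if not isinstance(bhc_text, str):
        raise TypeError(f"bhc_text must be a string, got {type(bhc_text).__name__}.")
    if not bhc_text.strip():
        raise ValueError("bhc_text must be a non-empty string.")
    return bhc_text


# Each invalid input fails fast with a specific message.
for bad in (None, 42, "   "):
    try:
        validate_bhc_text(bad)
    except (TypeError, ValueError) as exc:
        print(f"{type(exc).__name__}: {exc}")
```

The None check comes first so that a missing argument gets its own message rather than the generic type error.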
Copilot
AI
Dec 27, 2025
The pipeline is missing the return_full_text=False parameter in the generate call. By default, Hugging Face text-generation pipelines return the full text including the input prompt. To return only the newly generated text, you should either set return_full_text=False in the pipeline call or manually strip the prompt from the output.
| pad_token_id=pipe.tokenizer.eos_token_id, | |
| pad_token_id=pipe.tokenizer.eos_token_id, | |
| return_full_text=False, |
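The alternative mentioned in the comment, stripping the prompt manually, could look roughly like this. `extract_summary` is a hypothetical helper, and the sample `outputs` list only imitates the list-of-dicts shape that a text-generation pipeline returns for a single input; no model is called.

```python
from typing import Dict, List


def extract_summary(outputs: List[Dict[str, str]], prompt: str) -> str:
    """Return only the newly generated text from a text-generation result."""
    text = outputs[0]["generated_text"]
    # With return_full_text left at its default, the prompt is echoed first.
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


prompt = "Summarize for the patient: Pt admitted with abdominal pain.\nSummary:"
outputs = [{"generated_text": prompt + " You came in with belly pain."}]
print(extract_summary(outputs, prompt))
```

Setting return_full_text=False is simpler; the manual approach is mainly useful as a safeguard when the output format cannot be controlled at the call site.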
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,36 @@ | ||
| from tests.base import BaseTestCase | ||
| from pyhealth.models.bhc_to_avs import BHCToAVS | ||
|
|
||
|
|
||
| class TestBHCToAVS(BaseTestCase): | ||
| """Unit tests for the BHCToAVS model.""" | ||
|
|
||
| def setUp(self): | ||
| self.set_random_seed() | ||
|
|
||
| def test_predict(self): | ||
| """Test the predict method of BHCToAVS.""" | ||
| bhc_text = ( | ||
| "Patient admitted with abdominal pain. Imaging showed no acute findings. " | ||
| "Pain improved with supportive care and the patient was discharged in stable condition." | ||
| ) | ||
| model = BHCToAVS() | ||
| try: | ||
|
|
||
| summary = model.predict(bhc_text) | ||
|
|
||
| # Output must be type str | ||
| self.assertIsInstance(summary, str) | ||
|
|
||
| # Output should not be empty | ||
| self.assertGreater(len(summary.strip()), 0) | ||
|
|
||
| # Output should be different from input | ||
| self.assertNotIn(bhc_text[:40], summary) | ||
|
|
||
| except OSError as e: | ||
| # Allow the test to pass if the model download fails, e.g. in GitHub workflows | ||
Logiquo marked this conversation as resolved.
|
||
| if "gated repo" in str(e).lower() or "404" in str(e): | ||
| pass | ||
| else: | ||
| raise e | ||
The module docstring header uses "# Description:" format which is not standard Python docstring style. The description should either be a proper module-level docstring (triple-quoted string) or follow a consistent comment format without the "Description:" label.
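The practical difference is that tools which read docstrings do not see a comment header. The sketch below compares the two header styles using the standard library's `ast.get_docstring`; the docstring wording is a hypothetical rewrite of the existing comment header, not text from the PR.

```python
import ast

COMMENT_HEADER = '''# Author: Charan Williams
# NetID: charanw2
# Description: Converts clinical BHC data to after visit summaries.

from typing import Any, Dict
'''

DOCSTRING_HEADER = '''"""Convert brief hospital course (BHC) text to after-visit summaries.

Uses a fine-tuned Mistral 7B model.

Author: Charan Williams
NetID: charanw2
"""

from typing import Any, Dict
'''

# Comments are discarded by the parser, so no module docstring is found.
print(ast.get_docstring(ast.parse(COMMENT_HEADER)))

# A triple-quoted string as the first statement becomes the module docstring.
print(ast.get_docstring(ast.parse(DOCSTRING_HEADER)).splitlines()[0])
```

Documentation generators that pull module docstrings would therefore pick up the second form but not the first.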