@Logiquo Logiquo commented Dec 24, 2025

Contributor: Yongda Fan (yongdaf2@illinois.edu)

Contribution Type: Dataset

Description
Further optimization of multi-worker efficiency for task transformation. The runtime can drop from the 2h of #748 to 20min.
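For context, here is a minimal sketch of what batched multi-worker task transformation can look like in general. It is not the code in pyhealth/datasets/base_dataset.py: `chunked`, `transform_in_batches`, `task_fn`, and the default sizes are hypothetical names and values chosen for illustration. It assumes that each item (e.g. a patient) is turned into a list of samples by a task function, and that the goal is to pay scheduling and serialization overhead once per batch rather than once per item.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, List


def chunked(items: Iterable[Any], batch_size: int) -> Iterator[List[Any]]:
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def _run_batch(task_fn: Callable[[Any], List[Any]], batch: List[Any]) -> List[Any]:
    """Apply the per-item task to every item of one batch inside a worker."""
    samples: List[Any] = []
    for item in batch:
        samples.extend(task_fn(item))
    return samples


def transform_in_batches(
    items: Iterable[Any],
    task_fn: Callable[[Any], List[Any]],
    num_workers: int = 4,        # hypothetical default
    batch_size: int = 64,        # hypothetical default
    executor_cls=ProcessPoolExecutor,
) -> List[Any]:
    """Run `task_fn` over `items`, submitting one job per batch, not per item."""
    batches = list(chunked(items, batch_size))
    results: List[Any] = []
    with executor_cls(max_workers=num_workers) as pool:
        # Executor.map preserves input order, so samples come back in batch order.
        for samples in pool.map(_run_batch, [task_fn] * len(batches), batches):
            results.extend(samples)
    return results
```

With the default `ProcessPoolExecutor`, `task_fn` has to be picklable (a module-level function, for example), and on platforms that spawn worker processes the call should sit under an `if __name__ == "__main__":` guard. Passing `executor_cls=ThreadPoolExecutor` is a convenient way to try the sketch without those constraints.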

Files to Review
pyhealth/datasets/base_dataset.py

@Logiquo Logiquo added the core Core functionality (Patient API, BaseDataset, event stream format, etc.) label Dec 24, 2025

@jhnwu3 jhnwu3 left a comment

Hey, looks like there are some merge conflicts. #748 has been merged, and I think it works pretty well so far.

But it seems there's still a conflict with this PR for whatever reason.

@Logiquo Logiquo force-pushed the batch-task-transformation branch from f2ab37a to af96784 Compare December 24, 2025 22:34

Logiquo commented Dec 24, 2025

Yeah, it required a rebase; I've rebased it.

@jhnwu3 jhnwu3 left a comment

lgtm, will definitely need to fix the scratch_dir stuff as it leads to weird issues.

@jhnwu3 jhnwu3 merged commit 1b2aca3 into sunlabuiuc:master Dec 26, 2025
1 check passed
@Logiquo Logiquo deleted the batch-task-transformation branch December 27, 2025 05:10
