Add automated S3 workset monitor #161

iamh2o · 2025-10-16T09:50:39Z

Summary

- Add an S3 workset monitor module that handles sentinel locking, staging, cluster lifecycle, and pipeline execution, and exports results to the complete/error prefixes.
- Expose the workflow via a new `bin/daylily-monitor-worksets` CLI and document configuration and usage in `docs/workset-monitor.md`.
- Update the README to advertise the monitor and declare the boto3/PyYAML runtime dependencies.
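The lifecycle in the first bullet (claim a workset, run the stages, export to a complete or error prefix) could be sketched roughly as follows. This is not the module's code: the `ready/`, `complete/` and `error/` key layout, the `daylily.lock` sentinel name, and the `steps` callables are hypothetical. The lock assumes S3 conditional writes (`put_object` with `IfNoneMatch="*"`, available in recent boto3 releases), which fail with `PreconditionFailed` when the key already exists; whether the real monitor locks this way is an assumption.

```python
import json
import logging
import time
from typing import Callable, Iterable

LOGGER = logging.getLogger(__name__)


def try_acquire_sentinel(s3, bucket: str, key: str) -> bool:
    """Create the sentinel object only if it does not exist yet.

    Assumes a boto3-style client: IfNoneMatch="*" makes S3 reject the
    write with error code "PreconditionFailed" when the key is present.
    """
    body = json.dumps({"acquired_at": time.time()}).encode("utf-8")
    try:
        s3.put_object(Bucket=bucket, Key=key, Body=body, IfNoneMatch="*")
    except Exception as exc:  # botocore ClientError in practice
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code == "PreconditionFailed":
            return False  # another monitor already claimed this workset
        raise
    return True


def move_prefix(s3, bucket: str, src: str, dst: str) -> int:
    """Move every object under src to dst; returns the number moved."""
    keys, token = [], None
    # List everything first so deletes cannot disturb pagination.
    while True:
        kwargs = {"Bucket": bucket, "Prefix": src}
        if token:
            kwargs["ContinuationToken"] = token
        page = s3.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        if not page.get("IsTruncated"):
            break
        token = page["NextContinuationToken"]
    for key in keys:
        new_key = dst + key[len(src):]
        # client.copy is boto3's managed copy; unlike copy_object it
        # switches to multipart for objects larger than 5 GB.
        s3.copy(CopySource={"Bucket": bucket, "Key": key}, Bucket=bucket, Key=new_key)
        s3.delete_object(Bucket=bucket, Key=key)
    return len(keys)


def process_workset(
    s3, bucket: str, workset: str, steps: Iterable[Callable[[str], None]]
) -> str:
    """Hypothetical single pass: lock, run steps, export to complete/ or error/."""
    src = f"ready/{workset}/"
    if not try_acquire_sentinel(s3, bucket, src + "daylily.lock"):
        return "skipped"
    outcome = "complete"
    try:
        for step in steps:  # e.g. stage, start cluster, run pipeline
            step(workset)
    except Exception:
        LOGGER.exception("Workset %s failed", workset)
        outcome = "error"
    move_prefix(s3, bucket, src, f"{outcome}/{workset}/")
    return outcome
```

Because the sentinel lives under the workset's own prefix, it moves with the outputs, so a later pass no longer sees the workset under `ready/`.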

Testing

- `python -m compileall daylib/workset_monitor.py bin/daylily-monitor-worksets`

https://chatgpt.com/codex/tasks/task_e_68f0bb25aab88331b3027151f0f5c0fc

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review"

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting:

- `@codex fix this CI failure`
- `@codex address that feedback`

chatgpt-codex-connector · 2025-10-16T09:53:59Z

daylib/workset_monitor.py

```diff
+        LOGGER.info("Validating stage samples manifest %s", samples_path)
+        with samples_path.open("r", encoding="utf-8") as handle:
+            reader = csv.DictReader(handle, delimiter="\t")
+            for row in reader:
+                for value in row.values():
+                    if not value:
+                        continue
+                    if value.startswith("s3://"):
+                        self._assert_s3_object_exists(value)
+                    else:
+                        candidate = samples_path.parent / value
+                        if not candidate.exists():
+                            raise MonitorError(f"Referenced file does not exist: {candidate}")
```


Restrict sample manifest validation to path columns

The new `_validate_stage_samples` helper treats every nonempty TSV field as a file path and requires it to exist either locally or in S3. Real workset manifests typically contain many metadata columns (sample IDs, library names, read group labels, etc.) whose values are not paths, so this logic raises `MonitorError` on legitimate manifests when it looks for files such as …/S001 or …/Tumor. As a result, the monitor will fail before staging for most worksets. Validation should be limited to the columns known to contain paths rather than applied to every value.
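One way to act on this suggestion, sketched under assumptions: `PATH_COLUMNS` and its column names are hypothetical (the real manifest schema decides which columns hold file references), `MonitorError` is a local stand-in for the module's class, and the `s3_exists` callable stands in for `_assert_s3_object_exists`, which in the diff raises rather than returning a bool.

```python
import csv
from pathlib import Path
from typing import Callable, Sequence


class MonitorError(RuntimeError):
    """Stand-in for the monitor's own MonitorError."""


# Hypothetical: columns assumed to carry file references in the manifest.
PATH_COLUMNS: Sequence[str] = ("r1_fastq", "r2_fastq")


def validate_stage_samples(
    samples_path: Path,
    s3_exists: Callable[[str], bool],
    path_columns: Sequence[str] = PATH_COLUMNS,
) -> None:
    """Check that every file referenced in a path column exists.

    Metadata columns (sample IDs, library names, read groups, ...) are
    never inspected, so values like "S001" or "Tumor" cannot trip the check.
    """
    with samples_path.open("r", encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        header = reader.fieldnames or []
        # Only validate the path columns this manifest actually has.
        columns = [c for c in path_columns if c in header]
        for line_no, row in enumerate(reader, start=2):  # line 1 is the header
            for column in columns:
                value = (row.get(column) or "").strip()
                if not value:
                    continue  # optional path left blank
                if value.startswith("s3://"):
                    if not s3_exists(value):
                        raise MonitorError(f"Line {line_no}, {column}: missing S3 object {value}")
                else:
                    candidate = samples_path.parent / value
                    if not candidate.exists():
                        raise MonitorError(f"Line {line_no}, {column}: missing file {candidate}")
```

Validating only the intersection of `PATH_COLUMNS` and the header keeps single-end manifests (no `r2_fastq`) valid, at the cost of silently skipping a misspelled path column; raising when a required column is absent is the stricter alternative.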


Add automated S3 workset monitor

95ea926

iamh2o added the codex label via ChatGPT Codex Connector on Oct 16, 2025

chatgpt-codex-connector bot reviewed Oct 16, 2025
