@iamh2o iamh2o commented Oct 16, 2025

Summary

  • add an S3 workset monitor module that handles sentinel locking, staging, cluster lifecycle, pipeline execution, and exports results to complete/error prefixes
  • expose the workflow via a new bin/daylily-monitor-worksets CLI and document configuration/usage in docs/workset-monitor.md
  • update the README to advertise the monitor and declare boto3/PyYAML runtime dependencies

Testing

  • python -m compileall daylib/workset_monitor.py bin/daylily-monitor-worksets

https://chatgpt.com/codex/tasks/task_e_68f0bb25aab88331b3027151f0f5c0fc

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting

Comment on lines +493 to +505
LOGGER.info("Validating stage samples manifest %s", samples_path)
with samples_path.open("r", encoding="utf-8") as handle:
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        for value in row.values():
            if not value:
                continue
            if value.startswith("s3://"):
                self._assert_s3_object_exists(value)
            else:
                candidate = samples_path.parent / value
                if not candidate.exists():
                    raise MonitorError(f"Referenced file does not exist: {candidate}")

P1: Restrict sample manifest validation to path columns

The new _validate_stage_samples helper treats every nonempty TSV field as a file path and requires it either to exist locally or in S3. Real workset manifests typically contain many metadata columns (sample IDs, library names, read group labels, etc.) whose values are not paths, so this logic raises MonitorError for legitimate manifests by looking for files like …/S001 or …/Tumor. The monitor will therefore fail before staging for most worksets. Validation should be limited to columns known to contain paths instead of applying it to all values.
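One way the suggestion above could look is sketched below. This is not the PR's code: the column names in `PATH_COLUMNS` are hypothetical (the real manifest schema is not shown here), `MonitorError` is a local stand-in for the monitor's exception, and the `s3_exists` callable is an assumed replacement for `self._assert_s3_object_exists` so the sketch runs without AWS access.

```python
import csv
from pathlib import Path
from typing import Callable, Iterable


class MonitorError(RuntimeError):
    """Local stand-in for the monitor's error type defined in the PR."""


# Hypothetical names: the actual path-bearing columns are not shown in this PR.
PATH_COLUMNS = ("r1_path", "r2_path")


def validate_stage_samples(
    samples_path: Path,
    s3_exists: Callable[[str], bool],
    path_columns: Iterable[str] = PATH_COLUMNS,
) -> None:
    """Check only the columns known to hold paths; ignore metadata columns."""
    columns = tuple(path_columns)
    with samples_path.open("r", encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # Data rows start on line 2 because line 1 is the header.
        for line_no, row in enumerate(reader, start=2):
            for column in columns:
                # Missing columns and short rows yield None; treat as empty.
                value = (row.get(column) or "").strip()
                if not value:
                    continue
                if value.startswith("s3://"):
                    if not s3_exists(value):
                        raise MonitorError(
                            f"Line {line_no}: S3 object not found for {column}: {value}"
                        )
                else:
                    # Relative paths are resolved against the manifest's directory,
                    # matching the behavior of the snippet under review.
                    candidate = samples_path.parent / value
                    if not candidate.exists():
                        raise MonitorError(
                            f"Line {line_no}: referenced file does not exist: {candidate}"
                        )
```

With this shape, a row such as `S001<TAB>a.fq<TAB>s3://bucket/b.fq` is checked only in its two path columns, so the sample ID `S001` is never treated as a file. Whether the path columns are better fixed in code, read from configuration, or inferred from the header is a design choice this sketch leaves open.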


2 participants