feat: Benchmarking #29
Conversation
- Add write_result to common library exports
- Fix table provider constructor signatures:
  - VCF: Added missing info_fields and format_fields parameters
  - FASTQ: Changed from new() to try_new()
  - BED: Added BEDFields::BED3 parameter
  - FASTA: Added missing thread_num parameter
- Fix chrono serde feature dependency
- Fix generic type parameter cycle in time_operation()

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
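As a general illustration of the FASTQ change, here is a minimal, self-contained sketch of the `new()` → `try_new()` pattern; the type and validation logic are invented for the example and are not the crate's actual API:

```rust
use std::path::PathBuf;

struct FastqTable {
    path: PathBuf,
}

impl FastqTable {
    // Fallible constructor: validate inputs up front and return the error
    // to the caller instead of panicking inside new().
    fn try_new(path: impl Into<PathBuf>) -> Result<Self, String> {
        let path = path.into();
        if path.extension().and_then(|e| e.to_str()) != Some("fastq") {
            return Err(format!("not a FASTQ file: {}", path.display()));
        }
        Ok(Self { path })
    }
}

fn main() {
    match FastqTable::try_new("reads.fastq") {
        Ok(t) => println!("opened {}", t.path.display()),
        Err(e) => eprintln!("failed: {e}"),
    }
}
```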
- Change if condition from 'matrix.enabled == true' to '${{ matrix.enabled == 'true' }}'
- Fixes workflow file issue that prevented benchmark workflow from running
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
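For context, a minimal reconstruction of the matrix fix above: matrix values are strings, so comparing against the boolean `true` never matches, while the explicit expression compares against the string `'true'`. The job and step here are hypothetical:

```yaml
jobs:
  benchmark:
    strategy:
      matrix:
        include:
          - os: ubuntu-latest
            enabled: 'true'
    runs-on: ${{ matrix.os }}
    # Before: if: matrix.enabled == true   (string vs boolean, never matches)
    # After: explicit expression syntax comparing against the string 'true'
    if: ${{ matrix.enabled == 'true' }}
    steps:
      - run: echo "running benchmark"
```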
- Split benchmark job into benchmark-linux and benchmark-macos
- Remove problematic matrix.enabled conditional logic
- Use job-level if conditions with prepare job outputs
- Add if: always() to aggregate job to run even when jobs are skipped
- Fixes workflow file validation error

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
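A sketch of the reworked layout — job, output, and step names are assumptions, but the shape (a prepare job feeding job-level `if:` conditions, plus `if: always()` on the aggregate job) follows the commit:

```yaml
jobs:
  prepare:
    runs-on: ubuntu-latest
    outputs:
      run-linux: ${{ steps.plan.outputs.run-linux }}
      run-macos: ${{ steps.plan.outputs.run-macos }}
    steps:
      - id: plan
        run: |
          echo "run-linux=true" >> "$GITHUB_OUTPUT"
          echo "run-macos=true" >> "$GITHUB_OUTPUT"

  benchmark-linux:
    needs: prepare
    if: needs.prepare.outputs.run-linux == 'true'
    runs-on: ubuntu-latest
    steps:
      - run: echo "linux benchmarks"

  benchmark-macos:
    needs: prepare
    if: needs.prepare.outputs.run-macos == 'true'
    runs-on: macos-latest
    steps:
      - run: echo "macos benchmarks"

  aggregate:
    needs: [benchmark-linux, benchmark-macos]
    if: always()   # run even when a benchmark job was skipped
    runs-on: ubuntu-latest
    steps:
      - run: echo "aggregate results"
```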
- Update benchmark workflow to use Rust 1.86.0
- Update rust-version in benchmark crate Cargo.toml files
- Fixes build error: datafusion 50.3.0 requires rustc 1.86.0

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
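The corresponding pin, as a minimal Cargo.toml sketch (crate name and version are placeholders):

```toml
[package]
name = "datafusion-bio-benchmarks"  # placeholder crate name
version = "0.1.0"
edition = "2021"
rust-version = "1.86.0"  # datafusion 50.3.0 requires rustc 1.86.0
```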
- Change Max(String) to Max(()) to avoid unused field warning
- Prevents build failure when -D warnings is enabled in CI

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
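A minimal, runnable reproduction of the warning and the fix; the enum name and the surrounding match are illustrative, not the benchmark crate's actual code:

```rust
// Before: `Max(String)` carried a payload that was never read, so with
// `-D warnings` the unused-field lint failed the CI build:
//
//   enum Threads {
//       Fixed(usize),
//       Max(String),   // field never read -> warning -> build failure
//   }
//
// After: a unit payload keeps the variant shape without unused data.
enum Threads {
    Fixed(usize),
    Max(()),
}

fn thread_count(t: &Threads) -> usize {
    match t {
        Threads::Fixed(n) => *n,
        Threads::Max(()) => std::thread::available_parallelism()
            .map(|n| n.get())
            .unwrap_or(1),
    }
}

fn main() {
    for t in [Threads::Fixed(2), Threads::Max(())] {
        println!("{}", thread_count(&t));
    }
}
```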
- Add #[allow(dead_code)] to suppress unused field warning
- Properly deserialize 'max' string from YAML configuration

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
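A sketch of how that deserialization might look, assuming serde with serde_yaml and an untagged enum; names are illustrative rather than the crate's actual types:

```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
#[serde(untagged)]
enum Parallelism {
    Fixed(usize),
    #[allow(dead_code)] // suppress the unused-field warning under -D warnings
    Max(String), // captures the literal "max" from the YAML config
}

fn main() -> Result<(), serde_yaml::Error> {
    // Untagged deserialization tries Fixed first, then falls back to Max.
    let cfg: Vec<Parallelism> = serde_yaml::from_str("[1, 2, max]")?;
    println!("{cfg:?}"); // [Fixed(1), Fixed(2), Max("max")]
    Ok(())
}
```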
- Add ObjectStorageOptions to GFF table provider in benchmark runner
- Update benchmark common library imports to follow formatting standards
- Update Claude Code settings with additional approved commands

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Alphabetize imports in benchmark modules to pass CI formatting checks:
- data_downloader.rs: Order anyhow imports alphabetically
- lib.rs: Reorder pub use statements
- main.rs: Reorder datafusion_bio_benchmarks_common imports

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Apply cargo fmt to reorder imports alphabetically in benchmark modules:
- data_downloader.rs: Reorder anyhow imports
- lib.rs: Reorder pub use statements
- main.rs: Reorder datafusion_bio_benchmarks_common imports

This resolves CI formatting check failures.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Remove incompatible settings that only work with nightly Rust:
- Remove required_version = "1.8.0"
- Remove unstable_features = false

Add edition = "2021" to match the project's Rust edition.

This fixes the pre-commit hook warnings and ensures consistent formatting behavior across stable and nightly toolchains.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
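The resulting rustfmt configuration, sketched:

```toml
# .rustfmt.toml after the cleanup: only stable options remain.
# Removed (nightly-only per the commit):
#   required_version = "1.8.0"
#   unstable_features = false
edition = "2021"
```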
Force-pushed from c5bf1c9 to ae04d9f
Add comprehensive benchmark framework following polars-bio architecture
with complete separation of concerns between benchmark execution and
report generation.
Key features:
- Dual benchmark execution (baseline + target)
- Separate workflows for benchmarks and report generation
- GitHub Pages integration with structured data storage
- Interactive comparison report with dropdown menus
- Configuration-driven benchmark runner (YAML)
- Support for all file formats (GFF, VCF, FASTQ, BAM, BED, FASTA)
Architecture:
- benchmark.yml: Execute benchmarks, store raw JSON
- pages.yml: Generate HTML reports from stored data
- Python scripts: Interactive comparison tool
- Documentation: Complete setup and usage guides
Data structure (polars-bio compatible):
benchmark-data/
  tags/{version}/{platform}/{baseline|target}/results/*.json
  commits/{sha}/{platform}/{baseline|target}/results/*.json
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…ate directories
- Add check for benchmarks directory existence in baseline tag
- Skip baseline benchmarks if directory doesn't exist (e.g., v0.1.1)
- Create DEST_BASE directory before writing benchmark-info.json
- Fixes exit code 101 (missing package) and exit code 1 (missing directory)
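A shell sketch of the guard logic described above; `DEST_BASE` comes from the commit message, everything else (paths, JSON payload) is assumed for illustration:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Skip baseline benchmarks when the checked-out tag predates the
# benchmarks crate (e.g. v0.1.1) -- avoids exit code 101 from cargo.
if [ ! -d "benchmarks" ]; then
  echo "No benchmarks/ directory in baseline tag; skipping baseline run."
  exit 0
fi

# Ensure the destination exists before writing metadata -- avoids exit
# code 1 from writing into a missing directory. DEST_BASE is assumed to
# be set by the workflow.
mkdir -p "$DEST_BASE"
echo '{"role": "baseline"}' > "$DEST_BASE/benchmark-info.json"
```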
- Trigger benchmarks automatically on PRs
- Auto-comment on PRs with benchmark results
- Default to 'fast' mode and 'all' platforms for PRs
- Filter to only run when relevant files change
📊 Benchmark Results
Benchmarks have been completed and stored for this PR.
View Results: https://biodatageeks.github.io/datafusion-bio-formats/benchmark-comparison/
Raw data: https://biodatageeks.github.io/datafusion-bio-formats/benchmark-data/
- For PRs, github.ref_name is '29/merge' which doesn't exist
- Use github.head_ref instead to get actual branch name
- Fixes 'pathspec did not match' error
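The fix as a minimal checkout step (step name is illustrative):

```yaml
# On pull_request events, github.ref_name is the synthetic '<PR>/merge'
# ref (e.g. '29/merge'), which git cannot check out by name.
# github.head_ref yields the actual source branch of the PR.
- name: Checkout PR branch
  uses: actions/checkout@v4
  with:
    ref: ${{ github.head_ref }}
```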
- Modified baseline benchmark logic to ALWAYS run by copying the current benchmark framework into the baseline tag checkout
- This ensures baseline comparisons work even when the baseline tag doesn't have a benchmarks directory
- Fixed GitHub Pages URLs to use biodatageeks.org instead of .github.io
- Updated URLs in workflow PR comment, README, and benchmarks/README

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
📊 Benchmark Results
Benchmarks have been completed and stored for this PR.
View Results: https://biodatageeks.org/datafusion-bio-formats/benchmark-comparison/
Raw data: https://biodatageeks.org/datafusion-bio-formats/benchmark-data/
Also copy Cargo.toml to the baseline tag checkout so the workspace knows about the benchmark crates.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Reset Cargo.lock changes after the baseline build to avoid conflicts when checking out the target branch.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
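A sketch of that step:

```yaml
# Discard the lockfile changes the baseline build introduced so the
# target checkout applies cleanly.
- name: Reset Cargo.lock
  run: git checkout -- Cargo.lock
```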
Generate interactive comparison HTML directly in the aggregate job and commit it to gh-pages alongside benchmark data.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Implement polars-bio-style caching strategy:
- Add sccache for distributed compiler caching
- Separate cargo registry and target caches
- Enable incremental compilation (CARGO_INCREMENTAL=1)
- Use granular cache keys based on Cargo.lock and source files

This should significantly speed up subsequent benchmark runs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
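A sketch of the caching setup under these assumptions — the sccache GitHub Action and the cache key names are illustrative, not necessarily the workflow's exact configuration:

```yaml
- uses: mozilla-actions/sccache-action@v0.0.3

- name: Cache cargo registry
  uses: actions/cache@v4
  with:
    path: |
      ~/.cargo/registry
      ~/.cargo/git
    key: cargo-registry-${{ runner.os }}-${{ hashFiles('**/Cargo.lock') }}

- name: Cache target directory
  uses: actions/cache@v4
  with:
    path: target
    key: target-${{ runner.os }}-${{ hashFiles('**/Cargo.lock', 'src/**') }}

- name: Enable incremental compilation
  run: echo "CARGO_INCREMENTAL=1" >> "$GITHUB_ENV"
```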
Updates to benchmarks/python/generate_interactive_comparison.py:
- Add optgroup dropdowns separating tags and commits
- Auto-select latest tag as baseline, latest commit as target
- Implement functional platform tabs (Linux/macOS)
- Add dynamic data loading from benchmark-data JSON files
- Implement Plotly chart generation for benchmark comparisons
- Add proper error handling for missing data
- Match polars-bio's UX patterns for benchmark comparison

The interactive page now:
- Only shows available datasets in dropdowns
- Dynamically fetches and displays benchmark results
- Supports switching between platforms via tabs
- Generates grouped bar charts comparing baseline vs target

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Major changes to align with polars-bio benchmark framework:
1. **Storage Structure**:
- Remove baseline/target subdirectories
- Store each dataset standalone: tags/{TAG}/{platform}/results/
- Store commits as: commits/{SHORT_SHA}/{platform}/results/
2. **Index Generation**:
- Generate proper index.json with datasets array
- Include tags array and latest_tag
- Each dataset has: id, label, ref, ref_type, timestamp, runner, path, commit_sha
3. **Metadata**:
- Create metadata.json for each dataset (not benchmark-info.json)
- Consistent structure across tags and commits
4. **Baseline Handling**:
- Store baseline tag as standalone entry in tags/
- Both baseline and target appear independently in index
- No nested baseline/target structure
This matches polars-bio's proven architecture for easier comparison
and better dropdown organization.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Major changes to match polars-bio's client-side data loading:
1. **Load from index.json**: Read structured index with datasets array
2. **Organize by refs**: Group datasets by ref (tag or branch name)
3. **Dropdown logic**: Populate from REFS object, separate tags and commits
4. **Latest tag marker**: Show ⭐ for latest_tag
5. **Data loading**: Load benchmark data using ref keys and runner paths

This completes the refactor to match polars-bio's proven architecture.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
The jq commands were failing because "label" is a reserved keyword in jq.
Renamed the jq variable from $label to $runnerlabel to avoid the conflict.
Error was:
    jq: error: syntax error, unexpected label, expecting IDENT or __loc__
        runner_label: $label,
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
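A minimal reconstruction of the failure and the fix (the surrounding JSON is invented for the example):

```bash
# Before (fails to parse): 'label' is reserved in jq, so the program
# cannot reference a variable named $label.
#   jq --arg label "ubuntu-latest" '{ runner_label: $label }' <<< '{}'

# After: rename the variable.
jq --arg runnerlabel "ubuntu-latest" '{ runner_label: $runnerlabel }' <<< '{}'
# => { "runner_label": "ubuntu-latest" }
```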
- Implement organize_datasets_by_ref() matching polars-bio's structure
- Use refs_by_type with separate "tag" and "branch" dicts
- Support unique keys for branch commits (ref@sha format)
- Use cloneNode(true) for dropdown optgroups like polars-bio
- Clean implementation with proper data flow
- Remove all old/broken code
- Add TODOs for benchmark result parsing

This aligns our HTML generation with polars-bio's proven architecture while maintaining compatibility with our GFF benchmark format.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Check if DATA.refs_by_type exists before accessing
- Use ternary operators to handle missing tag/branch objects
- Prevents "Cannot read properties of undefined" errors
- Fixes error when index.json exists but is empty/incomplete

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
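A minimal sketch of the guard, with `DATA` standing in for the parsed index.json:

```javascript
// DATA may be empty or missing refs_by_type entirely when index.json
// exists but is incomplete.
const DATA = {}; // e.g. JSON.parse(indexJsonText)
const refsByType = DATA && DATA.refs_by_type ? DATA.refs_by_type : {};
const tags = refsByType.tag ? refsByType.tag : {};
const branches = refsByType.branch ? refsByType.branch : {};
console.log(Object.keys(tags), Object.keys(branches)); // [] [] on an empty index
```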
This commit completes the benchmark comparison framework by:
1. **Fix dataset loading bug**: Modified load_dataset_results() to always return the dataset structure even when result directories don't exist. This ensures the UI has essential metadata (runner_label, etc.) from index.json.
2. **Load actual benchmark results**: Extended load_dataset_results() to scan and load benchmark JSON files from results/ directories, organizing them by category (parallelism, predicate, projection).
3. **Implement chart generation**: Replaced placeholder chart code with actual Plotly bar charts that compare baseline vs target elapsed times for each benchmark category.

Features:
- Interactive dropdowns for selecting baseline and target versions
- Platform tabs for switching between Linux/macOS results
- Grouped bar charts showing elapsed time comparisons
- Automatic chart generation for all benchmark categories
- Proper error handling when results are missing

Testing:
- Verified with Playwright automated testing
- Confirmed 3 charts render correctly (parallelism, predicate, projection)
- Screenshot captured showing working comparison interface

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
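A sketch of the per-category chart generation, assuming Plotly is loaded on the page and a simplified result shape of `{ name, elapsed_s }` records (field names are assumptions):

```javascript
// Grouped bar chart for one benchmark category, baseline vs target.
function renderCategoryChart(divId, category, baseline, target) {
  const traces = [
    { name: 'baseline', type: 'bar',
      x: baseline.map(b => b.name), y: baseline.map(b => b.elapsed_s) },
    { name: 'target', type: 'bar',
      x: target.map(t => t.name), y: target.map(t => t.elapsed_s) },
  ];
  Plotly.newPlot(divId, traces, {
    title: category,
    barmode: 'group',               // side-by-side baseline vs target bars
    yaxis: { title: 'elapsed time (s)' },
  });
}

// e.g. renderCategoryChart('chart-parallelism', 'parallelism', base, tgt);
```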
This commit enhances the benchmark comparison interface with:
1. **Latest tag indicator**: Added ⭐ star symbol to the latest tag in dropdowns for easy identification
2. **Format subtabs**: Implemented file format subtabs (GFF, VCF, etc.) within each platform tab to organize benchmarks by format type
3. **Data reorganization**: Updated load_dataset_results() to organize results by format first, then category (format -> category -> benchmarks)
4. **State management**: Added currentFormat and availableFormats to track selected format across platform switches
5. **Format tab switching**: Implemented setupFormatTabs() and switchFormat() functions with proper active state handling
6. **Styling**: Added CSS for format tabs with blue active state and hover effects

Features:
- Platform tabs (Linux/macOS) at top level
- Format subtabs (GFF, VCF, etc.) below platform tabs
- Charts filtered by both platform and format
- Automatic format detection from benchmark results
- Seamless tab switching maintains state correctly

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit fixes two issues with the benchmark comparison UI:
1. **Star indicator now visible**: Fixed workflow to properly mark datasets with is_latest_tag: true. After updating latest_tag in index.json, the workflow now iterates through all datasets and marks those matching the latest tag.
2. **Commits sorted by date**: Branch/commit entries in the dropdown are now sorted by timestamp descending (most recent first), making it easy to compare the latest commits.

Changes:
- Workflow: Added jq command to mark datasets with is_latest_tag: true
- Python: Added timestamp field to organize_datasets_by_ref()
- JavaScript: Added .sort() by timestamp when populating branch dropdown
- Branches now appear in chronological order (newest first)

Testing:
- Star will appear as "v0.1.1 ⭐ Latest" after next workflow run
- Commits ordered by date instead of arbitrary order

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
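The sort in miniature, with invented entries — ISO-8601 timestamps compare correctly as strings, so lexicographic descending order is chronological newest-first:

```javascript
const entries = [
  { key: 'feat@abc1234', timestamp: '2024-05-01T10:00:00Z' },
  { key: 'feat@def5678', timestamp: '2024-05-02T09:30:00Z' },
];
// Newest-first ordering for branch/commit entries in the dropdown.
entries.sort((a, b) => b.timestamp.localeCompare(a.timestamp));
console.log(entries.map(e => e.key)); // ['feat@def5678', 'feat@abc1234']
```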
- Changed is_latest_tag logic to only update matching datasets
- Previously set all non-matching datasets to false, overwriting true values
- Now leaves non-matching datasets unchanged
- Added is_latest_tag field to target dataset creation

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
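A sketch of the corrected jq update — touch only the datasets matching the latest tag instead of rewriting the flag on every entry (`LATEST_TAG` is an assumed shell variable):

```bash
# Before (destructive): set is_latest_tag on every dataset, clobbering
# earlier 'true' values:
#   .datasets |= map(.is_latest_tag = (.ref == $tag))
# After (additive): only touch datasets that match the latest tag.
jq --arg tag "$LATEST_TAG" \
   '.datasets |= map(if .ref == $tag then .is_latest_tag = true else . end)' \
   index.json > index.json.tmp && mv index.json.tmp index.json
```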
- Add sccache for both Linux and macOS jobs
- Enable sccache with RUSTC_WRAPPER and SCCACHE_GHA_ENABLED
- Remove cargo clean - reuse baseline artifacts for target build
- Set CARGO_INCREMENTAL=0 (recommended with sccache)
- Rename "Clean Build Artifacts" to "Reset Cargo.lock"

This allows target builds to reuse compiled artifacts from baseline, dramatically reducing build times when baseline and target code overlap.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
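The environment in sketch form:

```yaml
env:
  RUSTC_WRAPPER: sccache
  SCCACHE_GHA_ENABLED: "true"
  CARGO_INCREMENTAL: "0"   # incremental artifacts defeat sccache's caching
```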
- Move latest_tag update outside the "if target is tag" block
- Now updates latest_tag even when benchmarking branches
- Ensures star appears for latest tag in dropdown

Previously only updated when target was a tag, causing latest_tag to be null when comparing a branch to a tag baseline.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Implemented comprehensive mobile optimizations for the interactive benchmark comparison page to improve usability across all device sizes.

Key improvements:
- Mobile-first responsive design with 3 breakpoints (480px, 768px, desktop)
- Fixed dropdown overflow issues on small screens
- Stacked layout for selection controls on phones
- Full-width buttons with 44px minimum touch targets
- Wrappable/scrollable tabs for better mobile navigation
- Responsive Plotly chart configuration with dynamic margins
- Touch-friendly focus states and animations
- Optimized typography and spacing for readability

Tested on:
- iPhone SE (375x667)
- iPhone 11 Pro (414x896)
- iPad Portrait (768x1024)
- iPad Landscape (1024x768)
- Desktop (1920x1080)

No breaking changes - fully backward compatible with existing layout.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
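A condensed sketch of the breakpoint structure; the selectors are illustrative, not the page's actual class names:

```css
/* Mobile-first base styles (phones) */
.controls { display: flex; flex-direction: column; gap: 1rem; }
.controls button { width: 100%; min-height: 44px; } /* full-width, 44px touch target */
.tabs { display: flex; flex-wrap: wrap; overflow-x: auto; } /* wrappable/scrollable */

/* Tablets (>=480px): controls sit side by side again */
@media (min-width: 480px) {
  .controls { flex-direction: row; }
  .controls button { width: auto; }
}

/* Desktop (>=768px): tabs no longer need to wrap */
@media (min-width: 768px) {
  .tabs { flex-wrap: nowrap; }
}
```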
Both Linux and macOS builds are failing with sccache errors:
    'Server startup failed: cache storage failed to read: Unexpected (permanent)'

This is a GitHub Actions infrastructure issue, not our code. Temporarily disabling RUSTC_WRAPPER to allow builds to proceed without sccache. Will re-enable once GitHub's cache service is restored.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>