Commit b2ecdd4

feat: Add coverage method (#112)
* feat: Coverage
* Add coverage tests
* LazyFrame alias
* chore: Version bump
* chore: Docs update
1 parent 949aa9f commit b2ecdd4

17 files changed: +256 -81 lines changed

Cargo.lock

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [package]
 name = "polars_bio"
-version = "0.7.4"
+version = "0.8.0"
 edition = "2021"
 
 [lib]

README.md

Lines changed: 4 additions & 0 deletions
@@ -31,13 +31,17 @@ It provides a DataFrame API for genomics data and is designed to be blazing fast
 
 ![count-overlaps-single.png](docs/assets/count-overlaps-single.png)
 
+![coverage-single.png](docs/assets/coverage-single.png)
+
 ## Parallel performance 🏃‍🏃‍
 ![overlap-parallel.png](docs/assets/overlap-parallel.png)
 
 ![overlap-parallel.png](docs/assets/nearest-parallel.png)
 
 ![count-overlaps-parallel.png](docs/assets/count-overlaps-parallel.png)
 
+![coverage-parallel.png](docs/assets/coverage-parallel.png)
+
 
 
 Read the [documentation](https://biodatageeks.github.io/polars-bio/)

docs/assets/coverage-parallel.png

64.6 KB

docs/assets/coverage-single.png

51.9 KB

docs/features.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@
 | cluster | :white_check_mark: | | :white_check_mark: | :white_check_mark: | | |
 | [merge](api.md#polars_bio.merge) | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | | :white_check_mark: |
 | complement | :white_check_mark: | :construction: | | :white_check_mark: | :white_check_mark: | |
-| coverage | :white_check_mark: | | :white_check_mark: | :white_check_mark: | | :white_check_mark: |
+| [coverage](api.md#polars_bio.coverage) | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | | :white_check_mark: |
 | [expand](api.md#polars_bio.LazyFrame.expand) | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | | :white_check_mark: |
 | [sort](api.md#polars_bio.LazyFrame.sort_bedframe) | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | | :white_check_mark: |
 | [read_table](api.md#polars_bio.read_table) | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | | :white_check_mark: |

docs/notebooks/cookbook.ipynb

Lines changed: 14 additions & 22 deletions
@@ -21,24 +21,16 @@
 "id": "62a7b57c30bf54e2",
 "metadata": {
 "ExecuteTime": {
-"end_time": "2025-03-05T16:41:54.268168Z",
-"start_time": "2025-03-05T16:41:53.664194Z"
+"end_time": "2025-03-07T10:02:23.527490Z",
+"start_time": "2025-03-07T10:02:23.525921Z"
 }
 },
 "source": [
 "import polars_bio as pb\n",
 "import polars as pl"
 ],
-"outputs": [
-{
-"name": "stderr",
-"output_type": "stream",
-"text": [
-"INFO:polars_bio:Creating BioSessionContext\n"
-]
-}
-],
-"execution_count": 2
+"outputs": [],
+"execution_count": 8
 },
 {
 "cell_type": "code",
@@ -646,34 +638,34 @@
 {
 "metadata": {
 "ExecuteTime": {
-"end_time": "2025-02-28T11:52:12.169029Z",
-"start_time": "2025-02-28T11:52:12.167384Z"
+"end_time": "2025-03-07T10:02:16.509828Z",
+"start_time": "2025-03-07T10:02:16.507744Z"
 }
 },
 "cell_type": "code",
 "source": "gcs_vcf_path = \"gs://genomics-public-data/platinum-genomes/vcf/NA12878_S1.genome.vcf\"",
 "id": "31f0f3d0974245bd",
 "outputs": [],
-"execution_count": 16
+"execution_count": 5
 },
 {
 "metadata": {
 "ExecuteTime": {
-"end_time": "2025-02-28T11:52:13.441345Z",
-"start_time": "2025-02-28T11:52:13.439461Z"
+"end_time": "2025-03-07T10:02:19.374881Z",
+"start_time": "2025-03-07T10:02:19.372753Z"
 }
 },
 "cell_type": "code",
 "source": "info_fields=[\"AC\", \"AF\"]",
 "id": "816c419b3b45ee44",
 "outputs": [],
-"execution_count": 17
+"execution_count": 6
 },
 {
 "metadata": {
 "ExecuteTime": {
-"end_time": "2025-02-28T11:52:17.666747Z",
-"start_time": "2025-02-28T11:52:16.365292Z"
+"end_time": "2025-03-07T10:02:52.904221Z",
+"start_time": "2025-03-07T10:02:41.364349Z"
 }
 },
 "cell_type": "code",
@@ -713,12 +705,12 @@
 "<small>shape: (3, 10)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>chrom</th><th>start</th><th>end</th><th>id</th><th>ref</th><th>alt</th><th>qual</th><th>filter</th><th>ac</th><th>af</th></tr><tr><td>str</td><td>u32</td><td>u32</td><td>str</td><td>str</td><td>str</td><td>f64</td><td>str</td><td>list[i32]</td><td>list[f32]</td></tr></thead><tbody><tr><td>&quot;chrM&quot;</td><td>1</td><td>1</td><td>&quot;&quot;</td><td>&quot;G&quot;</td><td>&quot;&quot;</td><td>0.0</td><td>&quot;PASS&quot;</td><td>null</td><td>null</td></tr><tr><td>&quot;chrM&quot;</td><td>2</td><td>72</td><td>&quot;&quot;</td><td>&quot;A&quot;</td><td>&quot;&quot;</td><td>0.0</td><td>&quot;PASS&quot;</td><td>null</td><td>null</td></tr><tr><td>&quot;chrM&quot;</td><td>73</td><td>73</td><td>&quot;&quot;</td><td>&quot;G&quot;</td><td>&quot;A&quot;</td><td>8752.780273</td><td>&quot;TruthSensitivityTranche99.90to…</td><td>[2]</td><td>[1.0]</td></tr></tbody></table></div>"
 ]
 },
-"execution_count": 18,
+"execution_count": 9,
 "metadata": {},
 "output_type": "execute_result"
 }
 ],
-"execution_count": 18
+"execution_count": 9
 },
 {
 "metadata": {},

docs/performance.md

Lines changed: 4 additions & 0 deletions
@@ -7,12 +7,16 @@
 
 ![count-overlaps-single.png](assets/count-overlaps-single.png)
 
+![coverage-single.png](assets/coverage-single.png)
+
 ## Parallel performance 🏃‍🏃‍
 ![overlap-parallel.png](assets/overlap-parallel.png)
 
 ![overlap-parallel.png](assets/nearest-parallel.png)
 
 ![count-overlaps-parallel.png](assets/count-overlaps-parallel.png)
+
+![coverage-parallel.png](assets/coverage-parallel.png)
 ## Benchmarks 🧪
 ### Detailed results shortcuts 👨‍🔬
 - [Binary operations](#binary-operations)

polars_bio/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -14,7 +14,7 @@
     sql,
 )
 from .polars_ext import PolarsRangesOperations as LazyFrame
-from .range_op import FilterOp, count_overlaps, merge, nearest, overlap
+from .range_op import FilterOp, count_overlaps, coverage, merge, nearest, overlap
 from .range_viz import visualize_intervals
 
 POLARS_BIO_MAX_THREADS = "datafusion.execution.target_partitions"
@@ -26,6 +26,7 @@
     "nearest",
     "merge",
     "count_overlaps",
+    "coverage",
     "ctx",
     "FilterOp",
     "visualize_intervals",

polars_bio/polars_ext.py

Lines changed: 15 additions & 0 deletions
@@ -222,3 +222,18 @@ def expand(
         if midsk in schema:
             df = df.drop(midsk)
         return df
+
+    def coverage(
+        self,
+        other_df: pl.LazyFrame,
+        cols1=["chrom", "start", "end"],
+        cols2=["chrom", "start", "end"],
+        suffixes: tuple[str, str] = ("_1", "_2"),
+    ) -> pl.LazyFrame:
+        """
+        !!! note
+            Alias for [coverage](api.md#polars_bio.coverage)
+        """
+        return pb.coverage(
+            self._ldf, other_df, cols1=cols1, cols2=cols2, suffixes=suffixes
+        )
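Reviewer note: the diff shows only the call signature of `coverage`, not what it computes. The pure-Python sketch below illustrates one common definition of interval coverage: for each interval in the first set, the number of positions covered by the union of the second set's intervals on the same chromosome. This is an assumption for illustration, not the polars_bio implementation. The name `coverage_sketch`, the half-open `[start, end)` convention, and the tuple-based input are all hypothetical choices made here.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def coverage_sketch(df1, df2):
    """For each (chrom, start, end) in df1, count positions covered by df2.

    Hypothetical stand-in for illustration only; intervals are assumed
    to be half-open, which may differ from polars_bio's convention.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in df2:
        by_chrom[chrom].append((start, end))
    # Merging first ensures overlapping df2 intervals are not double counted.
    merged = {chrom: merge_intervals(ivs) for chrom, ivs in by_chrom.items()}

    result = []
    for chrom, start, end in df1:
        covered = 0
        for m_start, m_end in merged.get(chrom, []):
            overlap = min(end, m_end) - max(start, m_start)
            if overlap > 0:
                covered += overlap
        result.append((chrom, start, end, covered))
    return result


reads = [("chr1", 0, 10), ("chr1", 5, 20), ("chr2", 0, 5)]
targets = [("chr1", 8, 30), ("chr1", 100, 200), ("chr3", 0, 10)]
rows = coverage_sketch(targets, reads)
for row in rows:
    print(row)
```

Here the two `chr1` reads merge into `[0, 20)`, so the first target `[8, 30)` has 12 covered positions, while the other two targets have none. The merge step is the main design choice: without it, the overlapping reads would be counted twice.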
