Commit 6cb242c
feat: Streaming VCF reader (#2)
* WIP: VCF
* Refactor
* Parsing info draft
* Working parser
* Refactoring infos
* async-trait downgrade
* Reverting to 0-based
* Renamed columns
* Add retry to operator
* Add IOTimeout
* Fixing streams
* Adding s3
* fix: Basic fields
* Adding support for remote reading of uncompressed VCFs
* Fix header
* Optimize variant_end
* Enabling projection
* Fixing local vcf without compression and gcs reads optimization
* Fixing local vcf reading with no compression
* Describe VCF
* fix: Tag case sensitive
* add performance/time measurement for batch processing vcf with noodles
Signed-off-by: Piotr Dębski <ppdebski@interia.eu>
* add retry mechanism and adjust chunk size along with minimal concurrent fetches
Signed-off-by: Piotr Dębski <ppdebski@interia.eu>
* Refactor scan to separate projected schema computation and use Field::nullable flag
Signed-off-by: Piotr Dębski <ppdebski@interia.eu>
* propagate builders errors
Signed-off-by: Piotr Dębski <ppdebski@interia.eu>
* Optimize OptionalField::new() to use with_capacity
Signed-off-by: Piotr Dębski <ppdebski@interia.eu>
* Improve info_to_arrow_type logic
Signed-off-by: Piotr Dębski <ppdebski@interia.eu>
* refactor format fields and cleanup code
Signed-off-by: Piotr Dębski <ppdebski@interia.eu>
* complete bgzf compressed files format ingestion tests
Signed-off-by: Piotr Dębski <ppdebski@interia.eu>
* add bgzf test in similar format to test_noodles.rs
Signed-off-by: Piotr Dębski <ppdebski@interia.eu>
* add docker-compose for testing iceberg
Signed-off-by: Piotr Dębski <ppdebski@interia.eu>
* add simple github workflows CI
Signed-off-by: Piotr Dębski <ppdebski@interia.eu>
* Cleanup a few warnings
* Bump runner image
---------
Signed-off-by: Piotr Dębski <ppdebski@interia.eu>
Co-authored-by: Piotr Dębski <ppdebski@interia.eu>

1 parent b550a93, commit 6cb242c
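The bullets "Reverting to 0-based" and "Optimize variant_end" concern coordinate handling. The following is a minimal sketch, assuming the common convention of mapping VCF's 1-based `POS` to a 0-based, half-open interval that spans the REF allele; `zero_based_start` and `variant_end` are hypothetical names and do not come from this commit.

```rust
/// Hypothetical helper (not taken from this commit): maps a 1-based VCF POS
/// to a 0-based start. Returns None for POS 0, which VCF uses to denote a telomere.
pub fn zero_based_start(pos: u64) -> Option<u64> {
    pos.checked_sub(1)
}

/// Hypothetical helper: exclusive 0-based end of the REF allele span.
/// Numerically this equals the 1-based inclusive end, POS + len(REF) - 1.
/// Assumes REF is ASCII (A/C/G/T/N), so byte length equals base count.
pub fn variant_end(pos: u64, reference: &str) -> Option<u64> {
    zero_based_start(pos).map(|start| start + reference.len() as u64)
}
```

For a SNV at `POS` 100 this yields the interval [99, 100); for REF `ACG` at `POS` 100 it yields [99, 102). Because a 0-based half-open end is numerically the same as a 1-based closed end, only the start needs shifting.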
17 files changed: 6283 additions, 2 deletions
Paths touched: .github/workflows, datafusion/vcf, examples, src