|
1 | 1 | # EOPF GeoZarr Data Pipeline |
2 | 2 |
|
3 | | -Automated pipeline for converting Sentinel Zarr datasets to cloud-optimized GeoZarr format with STAC catalog integration and interactive visualization. |
| 3 | +Automated Kubernetes pipeline for converting Sentinel Zarr datasets to cloud-optimized GeoZarr format with STAC catalog integration. |
4 | 4 |
|
5 | | -## Quick Start (30 seconds) |
| 5 | +## Quick Start |
6 | 6 |
|
7 | 7 | ```bash |
8 | | -# 1. Submit workflow |
9 | 8 | export KUBECONFIG=.work/kubeconfig |
10 | 9 | kubectl create -f workflows/run-s1-test.yaml -n devseed-staging |
11 | | - |
12 | | -# 2. Monitor |
13 | | -kubectl logs -n devseed-staging -l workflows.argoproj.io/workflow=<name> -c main -f |
| 10 | +kubectl get wf -n devseed-staging -w |
14 | 11 | ``` |
15 | 12 |
|
16 | | -📖 **New here?** [GETTING_STARTED.md](GETTING_STARTED.md) • **Details:** [Full docs below](#submitting-workflows) |
| 13 | +📖 **First time?** See [GETTING_STARTED.md](GETTING_STARTED.md) for full setup |
| 14 | +🎯 **Monitor:** [Argo UI](https://argo-workflows.hub-eopf-explorer.eox.at) |
17 | 15 |
|
18 | 16 | ## What It Does |
19 | 17 |
|
20 | | -**Input:** STAC item URL → **Output:** Interactive web map in ~15-20 minutes |
21 | | - |
22 | | -``` |
23 | | -Convert (15 min) → Register (30 sec) → Augment (10 sec) |
24 | | -``` |
25 | | - |
26 | | -**Supports:** Sentinel-1 GRD (SAR) • Sentinel-2 L2A (optical) |
27 | | - |
28 | | -**Prerequisites:** Kubernetes with [platform-deploy](https://github.com/EOPF-Explorer/platform-deploy) • Python 3.11+ • [GETTING_STARTED.md](GETTING_STARTED.md) for full setup |
29 | | - |
30 | | -## Submitting Workflows |
| 18 | +**Input:** STAC item URL → **Output:** Cloud-optimized GeoZarr + interactive map (~15-20 min) |
31 | 19 |
|
32 | | -| Method | Best For | Setup | Status | |
33 | | -|--------|----------|-------|--------| |
34 | | -| 🎯 **kubectl** | Testing, CI/CD | None | ✅ Recommended | |
35 | | -| 📓 **Jupyter** | Learning, exploration | 2 min | ✅ Working | |
36 | | -| ⚡ **Event-driven** | Production (auto) | In-cluster | ✅ Running | |
37 | | -| 🐍 **Python CLI** | Scripting | Port-forward | ⚠️ Advanced | |
| 20 | +**Supports:** Sentinel-1 GRD, Sentinel-2 L2A |
| 21 | +**Stack:** Argo Workflows • [eopf-geozarr](https://github.com/EOPF-Explorer/data-model) • Dask • RabbitMQ • Prometheus |
| 22 | +**Resources:** 6Gi memory, burstable CPU per workflow |
38 | 23 |
|
39 | | -<details> |
40 | | -<summary><b>kubectl</b> (recommended)</summary> |
| 24 | +## Monitoring |
41 | 25 |
|
42 | 26 | ```bash |
43 | | -export KUBECONFIG=.work/kubeconfig |
44 | | -kubectl create -f workflows/run-s1-test.yaml -n devseed-staging -o name |
45 | | -kubectl logs -n devseed-staging -l workflows.argoproj.io/workflow=<wf-name> -c main -f |
| 27 | +# Health check |
| 28 | +kubectl get wf -n devseed-staging -l workflows.argoproj.io/phase=Running |
| 29 | + |
| 30 | +# Most recent workflows (last 10 by creation time) |
| 31 | +kubectl get wf -n devseed-staging --sort-by=.metadata.creationTimestamp | tail -10 |
46 | 32 | ``` |
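The `tail -10` listing above is bounded by count rather than by time. To filter by creation time instead, one option is to post-process the output of `kubectl get wf -n devseed-staging -o json`. The following is a minimal sketch, assuming the standard Kubernetes list layout (`items[].metadata.creationTimestamp` as a UTC timestamp such as `2024-05-01T12:00:00Z`) and Argo's `status.phase` field. The function name `recent_workflows` is hypothetical and not part of this repository.

```python
import json
from datetime import datetime, timedelta, timezone


def recent_workflows(kubectl_json, now=None, window=timedelta(hours=1)):
    """Return (name, phase) pairs for workflows created within `window` of `now`."""
    now = now or datetime.now(timezone.utc)
    recent = []
    for item in json.loads(kubectl_json).get("items", []):
        meta = item["metadata"]
        # Assumption: timestamps are serialized like "2024-05-01T12:00:00Z".
        created = datetime.strptime(
            meta["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if now - created <= window:
            # A workflow that has not been picked up yet may lack a status.
            phase = item.get("status", {}).get("phase", "Unknown")
            recent.append((meta["name"], phase))
    return recent
```

The function takes the raw JSON text, so it can be fed from a saved file or from a subprocess call to `kubectl`; passing a different `window` widens or narrows the time range.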
47 | | -Edit `workflows/run-s1-test.yaml` with your STAC URL and collection. |
48 | | -</details> |
49 | 33 |
|
50 | | -<details> |
51 | | -<summary><b>Jupyter</b></summary> |
| 34 | +**Web UI:** [Argo Workflows](https://argo-workflows.hub-eopf-explorer.eox.at) |
52 | 35 |
|
| 36 | +## Usage |
| 37 | + |
| 38 | +### kubectl (Testing) |
53 | 39 | ```bash |
54 | | -uv sync --extra notebooks |
55 | | -cp notebooks/.env.example notebooks/.env |
56 | | -uv run jupyter lab notebooks/operator.ipynb |
| 40 | +kubectl create -f workflows/run-s1-test.yaml -n devseed-staging |
57 | 41 | ``` |
58 | | -</details> |
59 | 42 |
|
60 | | -<details> |
61 | | -<summary><b>Event-driven</b> (production)</summary> |
| 43 | +**Namespaces:** `devseed-staging` (testing) • `devseed` (production) |
62 | 44 |
|
| 45 | +### Event-driven (Production) |
63 | 46 | Publish to RabbitMQ `geozarr` exchange: |
64 | 47 | ```json |
65 | | -{"source_url": "https://stac.../items/S1A_...", "item_id": "S1A_IW_GRDH_...", "collection": "sentinel-1-l1-grd-dp-test"} |
| 48 | +{"source_url": "https://stac.../items/...", "item_id": "...", "collection": "..."} |
66 | 49 | ``` |
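The message body above can be assembled programmatically before it is handed to an AMQP client. The sketch below only builds and serializes the payload with the three keys shown in the README. The helper name `build_message` is hypothetical, and deriving `item_id` from the last path segment of the item URL is an assumption, so pass `item_id` explicitly if your URLs differ.

```python
import json
from urllib.parse import urlparse


def build_message(source_url, collection, item_id=None):
    """Serialize a conversion request with the three keys the pipeline expects."""
    if item_id is None:
        # Assumption: the STAC item ID is the last path segment of the item URL.
        item_id = urlparse(source_url).path.rstrip("/").rsplit("/", 1)[-1]
    if not item_id:
        raise ValueError("could not derive item_id from source_url")
    payload = {
        "source_url": source_url,
        "item_id": item_id,
        "collection": collection,
    }
    return json.dumps(payload).encode("utf-8")
```

The returned bytes would be the body published to the `geozarr` exchange. The routing key and connection settings are not specified in this README, so they are left out of the sketch.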
67 | | -</details> |
68 | | - |
69 | | -<details> |
70 | | -<summary><b>Python CLI</b></summary> |
71 | 50 |
|
| 51 | +### Jupyter Notebooks |
72 | 52 | ```bash |
73 | | -kubectl port-forward -n core svc/rabbitmq 5672:5672 |
74 | | -export AMQP_PASSWORD=$(kubectl get secret rabbitmq-password -n core -o jsonpath='{.data.rabbitmq-password}' | base64 -d) |
75 | | -uv run python examples/submit.py --stac-url "..." --collection sentinel-2-l2a |
| 53 | +uv sync --extra notebooks |
| 54 | +cp notebooks/.env.example notebooks/.env |
| 55 | +uv run jupyter lab notebooks/ |
76 | 56 | ``` |
77 | | -</details> |
78 | 57 |
|
79 | | -**Related:** [data-model](https://github.com/EOPF-Explorer/data-model) • [platform-deploy](https://github.com/EOPF-Explorer/platform-deploy) • [Testing report](docs/WORKFLOW_SUBMISSION_TESTING.md) |
| 58 | +See [examples/](examples/) for more patterns. |
80 | 59 |
|
81 | 60 | ## Configuration |
82 | 61 |
|
83 | | -<details> |
84 | | -<summary><b>S3 & RabbitMQ</b></summary> |
85 | | - |
86 | 62 | ```bash |
87 | | -# S3 credentials |
| 63 | +# S3 credentials (OVH S3) |
88 | 64 | kubectl create secret generic geozarr-s3-credentials -n devseed \ |
89 | | - --from-literal=AWS_ACCESS_KEY_ID="<key>" \ |
90 | | - --from-literal=AWS_SECRET_ACCESS_KEY="<secret>" |
| 65 | + --from-literal=AWS_ACCESS_KEY_ID="..." \ |
| 66 | + --from-literal=AWS_SECRET_ACCESS_KEY="..." \ |
| 67 | + --from-literal=AWS_ENDPOINT_URL="https://s3.de.io.cloud.ovh.net" |
| 68 | + |
| 69 | +# S3 output location |
| 70 | +# Bucket: esa-zarr-sentinel-explorer-fra |
| 71 | +# Prefix: tests-output (staging) or geozarr (production) |
91 | 72 |
|
92 | | -# RabbitMQ password |
| 73 | +# Get RabbitMQ password |
93 | 74 | kubectl get secret rabbitmq-password -n core -o jsonpath='{.data.rabbitmq-password}' | base64 -d |
94 | | -``` |
95 | 75 |
|
96 | | -**Endpoints:** S3: `s3.de.io.cloud.ovh.net/esa-zarr-sentinel-explorer-fra` • RabbitMQ: `geozarr` exchange • [UIs](https://workspace.devseed.hub-eopf-explorer.eox.at/): [Argo](https://argo-workflows.hub-eopf-explorer.eox.at) • [STAC](https://api.explorer.eopf.copernicus.eu/stac) • [Viewer](https://api.explorer.eopf.copernicus.eu/raster) |
97 | | -</details> |
| 76 | +# STAC API endpoints |
| 77 | +# STAC API: https://api.explorer.eopf.copernicus.eu/stac |
| 78 | +# Raster API: https://api.explorer.eopf.copernicus.eu/raster |
| 79 | +``` |
98 | 80 |
|
99 | 81 | ## Troubleshooting |
100 | 82 |
|
101 | | -<details> |
102 | | -<summary><b>Logs & Issues</b></summary> |
103 | | - |
104 | 83 | ```bash |
105 | | -kubectl get wf -n devseed-staging -w |
| 84 | +# Check workflow status |
| 85 | +kubectl get wf -n devseed-staging --sort-by=.metadata.creationTimestamp | tail -5 |
| 86 | + |
| 87 | +# View logs |
106 | 88 | kubectl logs -n devseed-staging <pod-name> -c main -f |
107 | | -kubectl logs -n devseed -l sensor-name=geozarr-sensor --tail=50 |
| 89 | + |
| 90 | +# Check resources |
| 91 | +kubectl top nodes |
108 | 92 | ``` |
109 | 93 |
|
110 | | -**Common fixes:** Workflow not starting → check sensor logs • S3 denied → verify `geozarr-s3-credentials` secret • RabbitMQ refused → `kubectl port-forward -n core svc/rabbitmq 5672:5672` • Pod pending → check resources |
111 | | -</details> |
| 94 | +**Common issues:** |
| 95 | +- **Workflow not starting:** Check sensor logs: `kubectl logs -n devseed -l sensor-name=geozarr-sensor` |
| 96 | +- **S3 errors:** Verify that the `geozarr-s3-credentials` secret exists |
| 97 | +- **Pod pending:** Check node capacity with `kubectl top nodes` |
| 98 | + |
| 99 | +**Performance:** S1 GRD (10GB): 15-20 min • S2 L2A (5GB): 8-12 min • Increase workflow resources for datasets >20GB |
| 100 | + |
| 101 | +See [GETTING_STARTED.md](GETTING_STARTED.md#troubleshooting) for more. |
112 | 102 |
|
113 | 103 | ## Development |
114 | 104 |
|
115 | 105 | ```bash |
116 | | -uv sync --all-extras && pre-commit install |
117 | | -make test # or: pytest tests/ -v -k e2e |
| 106 | +# Setup |
| 107 | +uv sync --all-extras |
| 108 | +pre-commit install |
| 109 | + |
| 110 | +# Test |
| 111 | +pytest tests/ -v # 100/100 passing |
| 112 | + |
| 113 | +# Deploy |
| 114 | +kubectl apply -f workflows/template.yaml -n devseed |
118 | 115 | ``` |
119 | 116 |
|
120 | | -**Deploy:** Edit `workflows/template.yaml` or `scripts/*.py` → `pytest tests/ -v` → `docker buildx build --platform linux/amd64 -t ghcr.io/eopf-explorer/data-pipeline:dev .` → `kubectl apply -f workflows/template.yaml -n devseed` • [CONTRIBUTING.md](CONTRIBUTING.md) |
| 117 | +**Project structure:** `workflows/` (manifests) • `scripts/` (Python utils) • `tests/` (pytest) • `notebooks/` (tutorials) |
| 118 | + |
| 119 | +**Documentation:** [CONTRIBUTING.md](CONTRIBUTING.md) • [GETTING_STARTED.md](GETTING_STARTED.md) |
121 | 120 |
|
122 | 121 | ## License |
123 | 122 |
|
|