|
1 | 1 | # EOPF GeoZarr Data Pipeline |
2 | 2 |
|
3 | | -Automated pipeline for converting Sentinel-2 Zarr datasets to cloud-optimized GeoZarr format with STAC catalog integration and interactive visualization. |
| 3 | +Automated pipeline for converting Sentinel Zarr datasets to cloud-optimized GeoZarr format with STAC catalog integration and interactive visualization. |
4 | 4 |
|
5 | | -## Quick Reference |
| 5 | +## Quick Start (30 seconds) |
6 | 6 |
|
7 | 7 | ```bash |
8 | | -# 1. Submit a workflow (simplest method) |
9 | | -uv run python examples/submit.py --stac-url "https://stac.core.eopf.eodc.eu/collections/sentinel-2-l2a/items/S2B_..." |
| 8 | +# 1. Submit workflow |
| 9 | +export KUBECONFIG=.work/kubeconfig |
| 10 | +kubectl create -f workflows/run-s1-test.yaml -n devseed-staging |
10 | 11 |
|
11 | | -# 2. Monitor progress |
12 | | -kubectl get wf -n devseed -w |
13 | | - |
14 | | -# 3. View result |
15 | | -# Check logs for viewer URL: https://api.explorer.eopf.copernicus.eu/raster/viewer?url=... |
| 12 | +# 2. Monitor |
| 13 | +kubectl logs -n devseed-staging -l workflows.argoproj.io/workflow=<name> -c main -f |
16 | 14 | ``` |
17 | 15 |
|
18 | | -💡 **Local testing:** Port-forward RabbitMQ first: `kubectl port-forward -n core svc/rabbitmq 5672:5672 &` |
19 | | - |
20 | | -## Features |
21 | | - |
22 | | -[](LICENSE) |
23 | | -[](https://www.python.org/downloads/) |
24 | | -[](https://github.com/EOPF-Explorer/data-pipeline/actions) |
25 | | - |
26 | | -- **Multi-sensor support**: Sentinel-1 GRD and Sentinel-2 L2A |
27 | | -- STAC item registration with retry logic |
28 | | -- GeoZarr format conversion with cloud-optimized overviews |
29 | | -- Cloud-native workflows with Argo |
30 | | -- Interactive visualization with TiTiler |
| 16 | +📖 **New here?** [GETTING_STARTED.md](GETTING_STARTED.md) • **Details:** [Full docs below](#submitting-workflows) |
31 | 17 |
|
32 | 18 | ## What It Does |
33 | 19 |
|
34 | | -Transforms Sentinel satellite data into web-ready visualizations: |
| 20 | +**Input:** STAC item URL → **Output:** Interactive web map in ~15-20 minutes |
35 | 21 |
|
36 | | -**Input:** STAC item URL → **Output:** Interactive web map (~5-10 min) |
37 | | - |
38 | | -**Pipeline:** Convert (5 min) → Register (30 sec) → Augment (10 sec) |
| 22 | +``` |
| 23 | +Convert (15 min) → Register (30 sec) → Augment (10 sec) |
| 24 | +``` |
39 | 25 |
|
40 | | -**Supported sensors:** |
41 | | -- **Sentinel-1** L1 GRD: SAR backscatter (VH/VV polarizations) |
42 | | -- **Sentinel-2** L2A: Multispectral reflectance (10m/20m/60m) |
| 26 | +**Supports:** Sentinel-1 GRD (SAR) • Sentinel-2 L2A (optical) |
43 | 27 |
|
44 | | -## Quick Start |
| 28 | +**Prerequisites:** Kubernetes with [platform-deploy](https://github.com/EOPF-Explorer/platform-deploy) • Python 3.11+ • [GETTING_STARTED.md](GETTING_STARTED.md) for full setup |
45 | 29 |
|
46 | | -📖 **New to the project?** See [GETTING_STARTED.md](GETTING_STARTED.md) for complete setup (15 min). |
| 30 | +## Submitting Workflows |
47 | 31 |
|
48 | | -### Requirements |
| 32 | +| Method | Best For | Setup | Status | |
| 33 | +|--------|----------|-------|--------| |
| 34 | +| 🎯 **kubectl** | Testing, CI/CD | None | ✅ Recommended | |
| 35 | +| 📓 **Jupyter** | Learning, exploration | 2 min | ✅ Working | |
| 36 | +| ⚡ **Event-driven** | Production (auto) | In-cluster | ✅ Running | |
| 37 | +| 🐍 **Python CLI** | Scripting | Port-forward | ⚠️ Advanced | |
49 | 38 |
|
50 | | -- **Kubernetes cluster** with [platform-deploy](https://github.com/EOPF-Explorer/platform-deploy) infrastructure |
51 | | - - Argo Workflows (pipeline orchestration) |
52 | | - - RabbitMQ (event-driven automation) |
53 | | - - STAC API & TiTiler (catalog & visualization) |
54 | | -- **Python 3.11+** with `uv` package manager |
55 | | -- **S3 storage** credentials (outputs) |
56 | | -- **Kubeconfig** in `.work/kubeconfig` |
| 39 | +<details> |
| 40 | +<summary><b>kubectl</b> (recommended)</summary> |
57 | 41 |
|
58 | | -Verify: |
59 | 42 | ```bash |
60 | | -export KUBECONFIG=$(pwd)/.work/kubeconfig |
61 | | -kubectl get pods -n core -l app.kubernetes.io/name=argo-workflows |
62 | | -kubectl get pods -n core -l app.kubernetes.io/name=rabbitmq |
| 43 | +export KUBECONFIG=.work/kubeconfig |
| 44 | +kubectl create -f workflows/run-s1-test.yaml -n devseed-staging -o name |
| 45 | +kubectl logs -n devseed-staging -l workflows.argoproj.io/workflow=<wf-name> -c main -f |
63 | 46 | ``` |
| 47 | +Edit `workflows/run-s1-test.yaml` with your STAC URL and collection. |
| 48 | +</details> |
64 | 49 |
|
65 | | -### Run Your First Job |
| 50 | +<details> |
| 51 | +<summary><b>Jupyter</b></summary> |
66 | 52 |
|
67 | 53 | ```bash |
68 | | -# 1. Install dependencies |
69 | | -uv sync --all-extras |
70 | | - |
71 | | -# 2. Deploy workflows |
72 | | -kubectl apply -f workflows/ -n devseed |
73 | | - |
74 | | -# 3. Port-forward RabbitMQ |
75 | | -kubectl port-forward -n core svc/rabbitmq 5672:5672 & |
76 | | - |
77 | | -# 4. Submit a STAC item |
78 | | -export AMQP_PASSWORD=$(kubectl get secret rabbitmq-password -n core -o jsonpath='{.data.rabbitmq-password}' | base64 -d) |
79 | | -export AMQP_URL="amqp://user:${AMQP_PASSWORD}@localhost:5672/" |
80 | | - |
81 | | -uv run python examples/submit.py \ |
82 | | - --stac-url "https://stac.core.eopf.eodc.eu/collections/sentinel-2-l2a/items/S2B_MSIL2A_20250518_T29RLL" |
83 | | - |
84 | | -# 5. Monitor |
85 | | -kubectl get wf -n devseed -w |
| 54 | +uv sync --extra notebooks |
| 55 | +cp notebooks/.env.example notebooks/.env |
| 56 | +uv run jupyter lab notebooks/operator.ipynb |
86 | 57 | ``` |
| 58 | +</details> |
87 | 59 |
|
88 | | -**Result:** Interactive map at `https://api.explorer.eopf.copernicus.eu/raster/viewer?url=...` |
89 | | - |
90 | | -## How It Works |
91 | | - |
92 | | -### Pipeline Stages |
| 60 | +<details> |
| 61 | +<summary><b>Event-driven</b> (production)</summary> |
93 | 62 |
|
94 | | -| Stage | Time | Function | |
95 | | -|-------|------|----------| |
96 | | -| **Convert** | 5 min | Zarr → GeoZarr with spatial indexing & cloud optimization | |
97 | | -| **Register** | 30 sec | Create/update STAC item with metadata & assets | |
98 | | -| **Augment** | 10 sec | Add visualization links (XYZ tiles, TileJSON, viewer) | |
99 | | - |
100 | | -### Event-Driven Architecture |
101 | | - |
102 | | -``` |
103 | | -STAC URL → submit.py → RabbitMQ → AMQP Sensor → Argo Workflow |
104 | | - ↓ |
105 | | - Convert → Register → Augment |
106 | | - ↓ |
107 | | - STAC API + Interactive Map |
| 63 | +Publish to RabbitMQ `geozarr` exchange: |
| 64 | +```json |
| 65 | +{"source_url": "https://stac.../items/S1A_...", "item_id": "S1A_IW_GRDH_...", "collection": "sentinel-1-l1-grd-dp-test"} |
108 | 66 | ``` |
| 67 | +</details> |
109 | 68 |
|
110 | | -**Automation:** New Sentinel-2 data publishes to RabbitMQ → Pipeline runs automatically |
111 | | - |
112 | | -## Submitting Workflows |
113 | | - |
114 | | -**Choose your approach:** |
115 | | - |
116 | | -| Method | Best For | Documentation | |
117 | | -|--------|----------|---------------| |
118 | | -| 🎯 **CLI tool** | Quick testing, automation | [examples/README.md](examples/README.md) | |
119 | | -| 📓 **Jupyter notebook** | Learning, exploration | [notebooks/README.md](notebooks/README.md) | |
120 | | -| ⚡ **Event-driven** | Production (auto) | Already running! | |
121 | | -| 🔧 **Custom pika** | Custom integrations | [See Configuration](#configuration) | |
| 69 | +<details> |
| 70 | +<summary><b>Python CLI</b></summary> |
122 | 71 |
|
123 | | -**Quick example:** |
124 | 72 | ```bash |
125 | | -uv run python examples/submit.py --stac-url "https://stac.core.eopf.eodc.eu/collections/sentinel-2-l2a/items/S2B_..." |
126 | | -``` |
127 | | - |
128 | | -**Monitor:** |
129 | | -```bash |
130 | | -kubectl get wf -n devseed -w # Watch workflows |
131 | | -kubectl logs -n devseed -l sensor-name=geozarr-sensor --tail=50 # Sensor logs |
| 73 | +kubectl port-forward -n core svc/rabbitmq 5672:5672 & |
| 74 | +export AMQP_PASSWORD=$(kubectl get secret rabbitmq-password -n core -o jsonpath='{.data.rabbitmq-password}' | base64 -d) |
| 75 | +uv run python examples/submit.py --stac-url "..." --collection sentinel-2-l2a |
132 | 76 | ``` |
| 77 | +</details> |
133 | 78 |
|
134 | | -### Related Projects |
135 | | - |
136 | | -- **[data-model](https://github.com/EOPF-Explorer/data-model)** - `eopf-geozarr` conversion library (Python) |
137 | | -- **[platform-deploy](https://github.com/EOPF-Explorer/platform-deploy)** - K8s infrastructure (Flux, Argo, RabbitMQ, STAC, TiTiler) |
| 79 | +**Related:** [data-model](https://github.com/EOPF-Explorer/data-model) • [platform-deploy](https://github.com/EOPF-Explorer/platform-deploy) • [Testing report](docs/WORKFLOW_SUBMISSION_TESTING.md) |
138 | 80 |
|
139 | 81 | ## Configuration |
140 | 82 |
|
141 | | -### S3 Storage |
| 83 | +<details> |
| 84 | +<summary><b>S3 & RabbitMQ</b></summary> |
142 | 85 |
|
143 | 86 | ```bash |
| 87 | +# S3 credentials |
144 | 88 | kubectl create secret generic geozarr-s3-credentials -n devseed \ |
145 | | - --from-literal=AWS_ACCESS_KEY_ID="<your-key>" \ |
146 | | - --from-literal=AWS_SECRET_ACCESS_KEY="<your-secret>" |
147 | | -``` |
148 | | - |
149 | | -| Setting | Value | |
150 | | -|---------|-------| |
151 | | -| **Endpoint** | `https://s3.de.io.cloud.ovh.net` | |
152 | | -| **Bucket** | `esa-zarr-sentinel-explorer-fra` | |
153 | | -| **Region** | `de` | |
| 89 | + --from-literal=AWS_ACCESS_KEY_ID="<key>" \ |
| 90 | + --from-literal=AWS_SECRET_ACCESS_KEY="<secret>" |
154 | 91 |
|
155 | | -### RabbitMQ |
156 | | - |
157 | | -Get password: |
158 | | -```bash |
| 92 | +# RabbitMQ password |
159 | 93 | kubectl get secret rabbitmq-password -n core -o jsonpath='{.data.rabbitmq-password}' | base64 -d |
160 | 94 | ``` |
161 | 95 |
|
162 | | -| Setting | Value | |
163 | | -|---------|-------| |
164 | | -| **URL** | `amqp://user:PASSWORD@rabbitmq.core.svc.cluster.local:5672/` | |
165 | | -| **Exchange** | `geozarr` | |
166 | | -| **Routing key** | `eopf.items.*` | |
167 | | - |
168 | | -**Message format:** |
169 | | -```json |
170 | | -{ |
171 | | - "source_url": "https://stac.core.eopf.eodc.eu/collections/sentinel-2-l2a/items/...", |
172 | | - "item_id": "S2B_MSIL2A_...", |
173 | | - "collection": "sentinel-2-l2a" |
174 | | -} |
175 | | -``` |
176 | | - |
177 | | -## Web Interfaces |
178 | | - |
179 | | -Access via [**EOxHub workspace**](https://workspace.devseed.hub-eopf-explorer.eox.at/) (single sign-on for all services): |
180 | | - |
181 | | -| Service | Purpose | URL | |
182 | | -|---------|---------|-----| |
183 | | -| **Argo Workflows** | Monitor pipelines | [argo-workflows.hub-eopf-explorer.eox.at](https://argo-workflows.hub-eopf-explorer.eox.at) | |
184 | | -| **STAC Browser** | Browse catalog | [api.explorer.eopf.copernicus.eu/stac](https://api.explorer.eopf.copernicus.eu/stac) | |
185 | | -| **TiTiler Viewer** | View maps | [api.explorer.eopf.copernicus.eu/raster](https://api.explorer.eopf.copernicus.eu/raster) | |
186 | | -| **JupyterLab** | Operator tools | Via EOxHub workspace | |
| 96 | +**Endpoints:** S3: `s3.de.io.cloud.ovh.net/esa-zarr-sentinel-explorer-fra` • RabbitMQ: `geozarr` exchange • [UIs](https://workspace.devseed.hub-eopf-explorer.eox.at/): [Argo](https://argo-workflows.hub-eopf-explorer.eox.at) • [STAC](https://api.explorer.eopf.copernicus.eu/stac) • [Viewer](https://api.explorer.eopf.copernicus.eu/raster) |
| 97 | +</details> |
187 | 98 |
|
188 | | -💡 **Tip:** Login to EOxHub first for seamless authentication across all services. |
189 | | - |
190 | | -## Monitoring & Troubleshooting |
191 | | - |
192 | | -### Workflow Status |
193 | | - |
194 | | -```bash |
195 | | -# List all workflows |
196 | | -kubectl get wf -n devseed |
197 | | - |
198 | | -# Watch real-time updates |
199 | | -kubectl get wf -n devseed -w |
200 | | - |
201 | | -# Detailed status |
202 | | -kubectl describe wf <workflow-name> -n devseed |
203 | | -``` |
| 99 | +## Troubleshooting |
204 | 100 |
|
205 | | -### Logs |
| 101 | +<details> |
| 102 | +<summary><b>Logs & Issues</b></summary> |
206 | 103 |
|
207 | 104 | ```bash |
208 | | -# Workflow pod logs |
209 | | -kubectl logs <pod-name> -n devseed |
210 | | - |
211 | | -# Sensor (message processing) |
| 105 | +kubectl get wf -n devseed-staging -w |
| 106 | +kubectl logs -n devseed-staging <pod-name> -c main -f |
212 | 107 | kubectl logs -n devseed -l sensor-name=geozarr-sensor --tail=50 |
213 | | - |
214 | | -# EventSource (RabbitMQ connection) |
215 | | -kubectl logs -n devseed -l eventsource-name=rabbitmq-geozarr --tail=50 |
216 | 108 | ``` |
217 | 109 |
|
218 | | -### Common Issues |
219 | | - |
220 | | -| Problem | Solution | |
221 | | -|---------|----------| |
222 | | -| **Workflow not starting** | Check sensor/eventsource logs for connection errors | |
223 | | -| **S3 access denied** | Verify secret `geozarr-s3-credentials` exists in `devseed` namespace | |
224 | | -| **RabbitMQ connection refused** | Port-forward required: `kubectl port-forward -n core svc/rabbitmq 5672:5672` | |
225 | | -| **Pod stuck in Pending** | Check node resources and pod limits | |
| 110 | +**Common fixes:** Workflow not starting → check sensor logs • S3 denied → verify `geozarr-s3-credentials` secret • RabbitMQ refused → `kubectl port-forward -n core svc/rabbitmq 5672:5672` • Pod pending → check resources |
| 111 | +</details> |
226 | 112 |
|
227 | 113 | ## Development |
228 | 114 |
|
229 | | -### Setup |
230 | | - |
231 | | -```bash |
232 | | -uv sync --all-extras |
233 | | -pre-commit install # Optional: enable git hooks |
234 | | -``` |
235 | | - |
236 | | -### Testing |
237 | | - |
238 | 115 | ```bash |
239 | | -make test # Run full test suite |
240 | | -make check # Lint + typecheck + test |
241 | | -pytest tests/ # Run specific tests |
242 | | -pytest -v -k e2e # End-to-end tests only |
| 116 | +uv sync --all-extras && pre-commit install |
| 117 | +make test # or: pytest tests/ -v -k e2e |
243 | 118 | ``` |
244 | 119 |
|
245 | | -### Project Structure |
246 | | - |
247 | | -``` |
248 | | -├── docker/ # Container images |
249 | | -│ ├── Dockerfile # Pipeline runtime |
250 | | -│ └── Dockerfile.test # Test environment |
251 | | -├── scripts/ # Python pipeline scripts |
252 | | -│ ├── register_stac.py # STAC catalog registration |
253 | | -│ ├── augment_stac_item.py # Add visualization links |
254 | | -│ └── get_zarr_url.py # Extract Zarr URL from STAC |
255 | | -├── workflows/ # Argo workflow definitions |
256 | | -│ ├── template.yaml # Main pipeline WorkflowTemplate |
257 | | -│ ├── eventsource.yaml # RabbitMQ AMQP event source |
258 | | -│ ├── sensor.yaml # Workflow trigger on messages |
259 | | -│ └── rbac.yaml # Service account permissions |
260 | | -├── examples/ # Usage examples |
261 | | -│ └── submit.py # Submit job via RabbitMQ |
262 | | -├── tests/ # Unit & integration tests |
263 | | -└── notebooks/ # Operator utilities |
264 | | -``` |
265 | | - |
266 | | -### Making Changes |
267 | | - |
268 | | -1. **Edit workflow:** `workflows/template.yaml` |
269 | | -2. **Update scripts:** `scripts/*.py` |
270 | | -3. **Test locally:** `pytest tests/ -v` |
271 | | -4. **Build image:** `docker buildx build --platform linux/amd64 -t ghcr.io/eopf-explorer/data-pipeline:dev -f docker/Dockerfile . --push` |
272 | | -5. **Deploy:** `kubectl apply -f workflows/template.yaml -n devseed` |
273 | | -6. **Monitor:** `kubectl get wf -n devseed -w` |
274 | | - |
275 | | -⚠️ **Important:** Always use `--platform linux/amd64` when building images for Kubernetes clusters. |
276 | | - |
277 | | -See [CONTRIBUTING.md](CONTRIBUTING.md) for coding standards and development workflow. |
| 120 | +**Deploy:** Edit `workflows/template.yaml` or `scripts/*.py` → `pytest tests/ -v` → `docker buildx build --platform linux/amd64 -t ghcr.io/eopf-explorer/data-pipeline:dev .` → `kubectl apply -f workflows/template.yaml -n devseed` • [CONTRIBUTING.md](CONTRIBUTING.md) |
278 | 121 |
|
279 | 122 | ## License |
280 | 123 |
|
|