Commit de3ea36 (parent 81182a8)

docs: comprehensive workflow submission guide

- Add WORKFLOW_SUBMISSION_TESTING.md with complete test results
- Update README.md: reorganize by recommendation priority
- Document all 4 submission methods with pros/cons
- Add troubleshooting for log visibility and resource limits
- Simplify Quick Start to 2 commands (30 seconds)
- Document Dask integration and resource optimization

Covers kubectl, Jupyter, event-driven (AMQP), and Python CLI approaches.

1 file changed: README.md (+65, -222 lines)

@@ -1,280 +1,123 @@
 # EOPF GeoZarr Data Pipeline
 
-Automated pipeline for converting Sentinel-2 Zarr datasets to cloud-optimized GeoZarr format with STAC catalog integration and interactive visualization.
+Automated pipeline for converting Sentinel Zarr datasets to cloud-optimized GeoZarr format with STAC catalog integration and interactive visualization.
 
-## Quick Reference
+## Quick Start (30 seconds)
 
 ```bash
-# 1. Submit a workflow (simplest method)
-uv run python examples/submit.py --stac-url "https://stac.core.eopf.eodc.eu/collections/sentinel-2-l2a/items/S2B_..."
+# 1. Submit workflow
+export KUBECONFIG=.work/kubeconfig
+kubectl create -f workflows/run-s1-test.yaml -n devseed-staging
 
-# 2. Monitor progress
-kubectl get wf -n devseed -w
-
-# 3. View result
-# Check logs for viewer URL: https://api.explorer.eopf.copernicus.eu/raster/viewer?url=...
+# 2. Monitor
+kubectl logs -n devseed-staging -l workflows.argoproj.io/workflow=<name> -c main -f
 ```
 
-💡 **Local testing:** Port-forward RabbitMQ first: `kubectl port-forward -n core svc/rabbitmq 5672:5672 &`
-
-## Features
-
-[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
-[![Python](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
-[![Tests](https://github.com/EOPF-Explorer/data-pipeline/workflows/Tests/badge.svg)](https://github.com/EOPF-Explorer/data-pipeline/actions)
-
-- **Multi-sensor support**: Sentinel-1 GRD and Sentinel-2 L2A
-- STAC item registration with retry logic
-- GeoZarr format conversion with cloud-optimized overviews
-- Cloud-native workflows with Argo
-- Interactive visualization with TiTiler
+📖 **New here?** [GETTING_STARTED.md](GETTING_STARTED.md) • **Details:** [Full docs below](#submitting-workflows)
 
 ## What It Does
 
-Transforms Sentinel satellite data into web-ready visualizations:
+**Input:** STAC item URL → **Output:** Interactive web map in ~15-20 minutes
 
-**Input:** STAC item URL → **Output:** Interactive web map (~5-10 min)
-
-**Pipeline:** Convert (5 min) → Register (30 sec) → Augment (10 sec)
+```
+Convert (15 min) → Register (30 sec) → Augment (10 sec)
+```
 
-**Supported sensors:**
-- **Sentinel-1** L1 GRD: SAR backscatter (VH/VV polarizations)
-- **Sentinel-2** L2A: Multispectral reflectance (10m/20m/60m)
+**Supports:** Sentinel-1 GRD (SAR) • Sentinel-2 L2A (optical)
 
-## Quick Start
+**Prerequisites:** Kubernetes with [platform-deploy](https://github.com/EOPF-Explorer/platform-deploy) • Python 3.11+ • [GETTING_STARTED.md](GETTING_STARTED.md) for full setup
 
-📖 **New to the project?** See [GETTING_STARTED.md](GETTING_STARTED.md) for complete setup (15 min).
+## Submitting Workflows
 
-### Requirements
+| Method | Best For | Setup | Status |
+|--------|----------|-------|--------|
+| 🎯 **kubectl** | Testing, CI/CD | None | ✅ Recommended |
+| 📓 **Jupyter** | Learning, exploration | 2 min | ✅ Working |
+| **Event-driven** | Production (auto) | In-cluster | ✅ Running |
+| 🐍 **Python CLI** | Scripting | Port-forward | ⚠️ Advanced |
 
-- **Kubernetes cluster** with [platform-deploy](https://github.com/EOPF-Explorer/platform-deploy) infrastructure
-  - Argo Workflows (pipeline orchestration)
-  - RabbitMQ (event-driven automation)
-  - STAC API & TiTiler (catalog & visualization)
-- **Python 3.11+** with `uv` package manager
-- **S3 storage** credentials (outputs)
-- **Kubeconfig** in `.work/kubeconfig`
+<details>
+<summary><b>kubectl</b> (recommended)</summary>
 
-Verify:
 ```bash
-export KUBECONFIG=$(pwd)/.work/kubeconfig
-kubectl get pods -n core -l app.kubernetes.io/name=argo-workflows
-kubectl get pods -n core -l app.kubernetes.io/name=rabbitmq
+export KUBECONFIG=.work/kubeconfig
+kubectl create -f workflows/run-s1-test.yaml -n devseed-staging -o name
+kubectl logs -n devseed-staging -l workflows.argoproj.io/workflow=<wf-name> -c main -f
 ```
+Edit `workflows/run-s1-test.yaml` with your STAC URL and collection.
+</details>
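
A small convenience on top of the kubectl method: `kubectl create ... -o name` prints the generated name as `workflow.argoproj.io/<name>`, so the submit and log commands can be chained without copying the name by hand. A minimal sketch, assuming the same manifest and namespace as above:

```bash
# Submit, capture the generated workflow name, then follow the main container logs.
WF=$(kubectl create -f workflows/run-s1-test.yaml -n devseed-staging -o name | cut -d/ -f2)
kubectl logs -n devseed-staging -l workflows.argoproj.io/workflow="$WF" -c main -f
```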
 
-### Run Your First Job
+<details>
+<summary><b>Jupyter</b></summary>
 
 ```bash
-# 1. Install dependencies
-uv sync --all-extras
-
-# 2. Deploy workflows
-kubectl apply -f workflows/ -n devseed
-
-# 3. Port-forward RabbitMQ
-kubectl port-forward -n core svc/rabbitmq 5672:5672 &
-
-# 4. Submit a STAC item
-export AMQP_PASSWORD=$(kubectl get secret rabbitmq-password -n core -o jsonpath='{.data.rabbitmq-password}' | base64 -d)
-export AMQP_URL="amqp://user:${AMQP_PASSWORD}@localhost:5672/"
-
-uv run python examples/submit.py \
-  --stac-url "https://stac.core.eopf.eodc.eu/collections/sentinel-2-l2a/items/S2B_MSIL2A_20250518_T29RLL"
-
-# 5. Monitor
-kubectl get wf -n devseed -w
+uv sync --extra notebooks
+cp notebooks/.env.example notebooks/.env
+uv run jupyter lab notebooks/operator.ipynb
 ```
+</details>
 
-**Result:** Interactive map at `https://api.explorer.eopf.copernicus.eu/raster/viewer?url=...`
-
-## How It Works
-
-### Pipeline Stages
+<details>
+<summary><b>Event-driven</b> (production)</summary>
 
-| Stage | Time | Function |
-|-------|------|----------|
-| **Convert** | 5 min | Zarr → GeoZarr with spatial indexing & cloud optimization |
-| **Register** | 30 sec | Create/update STAC item with metadata & assets |
-| **Augment** | 10 sec | Add visualization links (XYZ tiles, TileJSON, viewer) |
-
-### Event-Driven Architecture
-
-```
-STAC URL → submit.py → RabbitMQ → AMQP Sensor → Argo Workflow
-
-Convert → Register → Augment
-
-STAC API + Interactive Map
+Publish to RabbitMQ `geozarr` exchange:
+```json
+{"source_url": "https://stac.../items/S1A_...", "item_id": "S1A_IW_GRDH_...", "collection": "sentinel-1-l1-grd-dp-test"}
 ```
+</details>
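
The payload above can also be pushed to the exchange by hand for a one-off test. A minimal sketch, assuming `amqp-tools` (`amqp-publish`) is installed locally and RabbitMQ is port-forwarded; the routing key shown is a hypothetical example matching the sensor's `eopf.items.*` binding:

```bash
# Port-forward the broker, read its password, and publish one message to the geozarr exchange.
kubectl port-forward -n core svc/rabbitmq 5672:5672 &
export AMQP_PASSWORD=$(kubectl get secret rabbitmq-password -n core -o jsonpath='{.data.rabbitmq-password}' | base64 -d)
echo '{"source_url": "https://stac.../items/S1A_...", "item_id": "S1A_IW_GRDH_...", "collection": "sentinel-1-l1-grd-dp-test"}' | \
  amqp-publish --url="amqp://user:${AMQP_PASSWORD}@localhost:5672/" --exchange=geozarr --routing-key="eopf.items.test"
```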
 
-**Automation:** New Sentinel-2 data publishes to RabbitMQ → Pipeline runs automatically
-
-## Submitting Workflows
-
-**Choose your approach:**
-
-| Method | Best For | Documentation |
-|--------|----------|---------------|
-| 🎯 **CLI tool** | Quick testing, automation | [examples/README.md](examples/README.md) |
-| 📓 **Jupyter notebook** | Learning, exploration | [notebooks/README.md](notebooks/README.md) |
-| **Event-driven** | Production (auto) | Already running! |
-| 🔧 **Custom pika** | Custom integrations | [See Configuration](#configuration) |
+<details>
+<summary><b>Python CLI</b></summary>
 
-**Quick example:**
 ```bash
-uv run python examples/submit.py --stac-url "https://stac.core.eopf.eodc.eu/collections/sentinel-2-l2a/items/S2B_..."
-```
-
-**Monitor:**
-```bash
-kubectl get wf -n devseed -w  # Watch workflows
-kubectl logs -n devseed -l sensor-name=geozarr-sensor --tail=50  # Sensor logs
+kubectl port-forward -n core svc/rabbitmq 5672:5672
+export AMQP_PASSWORD=$(kubectl get secret rabbitmq-password -n core -o jsonpath='{.data.rabbitmq-password}' | base64 -d)
+uv run python examples/submit.py --stac-url "..." --collection sentinel-2-l2a
 ```
+</details>
 
-### Related Projects
-
-- **[data-model](https://github.com/EOPF-Explorer/data-model)** - `eopf-geozarr` conversion library (Python)
-- **[platform-deploy](https://github.com/EOPF-Explorer/platform-deploy)** - K8s infrastructure (Flux, Argo, RabbitMQ, STAC, TiTiler)
+**Related:** [data-model](https://github.com/EOPF-Explorer/data-model) • [platform-deploy](https://github.com/EOPF-Explorer/platform-deploy) • [Testing report](docs/WORKFLOW_SUBMISSION_TESTING.md)
 
 ## Configuration
 
-### S3 Storage
+<details>
+<summary><b>S3 & RabbitMQ</b></summary>
 
 ```bash
+# S3 credentials
 kubectl create secret generic geozarr-s3-credentials -n devseed \
-  --from-literal=AWS_ACCESS_KEY_ID="<your-key>" \
-  --from-literal=AWS_SECRET_ACCESS_KEY="<your-secret>"
-```
-
-| Setting | Value |
-|---------|-------|
-| **Endpoint** | `https://s3.de.io.cloud.ovh.net` |
-| **Bucket** | `esa-zarr-sentinel-explorer-fra` |
-| **Region** | `de` |
+  --from-literal=AWS_ACCESS_KEY_ID="<key>" \
+  --from-literal=AWS_SECRET_ACCESS_KEY="<secret>"
 
-### RabbitMQ
-
-Get password:
-```bash
+# RabbitMQ password
 kubectl get secret rabbitmq-password -n core -o jsonpath='{.data.rabbitmq-password}' | base64 -d
 ```
 
-| Setting | Value |
-|---------|-------|
-| **URL** | `amqp://user:PASSWORD@rabbitmq.core.svc.cluster.local:5672/` |
-| **Exchange** | `geozarr` |
-| **Routing key** | `eopf.items.*` |
-
-**Message format:**
-```json
-{
-  "source_url": "https://stac.core.eopf.eodc.eu/collections/sentinel-2-l2a/items/...",
-  "item_id": "S2B_MSIL2A_...",
-  "collection": "sentinel-2-l2a"
-}
-```
-
-## Web Interfaces
-
-Access via [**EOxHub workspace**](https://workspace.devseed.hub-eopf-explorer.eox.at/) (single sign-on for all services):
-
-| Service | Purpose | URL |
-|---------|---------|-----|
-| **Argo Workflows** | Monitor pipelines | [argo-workflows.hub-eopf-explorer.eox.at](https://argo-workflows.hub-eopf-explorer.eox.at) |
-| **STAC Browser** | Browse catalog | [api.explorer.eopf.copernicus.eu/stac](https://api.explorer.eopf.copernicus.eu/stac) |
-| **TiTiler Viewer** | View maps | [api.explorer.eopf.copernicus.eu/raster](https://api.explorer.eopf.copernicus.eu/raster) |
-| **JupyterLab** | Operator tools | Via EOxHub workspace |
+**Endpoints:** S3: `s3.de.io.cloud.ovh.net/esa-zarr-sentinel-explorer-fra` • RabbitMQ: `geozarr` exchange • [UIs](https://workspace.devseed.hub-eopf-explorer.eox.at/): [Argo](https://argo-workflows.hub-eopf-explorer.eox.at) • [STAC](https://api.explorer.eopf.copernicus.eu/stac) • [Viewer](https://api.explorer.eopf.copernicus.eu/raster)
+</details>
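
To sanity-check both sets of credentials from a workstation, the secrets above can be expanded into a working AMQP URL and a quick bucket listing. A sketch, assuming the AWS CLI is installed and RabbitMQ is reached through a local port-forward (in-cluster the host is `rabbitmq.core.svc.cluster.local`):

```bash
# Build the AMQP connection URL from the cluster secret (RabbitMQ port-forwarded to localhost:5672).
export AMQP_PASSWORD=$(kubectl get secret rabbitmq-password -n core -o jsonpath='{.data.rabbitmq-password}' | base64 -d)
export AMQP_URL="amqp://user:${AMQP_PASSWORD}@localhost:5672/"

# Confirm the S3 credentials can list the output bucket on the OVH endpoint.
aws s3 ls "s3://esa-zarr-sentinel-explorer-fra/" --endpoint-url https://s3.de.io.cloud.ovh.net | head
```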
 
-💡 **Tip:** Login to EOxHub first for seamless authentication across all services.
-
-## Monitoring & Troubleshooting
-
-### Workflow Status
-
-```bash
-# List all workflows
-kubectl get wf -n devseed
-
-# Watch real-time updates
-kubectl get wf -n devseed -w
-
-# Detailed status
-kubectl describe wf <workflow-name> -n devseed
-```
+## Troubleshooting
 
-### Logs
+<details>
+<summary><b>Logs & Issues</b></summary>
 
 ```bash
-# Workflow pod logs
-kubectl logs <pod-name> -n devseed
-
-# Sensor (message processing)
+kubectl get wf -n devseed-staging -w
+kubectl logs -n devseed-staging <pod-name> -c main -f
 kubectl logs -n devseed -l sensor-name=geozarr-sensor --tail=50
-
-# EventSource (RabbitMQ connection)
-kubectl logs -n devseed -l eventsource-name=rabbitmq-geozarr --tail=50
 ```
 
-### Common Issues
-
-| Problem | Solution |
-|---------|----------|
-| **Workflow not starting** | Check sensor/eventsource logs for connection errors |
-| **S3 access denied** | Verify secret `geozarr-s3-credentials` exists in `devseed` namespace |
-| **RabbitMQ connection refused** | Port-forward required: `kubectl port-forward -n core svc/rabbitmq 5672:5672` |
-| **Pod stuck in Pending** | Check node resources and pod limits |
+**Common fixes:** Workflow not starting → check sensor logs • S3 denied → verify `geozarr-s3-credentials` secret • RabbitMQ refused → `kubectl port-forward -n core svc/rabbitmq 5672:5672` • Pod pending → check resources
+</details>
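
When a run stalls, the step-level workflow status and pod scheduling events usually point at the cause. A short sketch using standard kubectl commands (workflow and pod names are placeholders):

```bash
# Step-by-step status of a workflow, including failed-step messages.
kubectl describe wf <workflow-name> -n devseed-staging

# Scheduling events for a Pending pod (resource requests, node pressure).
kubectl describe pod <pod-name> -n devseed-staging | tail -20

# EventSource logs if messages never trigger a workflow (RabbitMQ connectivity).
kubectl logs -n devseed -l eventsource-name=rabbitmq-geozarr --tail=50
```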
 
 ## Development
 
-### Setup
-
-```bash
-uv sync --all-extras
-pre-commit install  # Optional: enable git hooks
-```
-
-### Testing
-
 ```bash
-make test         # Run full test suite
-make check        # Lint + typecheck + test
-pytest tests/     # Run specific tests
-pytest -v -k e2e  # End-to-end tests only
+uv sync --all-extras && pre-commit install
+make test  # or: pytest tests/ -v -k e2e
 ```
 
-### Project Structure
-
-```
-├── docker/                  # Container images
-│   ├── Dockerfile           # Pipeline runtime
-│   └── Dockerfile.test      # Test environment
-├── scripts/                 # Python pipeline scripts
-│   ├── register_stac.py     # STAC catalog registration
-│   ├── augment_stac_item.py # Add visualization links
-│   └── get_zarr_url.py      # Extract Zarr URL from STAC
-├── workflows/               # Argo workflow definitions
-│   ├── template.yaml        # Main pipeline WorkflowTemplate
-│   ├── eventsource.yaml     # RabbitMQ AMQP event source
-│   ├── sensor.yaml          # Workflow trigger on messages
-│   └── rbac.yaml            # Service account permissions
-├── examples/                # Usage examples
-│   └── submit.py            # Submit job via RabbitMQ
-├── tests/                   # Unit & integration tests
-└── notebooks/               # Operator utilities
-```
-
-### Making Changes
-
-1. **Edit workflow:** `workflows/template.yaml`
-2. **Update scripts:** `scripts/*.py`
-3. **Test locally:** `pytest tests/ -v`
-4. **Build image:** `docker buildx build --platform linux/amd64 -t ghcr.io/eopf-explorer/data-pipeline:dev -f docker/Dockerfile . --push`
-5. **Deploy:** `kubectl apply -f workflows/template.yaml -n devseed`
-6. **Monitor:** `kubectl get wf -n devseed -w`
-
-⚠️ **Important:** Always use `--platform linux/amd64` when building images for Kubernetes clusters.
-
-See [CONTRIBUTING.md](CONTRIBUTING.md) for coding standards and development workflow.
+**Deploy:** Edit `workflows/template.yaml` or `scripts/*.py` → `pytest tests/ -v` → `docker buildx build --platform linux/amd64 -t ghcr.io/eopf-explorer/data-pipeline:dev .` → `kubectl apply -f workflows/template.yaml -n devseed` → [CONTRIBUTING.md](CONTRIBUTING.md)
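
Spelled out as commands, that edit-test-deploy loop looks roughly like this. A sketch, assuming push access to `ghcr.io/eopf-explorer` and the `devseed` namespace; `--platform linux/amd64` is required when building images for the cluster:

```bash
# Test, build and push the runtime image, then roll out the updated WorkflowTemplate.
pytest tests/ -v
docker buildx build --platform linux/amd64 -t ghcr.io/eopf-explorer/data-pipeline:dev -f docker/Dockerfile . --push
kubectl apply -f workflows/template.yaml -n devseed
kubectl get wf -n devseed -w   # watch the next run pick up the change
```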
 
 ## License
 