
Commit 6be0633

docs: update README.md with code
1 parent a50a8cf commit 6be0633


1 file changed (+68 / -69 lines)


README.md

Lines changed: 68 additions & 69 deletions
````diff
@@ -1,123 +1,122 @@
 # EOPF GeoZarr Data Pipeline
 
-Automated pipeline for converting Sentinel Zarr datasets to cloud-optimized GeoZarr format with STAC catalog integration and interactive visualization.
+Automated Kubernetes pipeline for converting Sentinel Zarr datasets to cloud-optimized GeoZarr format with STAC catalog integration.
 
-## Quick Start (30 seconds)
+## Quick Start
 
 ```bash
-# 1. Submit workflow
 export KUBECONFIG=.work/kubeconfig
 kubectl create -f workflows/run-s1-test.yaml -n devseed-staging
-
-# 2. Monitor
-kubectl logs -n devseed-staging -l workflows.argoproj.io/workflow=<name> -c main -f
+kubectl get wf -n devseed-staging -w
 ```
 
-📖 **New here?** [GETTING_STARTED.md](GETTING_STARTED.md) • **Details:** [Full docs below](#submitting-workflows)
+📖 **First time?** See [GETTING_STARTED.md](GETTING_STARTED.md) for full setup
+🎯 **Monitor:** [Argo UI](https://argo-workflows.hub-eopf-explorer.eox.at)
 
 ## What It Does
 
-**Input:** STAC item URL → **Output:** Interactive web map in ~15-20 minutes
-
-```
-Convert (15 min) → Register (30 sec) → Augment (10 sec)
-```
-
-**Supports:** Sentinel-1 GRD (SAR) • Sentinel-2 L2A (optical)
-
-**Prerequisites:** Kubernetes with [platform-deploy](https://github.com/EOPF-Explorer/platform-deploy) • Python 3.11+ • [GETTING_STARTED.md](GETTING_STARTED.md) for full setup
-
-## Submitting Workflows
+**Input:** STAC item URL → **Output:** Cloud-optimized GeoZarr + Interactive map (~15-20 min)
 
-| Method | Best For | Setup | Status |
-|--------|----------|-------|--------|
-| 🎯 **kubectl** | Testing, CI/CD | None | ✅ Recommended |
-| 📓 **Jupyter** | Learning, exploration | 2 min | ✅ Working |
-| **Event-driven** | Production (auto) | In-cluster | ✅ Running |
-| 🐍 **Python CLI** | Scripting | Port-forward | ⚠️ Advanced |
+**Supports:** Sentinel-1 GRD, Sentinel-2 L2A
+**Stack:** Argo Workflows • [eopf-geozarr](https://github.com/EOPF-Explorer/data-model) • Dask • RabbitMQ • Prometheus
+**Resources:** 6Gi memory, burstable CPU per workflow
 
-<details>
-<summary><b>kubectl</b> (recommended)</summary>
+## Monitoring
 
 ```bash
-export KUBECONFIG=.work/kubeconfig
-kubectl create -f workflows/run-s1-test.yaml -n devseed-staging -o name
-kubectl logs -n devseed-staging -l workflows.argoproj.io/workflow=<wf-name> -c main -f
+# Health check
+kubectl get wf -n devseed-staging --field-selector status.phase=Running
+
+# Recent workflows (last hour)
+kubectl get wf -n devseed-staging --sort-by=.metadata.creationTimestamp | tail -10
 ```
````
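The Monitoring block lists workflows one phase at a time. To see every phase at once, here is a minimal sketch; it assumes each Argo `Workflow` object carries `.status.phase` (the field the `--field-selector status.phase=Running` command filters on), and `list_wf_phases` / `count_by_phase` are hypothetical helper names, not scripts from this repository.

```shell
#!/usr/bin/env bash
# Sketch: summarize Argo workflows per phase in one namespace.
# Hypothetical helpers; assumes kubectl is configured (e.g. KUBECONFIG=.work/kubeconfig).

# Print "<name><TAB><phase>" for every workflow in the namespace.
list_wf_phases() {
  local ns="${1:-devseed-staging}"
  kubectl get wf -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'
}

# Read "<name><TAB><phase>" lines on stdin, print "<phase> <count>" sorted by phase.
# Workflows with no phase yet (just created) are counted as "Unknown".
count_by_phase() {
  awk -F'\t' 'NF { p = ($2 == "" ? "Unknown" : $2); n[p]++ }
              END { for (p in n) print p, n[p] }' | sort
}
```

Usage: `list_wf_phases devseed-staging | count_by_phase`. Keeping the `kubectl` call separate from the `awk` counting means the counting step can be checked against saved output without cluster access.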
````diff
-Edit `workflows/run-s1-test.yaml` with your STAC URL and collection.
-</details>
 
-<details>
-<summary><b>Jupyter</b></summary>
+**Web UI:** [Argo Workflows](https://argo-workflows.hub-eopf-explorer.eox.at)
 
+## Usage
+
+### kubectl (Testing)
 ```bash
-uv sync --extra notebooks
-cp notebooks/.env.example notebooks/.env
-uv run jupyter lab notebooks/operator.ipynb
+kubectl create -f workflows/run-s1-test.yaml -n devseed-staging
 ```
````
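The kubectl usage only creates the workflow, while the removed lines followed its logs through `-o name` and the `workflows.argoproj.io/workflow=<name>` label. A sketch that chains the two, assuming the manifest creates exactly one Workflow; `wf_name_from_ref` and `submit_and_follow` are hypothetical names. Right after submission the step pod may not exist yet, so the log command can report no matching pods until the first step starts.

```shell
#!/usr/bin/env bash
# Sketch: submit the test workflow, then follow its main-container logs.
# Hypothetical helpers; `kubectl create -o name` prints "workflow.argoproj.io/<name>".

# Strip the "<resource>/" prefix from a reference printed by `-o name`.
wf_name_from_ref() {
  printf '%s\n' "${1#*/}"
}

submit_and_follow() {
  local manifest="${1:-workflows/run-s1-test.yaml}"
  local ns="${2:-devseed-staging}"
  local ref name
  ref=$(kubectl create -f "$manifest" -n "$ns" -o name) || return 1
  name=$(wf_name_from_ref "$ref")
  echo "Submitted $name"
  # Label selector taken from the previous README's monitoring command.
  kubectl logs -n "$ns" -l "workflows.argoproj.io/workflow=$name" -c main -f
}
```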
````diff
-</details>
 
-<details>
-<summary><b>Event-driven</b> (production)</summary>
+**Namespaces:** `devseed-staging` (testing) • `devseed` (production)
 
+### Event-driven (Production)
 Publish to RabbitMQ `geozarr` exchange:
 ```json
-{"source_url": "https://stac.../items/S1A_...", "item_id": "S1A_IW_GRDH_...", "collection": "sentinel-1-l1-grd-dp-test"}
+{"source_url": "https://stac.../items/...", "item_id": "...", "collection": "..."}
 ```
````
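The event-driven message has three string fields. A minimal sketch that assembles one from shell arguments, assuming plain values (STAC URLs and IDs) that need no JSON escaping; `build_geozarr_message` is a hypothetical helper, and the routing key and publishing client are not specified in this README.

```shell
#!/usr/bin/env bash
# Sketch: build one message body for the RabbitMQ `geozarr` exchange.
# Field names follow the README example; the helper name is hypothetical.
# No JSON escaping is done, so values containing '"' or '\' are rejected.

build_geozarr_message() {
  local source_url="$1" item_id="$2" collection="$3"
  if [ -z "$source_url" ] || [ -z "$item_id" ] || [ -z "$collection" ]; then
    echo "usage: build_geozarr_message <source_url> <item_id> <collection>" >&2
    return 1
  fi
  case "$source_url$item_id$collection" in
    *'"'* | *'\'*)
      echo "values must not contain double quotes or backslashes" >&2
      return 1 ;;
  esac
  printf '{"source_url": "%s", "item_id": "%s", "collection": "%s"}\n' \
    "$source_url" "$item_id" "$collection"
}
```

The removed Python CLI section published such messages with `examples/submit.py` through a `kubectl port-forward -n core svc/rabbitmq 5672:5672` tunnel; any AMQP client pointed at the `geozarr` exchange should accept the same body. For values that may contain quotes, building the JSON with `jq -n --arg` is safer.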
````diff
-</details>
-
-<details>
-<summary><b>Python CLI</b></summary>
 
+### Jupyter Notebooks
 ```bash
-kubectl port-forward -n core svc/rabbitmq 5672:5672
-export AMQP_PASSWORD=$(kubectl get secret rabbitmq-password -n core -o jsonpath='{.data.rabbitmq-password}' | base64 -d)
-uv run python examples/submit.py --stac-url "..." --collection sentinel-2-l2a
+uv sync --extra notebooks
+cp notebooks/.env.example notebooks/.env
+uv run jupyter lab notebooks/
 ```
-</details>
 
-**Related:** [data-model](https://github.com/EOPF-Explorer/data-model) • [platform-deploy](https://github.com/EOPF-Explorer/platform-deploy) • [Testing report](docs/WORKFLOW_SUBMISSION_TESTING.md)
+See [examples/](examples/) for more patterns.
 
 ## Configuration
 
-<details>
-<summary><b>S3 & RabbitMQ</b></summary>
-
 ```bash
-# S3 credentials
+# S3 credentials (OVH S3)
 kubectl create secret generic geozarr-s3-credentials -n devseed \
---from-literal=AWS_ACCESS_KEY_ID="<key>" \
---from-literal=AWS_SECRET_ACCESS_KEY="<secret>"
+--from-literal=AWS_ACCESS_KEY_ID="..." \
+--from-literal=AWS_SECRET_ACCESS_KEY="..." \
+--from-literal=AWS_ENDPOINT_URL="https://s3.de.io.cloud.ovh.net"
+
+# S3 output location
+# Bucket: esa-zarr-sentinel-explorer-fra
+# Prefix: tests-output (staging) or geozarr (production)
 
-# RabbitMQ password
+# Get RabbitMQ password
 kubectl get secret rabbitmq-password -n core -o jsonpath='{.data.rabbitmq-password}' | base64 -d
-```
 
-**Endpoints:** S3: `s3.de.io.cloud.ovh.net/esa-zarr-sentinel-explorer-fra` • RabbitMQ: `geozarr` exchange • [UIs](https://workspace.devseed.hub-eopf-explorer.eox.at/): [Argo](https://argo-workflows.hub-eopf-explorer.eox.at) • [STAC](https://api.explorer.eopf.copernicus.eu/stac) • [Viewer](https://api.explorer.eopf.copernicus.eu/raster)
-</details>
+# STAC API endpoints
+# STAC API: https://api.explorer.eopf.copernicus.eu/stac
+# Raster API: https://api.explorer.eopf.copernicus.eu/raster
+```
````
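The Configuration block pins output to one bucket with a staging or production prefix. A sketch that derives the output URI from the namespace, assuming `devseed-staging` maps to `tests-output` and `devseed` to `geozarr` as the comments state; the function and variable names are hypothetical, and the key layout below each prefix is not documented here.

```shell
#!/usr/bin/env bash
# Sketch: map a namespace to its S3 output location.
# Bucket, endpoint and prefixes come from the README; names are hypothetical.

GEOZARR_BUCKET="esa-zarr-sentinel-explorer-fra"
S3_ENDPOINT="https://s3.de.io.cloud.ovh.net"

# Print the output prefix for a namespace, or fail for unknown namespaces.
geozarr_output_prefix() {
  case "$1" in
    devseed-staging) echo "tests-output" ;;
    devseed)         echo "geozarr" ;;
    *) echo "unknown namespace: $1" >&2; return 1 ;;
  esac
}

# Print "s3://<bucket>/<prefix>/" for a namespace.
geozarr_output_uri() {
  local prefix
  prefix=$(geozarr_output_prefix "$1") || return 1
  echo "s3://${GEOZARR_BUCKET}/${prefix}/"
}
```

For example, `aws s3 ls "$(geozarr_output_uri devseed-staging)" --endpoint-url "$S3_ENDPOINT"` lists the staging output with the AWS CLI. Kubernetes Secrets are namespace-scoped, so if staging workflows read `geozarr-s3-credentials`, that secret has to exist in `devseed-staging` as well as `devseed`.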
````diff
 
 ## Troubleshooting
 
-<details>
-<summary><b>Logs & Issues</b></summary>
-
 ```bash
-kubectl get wf -n devseed-staging -w
+# Check workflow status
+kubectl get wf -n devseed-staging --sort-by=.metadata.creationTimestamp | tail -5
+
+# View logs
 kubectl logs -n devseed-staging <pod-name> -c main -f
-kubectl logs -n devseed -l sensor-name=geozarr-sensor --tail=50
+
+# Check resources
+kubectl top nodes
 ```
````
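When a workflow fails, the next step is usually its logs. A sketch that filters Argo's `Failed` and `Error` phases from `name<TAB>phase` lines and prints one log command per workflow, reusing the label selector from the removed Quick Start lines; `failed_workflows` and `log_commands_for_failed` are hypothetical names, and the commands are printed, not run.

```shell
#!/usr/bin/env bash
# Sketch: print a log command for each failed workflow.
# Input on stdin: "<name><TAB><phase>" lines, e.g. from
#   kubectl get wf -n devseed-staging \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'

# Keep only workflow names whose phase is Failed or Error.
failed_workflows() {
  awk -F'\t' '$2 == "Failed" || $2 == "Error" { print $1 }'
}

# Print (do not run) a kubectl logs command for each failed workflow.
log_commands_for_failed() {
  local ns="${1:-devseed-staging}" name
  failed_workflows | while IFS= read -r name; do
    printf 'kubectl logs -n %s -l workflows.argoproj.io/workflow=%s -c main --tail=50\n' "$ns" "$name"
  done
}
```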
````diff
 
-**Common fixes:** Workflow not starting → check sensor logs • S3 denied → verify `geozarr-s3-credentials` secret • RabbitMQ refused → `kubectl port-forward -n core svc/rabbitmq 5672:5672` • Pod pending → check resources
-</details>
+**Common issues:**
+- **Workflow not starting:** Check sensor logs: `kubectl logs -n devseed -l sensor-name=geozarr-sensor`
+- **S3 errors:** Verify credentials secret exists
+- **Pod pending:** Check node capacity with `kubectl top nodes`
+
+**Performance:** S1 GRD (10GB): 15-20 min • S2 L2A (5GB): 8-12 min • Increase if >20GB dataset
+
+See [GETTING_STARTED.md](GETTING_STARTED.md#troubleshooting) for more.
 
 ## Development
 
 ```bash
-uv sync --all-extras && pre-commit install
-make test # or: pytest tests/ -v -k e2e
+# Setup
+uv sync --all-extras
+pre-commit install
+
+# Test
+pytest tests/ -v # 100/100 passing
+
+# Deploy
+kubectl apply -f workflows/template.yaml -n devseed
 ```
````
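The removed Deploy line described a test, build, apply sequence. A sketch that wraps it in one function with a dry-run switch; `deploy_pipeline`, `maybe_run` and `DRY_RUN` are hypothetical, the image tag is copied from that line, and pushing the image is not covered (add `--push` to `docker buildx build` if the cluster pulls from ghcr.io).

```shell
#!/usr/bin/env bash
# Sketch: test -> build -> apply, as listed in the previous README's Deploy line.
# Hypothetical helpers; DRY_RUN=1 prints the commands instead of running them.

maybe_run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Stop at the first failing step.
deploy_pipeline() {
  local ns="${1:-devseed}"
  local image="ghcr.io/eopf-explorer/data-pipeline:dev"
  maybe_run pytest tests/ -v || return 1
  maybe_run docker buildx build --platform linux/amd64 -t "$image" . || return 1
  maybe_run kubectl apply -f workflows/template.yaml -n "$ns"
}
```

`DRY_RUN=1 deploy_pipeline devseed` prints the three commands; without `DRY_RUN` the function stops at the first one that fails.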
````diff
 
-**Deploy:** Edit `workflows/template.yaml` or `scripts/*.py` → `pytest tests/ -v` → `docker buildx build --platform linux/amd64 -t ghcr.io/eopf-explorer/data-pipeline:dev .` → `kubectl apply -f workflows/template.yaml -n devseed` → [CONTRIBUTING.md](CONTRIBUTING.md)
+**Project structure:** `workflows/` (manifests) • `scripts/` (Python utils) • `tests/` (pytest) • `notebooks/` (tutorials)
+
+**Documentation:** [CONTRIBUTING.md](CONTRIBUTING.md) • [GETTING_STARTED.md](GETTING_STARTED.md)
 
 ## License
 
````