Commit 0fc5dc0

chore(airflow): Extend providers/extras for 3.0.6 (#1336)
* extension to list for 3.0.6
* full list based on constraints file
* revert 3.0.1, drop pyspark
* extend readme comment
* linting
* linting
* changelog
* typo
* corrected changelog
* split extras into separate lists
* updated list of exclusions
1 parent cb04f12 commit 0fc5dc0

4 files changed: 67 additions, 6 deletions
CHANGELOG.md

Lines changed: 5 additions & 0 deletions
@@ -8,6 +8,11 @@ All notable changes to this project will be documented in this file.
 
 - superset: Add 6.0.0-rc2 ([#1337]).
 
+### Changed
+
+- airflow: Extend list of providers for 3.0.6 ([#1336])
+
+[#1336]: https://github.com/stackabletech/docker-images/pull/1336
 [#1337]: https://github.com/stackabletech/docker-images/pull/1337
 
 ## [25.11.0] - 2025-11-07

airflow/Dockerfile

Lines changed: 20 additions & 2 deletions
@@ -51,13 +51,24 @@ ARG UV_VERSION
 # Airflow "extras" packages are listed here: https://airflow.apache.org/docs/apache-airflow/stable/extra-packages-ref.html
 # They evolve over time and thus belong to the version-specific arguments.
 # The mysql provider is currently excluded.
-# Requires implementation of https://github.com/apache/airflow/blob/2.2.5/scripts/docker/install_mysql.sh
-ARG AIRFLOW_EXTRAS
+# Requires implementation of https://github.com/apache/airflow/blob/main/scripts/docker/install_mysql.sh
+# The providers are split into separate lists to make it easier to manage
+# (and to compare to the online links). Default values are provided for
+# backwards compatibility.
+ARG AIRFLOW_EXTRAS_CORE=""
+ARG AIRFLOW_EXTRAS_META=""
+ARG AIRFLOW_EXTRAS_PROVIDER_APACHE=""
+ARG AIRFLOW_EXTRAS_EXTERNAL_SERVICES=""
+ARG AIRFLOW_EXTRAS_LOCALLY_INSTALLED_SOFTWARE=""
+ARG AIRFLOW_EXTRAS_OTHER=""
 
 RUN microdnf module enable -y nodejs:${NODEJS_VERSION} && \
 microdnf update && \
 microdnf install \
 cyrus-sasl-devel \
+# Needed for kerberos
+cyrus-sasl-gssapi \
+krb5-devel \
 # Needed by ./configure to build gevent, see snippet [1] at the end of file
 diffutils \
 # Needed to build gevent, see snippet [1] at the end of file
@@ -93,6 +104,13 @@ COPY --chown=${STACKABLE_USER_UID}:0 airflow/stackable/patches/${PRODUCT_VERSION
 WORKDIR /stackable
 
 RUN <<EOF
+
+# Compose comma-delimited AIRFLOW_EXTRAS
+AIRFLOW_EXTRAS="$AIRFLOW_EXTRAS_CORE,$AIRFLOW_EXTRAS_META,$AIRFLOW_EXTRAS_PROVIDER_APACHE,$AIRFLOW_EXTRAS_EXTERNAL_SERVICES,$AIRFLOW_EXTRAS_LOCALLY_INSTALLED_SOFTWARE,$AIRFLOW_EXTRAS_OTHER"
+
+# Removing duplicates
+AIRFLOW_EXTRAS=$(echo "$AIRFLOW_EXTRAS" | tr ',' '\n' | awk 'NF > 0 {if (!seen[$0]++) print $0}' | tr '\n' ',' | sed 's/,$//')
+
 python${PYTHON_VERSION} -m venv --system-site-packages /stackable/app
 
 source /stackable/app/bin/activate
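
The compose-and-deduplicate step in the hunk above can be tried outside the image build. The following is a minimal sketch that wraps the same `tr`/`awk`/`sed` pipeline in a function; `dedupe_extras` and the sample group values are hypothetical and are not part of the Dockerfile.

```shell
#!/bin/sh
# Sketch only: dedupe_extras is a hypothetical helper, not part of the Dockerfile.
dedupe_extras() {
    # Split on commas, drop empty entries, keep the first occurrence of each
    # name, then re-join with commas and trim the trailing comma.
    printf '%s\n' "$1" \
        | tr ',' '\n' \
        | awk 'NF > 0 && !seen[$0]++' \
        | tr '\n' ',' \
        | sed 's/,$//'
}

# Two sample groups that share "graphviz"; the third group is empty,
# which mimics an ARG left at its default value of "".
core="async,graphviz"
meta="graphviz,ldap"
other=""

# Prints the merged list with the duplicate and the empty entry removed.
dedupe_extras "$core,$meta,$other"
```

Filtering on `NF > 0` is what makes the empty defaults safe: a group left unset contributes only a stray comma, which is dropped before the list is re-joined.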

airflow/README.md

Lines changed: 19 additions & 0 deletions
@@ -16,3 +16,22 @@ Example output:
 Downloading constraints file for Airflow 3.0.6 (Python 3.12)
 Successfully pulled new constraints file: constraints-3.0.6-python3.12.txt
 ```
+
+## Airflow providers/extras
+
+The providers are released independently of Airflow.
+The provider packages are listed in the build configuration file, grouped to match the online documentation so that they are easier to compare and manage (the groups are concatenated into a single list in the Dockerfile).
+The expected versions are listed in the constraints files, but these can change over time.
+To keep the installation tightly coupled to the associated constraints, it is best to use only providers listed in the relevant constraints file.
+
+### Version 3.0.6
+
+Applying the filter above results in the omission of the following providers:
+
+- `apache-atlas`
+- `apache-webhdfs`
+
+Apart from these, the only providers currently excluded are:
+
+- `mysql`, as it requires an implementation of: <https://github.com/apache/airflow/blob/main/scripts/docker/install_mysql.sh>
+- `apache-spark`, due to the size (roughly 500MB) and the number of high/critical CVEs it adds to the image
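
The constraints-based filter described in the README text can be approximated with a small check. This is a sketch under stated assumptions, not the procedure used for this commit: it assumes each provider is pinned in the constraints file as `apache-airflow-providers-<extra>==<version>`, the helper name `missing_providers` is hypothetical, and the sample file contents (including the `0.0.0` versions) are placeholders. It applies only to provider extras, not to core or meta extras.

```shell
#!/bin/sh
# Sketch only: missing_providers and the sample data below are hypothetical.
missing_providers() {
    # $1: comma-separated provider extras; $2: path to a constraints file.
    printf '%s\n' "$1" | tr ',' '\n' | while read -r extra; do
        [ -n "$extra" ] || continue
        # Assumes providers are pinned as
        # apache-airflow-providers-<extra>==<version>.
        grep -q "^apache-airflow-providers-${extra}==" "$2" \
            || printf '%s\n' "$extra"
    done
}

# Placeholder constraints excerpt; the versions are not real pins.
constraints_file=$(mktemp)
cat > "$constraints_file" <<'CONSTRAINTS'
apache-airflow-providers-apache-kafka==0.0.0
apache-airflow-providers-apache-hdfs==0.0.0
CONSTRAINTS

# Prints each extra that has no matching pin in the sample file, one per line.
missing_providers "apache-kafka,apache-atlas,apache-hdfs,apache-webhdfs" "$constraints_file"
rm -f "$constraints_file"
```

Against a real constraints file, the same call would take the path of the downloaded file (for example `constraints-3.0.6-python3.12.txt`) as its second argument.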

airflow/boil-config.toml

Lines changed: 23 additions & 4 deletions
@@ -10,7 +10,7 @@ s3fs-version = "2024.9.0"
 cyclonedx-bom-version = "6.0.0"
 tini-version = "0.19.0"
 uv-version = "0.7.8"
-airflow-extras = "async,amazon,celery,cncf.kubernetes,docker,dask,elasticsearch,ftp,grpc,hashicorp,http,ldap,google,google_auth,microsoft.azure,odbc,pandas,postgres,redis,sendgrid,sftp,slack,ssh,statsd,virtualenv,trino"
+airflow-extras-other = "async,amazon,celery,cncf.kubernetes,docker,dask,elasticsearch,ftp,grpc,hashicorp,http,ldap,google,google_auth,microsoft.azure,odbc,pandas,postgres,redis,sendgrid,sftp,slack,ssh,statsd,virtualenv,trino"
 opa-auth-manager = "airflow-2"
 nodejs-version = "20"
 
@@ -26,7 +26,7 @@ s3fs-version = "2024.9.0"
 cyclonedx-bom-version = "6.0.0"
 tini-version = "0.19.0"
 uv-version = "0.7.8"
-airflow-extras = "async,amazon,celery,cncf.kubernetes,docker,dask,elasticsearch,ftp,grpc,hashicorp,http,ldap,google,google_auth,microsoft.azure,odbc,pandas,postgres,redis,sendgrid,sftp,slack,ssh,statsd,virtualenv,trino"
+airflow-extras-other = "async,amazon,celery,cncf.kubernetes,docker,dask,elasticsearch,ftp,grpc,hashicorp,http,ldap,google,google_auth,microsoft.azure,odbc,pandas,postgres,redis,sendgrid,sftp,slack,ssh,statsd,virtualenv,trino"
 opa-auth-manager = "airflow-2"
 nodejs-version = "20"
 
@@ -42,7 +42,7 @@ s3fs-version = "2024.9.0"
 cyclonedx-bom-version = "6.0.0"
 tini-version = "0.19.0"
 uv-version = "0.7.8"
-airflow-extras = "async,amazon,celery,cncf-kubernetes,docker,elasticsearch,fab,ftp,grpc,hashicorp,http,ldap,google,microsoft-azure,odbc,pandas,postgres,redis,sendgrid,sftp,slack,ssh,statsd,trino"
+airflow-extras-other = "async,amazon,celery,cncf-kubernetes,docker,elasticsearch,fab,ftp,grpc,hashicorp,http,ldap,google,microsoft-azure,odbc,pandas,postgres,redis,sendgrid,sftp,slack,ssh,statsd,trino"
 opa-auth-manager = "airflow-3"
 nodejs-version = "20"
 
@@ -58,6 +58,25 @@ s3fs-version = "2024.9.0"
 cyclonedx-bom-version = "6.0.0"
 tini-version = "0.19.0"
 uv-version = "0.7.8"
-airflow-extras = "amazon,apache-kafka,async,celery,cncf-kubernetes,common-messaging,docker,elasticsearch,fab,ftp,grpc,hashicorp,http,ldap,google,microsoft-azure,odbc,pandas,postgres,redis,sendgrid,sftp,slack,ssh,statsd,trino"
+
+# Airflow extras are defined in separate lists to make them easier to check against the links below. The lists will be concatenated and duplicates removed in the Dockerfile.
+# See https://airflow.apache.org/docs/apache-airflow/3.0.6/extra-packages-ref.html#core-airflow-extras
+airflow-extras-core="async,graphviz,kerberos,otel,sentry,standard,statsd"
+
+# See https://airflow.apache.org/docs/apache-airflow/3.0.6/extra-packages-ref.html#meta-airflow-package-extras
+airflow-extras-meta="aiobotocore,cloudpickle,github-enterprise,google-auth,graphviz,ldap,leveldb,pandas,polars,rabbitmq,s3fs,saml,uv"
+
+# See https://airflow.apache.org/docs/apache-airflow/3.0.6/extra-packages-ref.html#apache-software-extras
+airflow-extras-provider-apache="apache-beam,apache-cassandra,apache-drill,apache-druid,apache-flink,apache-hdfs,apache-hive,apache-iceberg,apache-impala,apache-kafka,apache-kylin,apache-livy,apache-pig,apache-pinot"
+
+# See https://airflow.apache.org/docs/apache-airflow/3.0.6/extra-packages-ref.html#external-services-extras
+airflow-extras-external-services="airbyte,alibaba,apprise,amazon,asana,atlassian-jira,microsoft-azure,cloudant,cohere,databricks,datadog,dbt-cloud,dingding,discord,facebook,github,google,hashicorp,openai,opsgenie,pagerduty,pgvector,pinecone,qdrant,salesforce,sendgrid,segment,slack,snowflake,tableau,tabular,telegram,vertica,weaviate,yandex,ydb,zendesk"
+
+# See https://airflow.apache.org/docs/apache-airflow/3.0.6/extra-packages-ref.html#locally-installed-software-extras
+airflow-extras-locally-installed-software="arangodb,celery,cncf-kubernetes,docker,edge3,elasticsearch,exasol,fab,git,github,influxdb,jenkins,mongo,microsoft-mssql,neo4j,odbc,openfaas,oracle,postgres,presto,redis,samba,singularity,teradata,trino"
+
+# See https://airflow.apache.org/docs/apache-airflow/3.0.6/extra-packages-ref.html#other-extras
+airflow-extras-other="common-compat,common-io,common-messaging,common-sql,ftp,grpc,http,imap,jdbc,microsoft-psrp,microsoft-winrm,openlineage,opensearch,papermill,sftp,smtp,sqlite,ssh"
+
 opa-auth-manager = "airflow-3"
 nodejs-version = "20"
