90 changes: 86 additions & 4 deletions README.md
@@ -2,10 +2,92 @@

Example applications in Java, Python, Scala and SQL for Amazon Managed Service for Apache Flink (formerly known as Amazon Kinesis Data Analytics), illustrating various aspects of Apache Flink applications, and simple "getting started" base projects.

* [Java examples](./java)
* [Python examples](./python)
* [Scala examples](/scala)
* [Operational utilities and infrastructure code](./infrastructure)
## Table of Contents

### Java Examples

#### Getting Started
- [**Getting Started - DataStream API**](./java/GettingStarted) - Skeleton project for a basic Flink Java application using DataStream API
- [**Getting Started - Table API & SQL**](./java/GettingStartedTable) - Basic Flink Java application using Table API & SQL in combination with the DataStream API

#### Connectors
- [**Kinesis Connectors**](./java/KinesisConnectors) - Examples of Flink Kinesis Connector source and sink (standard and EFO)
- [**Kinesis Source Deaggregation**](./java/KinesisSourceDeaggregation) - Handling Kinesis record deaggregation in the Kinesis source
- [**Kafka Connectors**](./java/KafkaConnectors) - Examples of Flink Kafka Connector source and sink
- [**Kafka Config Providers**](./java/KafkaConfigProviders) - Examples of using Kafka Config Providers for secure configuration management
- [**DynamoDB Stream Source**](./java/DynamoDBStreamSource) - Reading from DynamoDB Streams as a source
- [**Kinesis Firehose Sink**](./java/KinesisFirehoseSink) - Writing data to Amazon Kinesis Data Firehose
- [**SQS Sink**](./java/SQSSink) - Writing data to Amazon SQS
- [**Prometheus Sink**](./java/PrometheusSink) - Sending metrics to Prometheus
- [**Flink CDC**](./java/FlinkCDC) - Change Data Capture examples using Flink CDC

#### Reading and Writing Files and Transactional Data Lake Formats
- [**Iceberg**](./java/Iceberg) - Working with Apache Iceberg and Amazon S3 Tables
- [**S3 Sink**](./java/S3Sink) - Writing JSON data to Amazon S3
- [**S3 Avro Sink**](./java/S3AvroSink) - Writing Avro format data to Amazon S3
- [**S3 Avro Source**](./java/S3AvroSource) - Reading Avro format data from Amazon S3
- [**S3 Parquet Sink**](./java/S3ParquetSink) - Writing Parquet format data to Amazon S3
- [**S3 Parquet Source**](./java/S3ParquetSource) - Reading Parquet format data from Amazon S3

#### Data Formats & Schema Registry
- [**Avro with Glue Schema Registry - Kinesis**](./java/AvroGlueSchemaRegistryKinesis) - Using Avro format with AWS Glue Schema Registry and Kinesis
- [**Avro with Glue Schema Registry - Kafka**](./java/AvroGlueSchemaRegistryKafka) - Using Avro format with AWS Glue Schema Registry and Kafka

#### Stream Processing Patterns
- [**Serialization**](./java/Serialization) - Serialization of records and state
- [**Windowing**](./java/Windowing) - Time-based window aggregation examples
- [**Side Outputs**](./java/SideOutputs) - Using side outputs for data routing and filtering
- [**Async I/O**](./java/AsyncIO) - Asynchronous I/O patterns with retries for external API calls
- [**Custom Metrics**](./java/CustomMetrics) - Creating and publishing custom application metrics



### Python Examples

#### Getting Started
- [**Getting Started**](./python/GettingStarted) - Basic PyFlink application using Table API & SQL

#### Handling Python dependencies
- [**Python Dependencies**](./python/PythonDependencies) - Managing Python dependencies in PyFlink applications using `requirements.txt`
- [**Packaged Python Dependencies**](./python/PackagedPythonDependencies) - Managing Python dependencies packaged with the PyFlink application at build time

#### Connectors
- [**Datastream Kafka Connector**](./python/DatastreamKafkaConnector) - Using Kafka connector with PyFlink DataStream API
- [**Kafka Config Providers**](./python/KafkaConfigProviders) - Secure configuration management for Kafka in PyFlink
- [**S3 Sink**](./python/S3Sink) - Writing data to Amazon S3 using PyFlink
- [**Firehose Sink**](./python/FirehoseSink) - Writing data to Amazon Kinesis Data Firehose
- [**Iceberg Sink**](./python/IcebergSink) - Writing data to Apache Iceberg tables

#### Stream Processing Patterns
- [**Windowing**](./python/Windowing) - Time-based window aggregation examples with PyFlink/SQL
- [**User Defined Functions (UDF)**](./python/UDF) - Creating and using custom functions in PyFlink

#### Utilities
- [**Data Generator**](./python/data-generator) - Python script for generating sample data and sending it to Kinesis Data Streams
- [**Local Development on Apple Silicon**](./python/LocalDevelopmentOnAppleSilicon) - Setup guide for local development of Flink 1.15 on Apple Silicon Macs (not required with Flink 1.18 or later)


### Scala Examples

#### Getting Started
- [**Getting Started - DataStream API**](./scala/GettingStarted) - Skeleton project for a basic Flink Scala application using DataStream API

### Infrastructure & Operations

- [**Auto Scaling**](./infrastructure/AutoScaling) - Custom autoscaler for Amazon Managed Service for Apache Flink
- [**Scheduled Scaling**](./infrastructure/ScheduledScaling) - Scale applications up and down based on daily time schedules
- [**Monitoring**](./infrastructure/monitoring) - Extended CloudWatch Dashboard examples for monitoring applications
- [**Scripts**](./infrastructure/scripts) - Useful shell scripts for interacting with the Amazon Managed Service for Apache Flink control plane API

---

## Security

See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.

## License Summary

This sample code is made available under the MIT-0 license. See the LICENSE file.

10 changes: 7 additions & 3 deletions java/FlinkCDC/README.md
@@ -1,5 +1,9 @@
# Using Flink CDC Sources
# Flink CDC Examples

This folder contains examples showing using Flink CDC Sources as source connectors in Amazon Managed Service for Apache Flink
Examples demonstrating Change Data Capture (CDC) using [Flink CDC source connectors](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.4/docs/connectors/flink-sources/overview/)
in Amazon Managed Service for Apache Flink.

* [Flink CDC SQL Server source (SQL)](./FlinkCDCSQLServerSource), writing to JDBC sink.
## Table of Contents

### Database Sources
- [**Flink CDC SQL Server Source**](./FlinkCDCSQLServerSource) - Capturing changes from SQL Server database and writing to JDBC sink
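
As a minimal illustration of the pattern these examples implement, the sketch below registers a hypothetical SQL Server CDC source table with the Flink Table API. The hostname, credentials, and the `orders` schema are placeholders, not taken from the example project.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class SqlServerCdcSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Register a CDC source table backed by the Flink CDC SQL Server connector.
        // Hostname, credentials, and schema below are hypothetical placeholders.
        tEnv.executeSql(
                "CREATE TABLE orders_cdc (" +
                "  order_id INT," +
                "  order_total DECIMAL(10, 2)," +
                "  PRIMARY KEY (order_id) NOT ENFORCED" +
                ") WITH (" +
                "  'connector' = 'sqlserver-cdc'," +
                "  'hostname' = 'mydb.example.com'," +
                "  'port' = '1433'," +
                "  'username' = 'flink'," +
                "  'password' = 'change-me'," +
                "  'database-name' = 'sales'," +
                "  'table-name' = 'dbo.orders'" +
                ")");

        // Every INSERT, UPDATE, and DELETE on the SQL Server table arrives
        // as a changelog row that downstream sinks can consume.
        tEnv.executeSql("SELECT * FROM orders_cdc").print();
    }
}
```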
12 changes: 12 additions & 0 deletions java/Iceberg/README.md
@@ -0,0 +1,12 @@
# Apache Iceberg Examples

Examples demonstrating how to work with Apache Iceberg tables in Amazon Managed Service for Apache Flink using the DataStream API.

## Table of Contents

### Iceberg Sinks
- [**Iceberg DataStream Sink**](./IcebergDataStreamSink) - Writing data to Iceberg tables using AWS Glue Data Catalog
- [**S3 Table Sink**](./S3TableSink) - Writing data to Iceberg tables stored directly in S3

### Iceberg Sources
- [**Iceberg DataStream Source**](./IcebergDataStreamSource) - Reading data from Iceberg tables using AWS Glue Data Catalog
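
For orientation, here is a rough sketch of the DataStream-to-Iceberg write path that sinks like these follow, assuming the `iceberg-flink-runtime` and Iceberg AWS dependencies are on the classpath. The warehouse path, database, and table names are hypothetical.

```java
import java.util.Map;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.table.data.RowData;
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.flink.CatalogLoader;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.sink.FlinkSink;

public class IcebergSinkSketch {

    // Attach an Iceberg sink to an existing DataStream<RowData>.
    public static void attachSink(DataStream<RowData> rowStream) {
        // Use AWS Glue Data Catalog as the Iceberg catalog; the warehouse
        // path and catalog properties are placeholders.
        CatalogLoader glueCatalog = CatalogLoader.custom(
                "glue",
                Map.of("warehouse", "s3://my-bucket/warehouse",
                       "io-impl", "org.apache.iceberg.aws.s3.S3FileIO"),
                new Configuration(),
                "org.apache.iceberg.aws.glue.GlueCatalog");

        TableLoader tableLoader = TableLoader.fromCatalog(
                glueCatalog, TableIdentifier.of("my_db", "my_table"));

        // Append rows to the Iceberg table; files are committed to the
        // table on Flink checkpoints.
        FlinkSink.forRowData(rowStream)
                .tableLoader(tableLoader)
                .append();
    }
}
```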
19 changes: 12 additions & 7 deletions java/KafkaConfigProviders/README.md
@@ -1,10 +1,15 @@
## Configuring Kafka connectors secrets at runtime, using Config Providers
# Kafka Config Providers Examples

This directory includes example that shows how to configure secrets for Kafka connector authentication
scheme at runtime, using [MSK Config Providers](https://github.com/aws-samples/msk-config-providers).
Examples demonstrating secure configuration management for Kafka connectors using MSK Config Providers in Amazon Managed Service for Apache Flink.

Using Config Providers, secrets and files (TrustStore and KeyStore) required to set up the Kafka authentication
and SSL, can be fetched at runtime and not embedded in the application JAR.
These examples show how to configure secrets and certificates for Kafka connector authentication at runtime,
without embedding sensitive information in the application JAR, leveraging [MSK Config Providers](https://github.com/aws-samples/msk-config-providers).

* [Configuring mTLS TrustStore and KeyStore using Config Providers](./Kafka-mTLS-KeystoreWithConfigProviders)
* [Configuring SASL/SCRAM (SASL_SSL) TrustStore and credentials using Config Providers](./Kafka-SASL_SSL-WithConfigProviders)
## Table of Contents

### mTLS Authentication
- [**Kafka mTLS with DataStream API**](./Kafka-mTLS-Keystore-ConfigProviders) - Using Config Providers to fetch KeyStore and passwords for mTLS authentication with DataStream API
- [**Kafka mTLS with Table API & SQL**](./Kafka-mTLS-Keystore-Sql-ConfigProviders) - Using Config Providers to fetch KeyStore and passwords for mTLS authentication with Table API & SQL

### SASL Authentication
- [**Kafka SASL/SCRAM**](./Kafka-SASL_SSL-ConfigProviders) - Using Config Providers to fetch SASL/SCRAM credentials from AWS Secrets Manager
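
To make the runtime-resolution idea concrete, the sketch below wires config providers into a Kafka client's properties, following the pattern documented in the msk-config-providers repository. The region, bucket, and secret name are hypothetical placeholders.

```java
import java.util.Properties;

public class ConfigProviderSketch {

    public static Properties kafkaProperties() {
        Properties props = new Properties();

        // Register the S3 and Secrets Manager config providers
        props.setProperty("config.providers", "secretsmanager,s3import");
        props.setProperty("config.providers.secretsmanager.class",
                "com.amazonaws.kafka.config.providers.SecretsManagerConfigProvider");
        props.setProperty("config.providers.s3import.class",
                "com.amazonaws.kafka.config.providers.S3ImportConfigProvider");

        // TrustStore is fetched from S3 at runtime instead of being bundled in the JAR
        props.setProperty("ssl.truststore.location",
                "${s3import:us-east-1:my-bucket/certs/truststore.jks}");

        // SASL/SCRAM credentials are resolved from AWS Secrets Manager at runtime
        props.setProperty("sasl.jaas.config",
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                        + "username=\"${secretsmanager:MySecret:username}\" "
                        + "password=\"${secretsmanager:MySecret:password}\";");
        return props;
    }
}
```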
8 changes: 8 additions & 0 deletions java/Serialization/README.md
@@ -0,0 +1,8 @@
# Serialization Examples

Examples demonstrating data serialization patterns and custom type handling in Amazon Managed Service for Apache Flink.

## Table of Contents

### Custom Serialization
- [**Custom TypeInfo**](./CustomTypeInfo) - Using custom TypeInformation to avoid Kryo serialization fallback
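
As a pointer to what the example covers, here is a minimal sketch of Flink's `@TypeInfo` / `TypeInfoFactory` mechanism; the `SensorReading` record and its fields are hypothetical. Annotating the type keeps Flink on its efficient POJO serializer instead of falling back to Kryo.

```java
import java.lang.reflect.Type;
import java.util.Map;
import org.apache.flink.api.common.typeinfo.TypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInfoFactory;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;

// The annotation points Flink at the factory, so the runtime never
// falls back to generic Kryo serialization for this type.
@TypeInfo(SensorReadingTypeInfoFactory.class)
public class SensorReading {
    public String sensorId;
    public long timestamp;
    public double value;
}

class SensorReadingTypeInfoFactory extends TypeInfoFactory<SensorReading> {
    @Override
    public TypeInformation<SensorReading> createTypeInfo(
            Type t, Map<String, TypeInformation<?>> genericParameters) {
        // Declare each field's type explicitly
        return Types.POJO(SensorReading.class, Map.of(
                "sensorId", Types.STRING,
                "timestamp", Types.LONG,
                "value", Types.DOUBLE));
    }
}
```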