diff --git a/README.md b/README.md
index 4a86784a..b9dc92d2 100644
--- a/README.md
+++ b/README.md
@@ -2,10 +2,92 @@
 
 Example applications in Java, Python, Scala and SQL for Amazon Managed Service for Apache Flink (formerly known as Amazon Kinesis Data Analytics), illustrating various aspects of Apache Flink applications, and simple "getting started" base projects.
 
-* [Java examples](./java)
-* [Python examples](./python)
-* [Scala examples](/scala)
-* [Operational utilities and infrastructure code](./infrastructure)
+## Table of Contents
+
+### Java Examples
+
+#### Getting Started
+- [**Getting Started - DataStream API**](./java/GettingStarted) - Skeleton project for a basic Flink Java application using the DataStream API
+- [**Getting Started - Table API & SQL**](./java/GettingStartedTable) - Basic Flink Java application using the Table API & SQL together with the DataStream API
+
+#### Connectors
+- [**Kinesis Connectors**](./java/KinesisConnectors) - Examples of Flink Kinesis Connector source and sink (standard and EFO)
+- [**Kinesis Source Deaggregation**](./java/KinesisSourceDeaggregation) - Handling Kinesis record deaggregation in the Kinesis source
+- [**Kafka Connectors**](./java/KafkaConnectors) - Examples of Flink Kafka Connector source and sink
+- [**Kafka Config Providers**](./java/KafkaConfigProviders) - Examples of using Kafka Config Providers for secure configuration management
+- [**DynamoDB Stream Source**](./java/DynamoDBStreamSource) - Reading from DynamoDB Streams as a source
+- [**Kinesis Firehose Sink**](./java/KinesisFirehoseSink) - Writing data to Amazon Kinesis Data Firehose
+- [**SQS Sink**](./java/SQSSink) - Writing data to Amazon SQS
+- [**Prometheus Sink**](./java/PrometheusSink) - Sending metrics to Prometheus
+- [**Flink CDC**](./java/FlinkCDC) - Change Data Capture examples using Flink CDC
+
+#### Files & Transactional Data Lake Formats
+- [**Iceberg**](./java/Iceberg) - Working with Apache Iceberg and Amazon S3 Tables
+- [**S3 Sink**](./java/S3Sink) - Writing JSON data to Amazon S3
+- [**S3 Avro Sink**](./java/S3AvroSink) - Writing Avro format data to Amazon S3
+- [**S3 Avro Source**](./java/S3AvroSource) - Reading Avro format data from Amazon S3
+- [**S3 Parquet Sink**](./java/S3ParquetSink) - Writing Parquet format data to Amazon S3
+- [**S3 Parquet Source**](./java/S3ParquetSource) - Reading Parquet format data from Amazon S3
+
+#### Data Formats & Schema Registry
+- [**Avro with Glue Schema Registry - Kinesis**](./java/AvroGlueSchemaRegistryKinesis) - Using Avro format with AWS Glue Schema Registry and Kinesis
+- [**Avro with Glue Schema Registry - Kafka**](./java/AvroGlueSchemaRegistryKafka) - Using Avro format with AWS Glue Schema Registry and Kafka
+
+#### Stream Processing Patterns
+- [**Serialization**](./java/Serialization) - Serialization of records and state
+- [**Windowing**](./java/Windowing) - Time-based window aggregation examples
+- [**Side Outputs**](./java/SideOutputs) - Using side outputs for data routing and filtering
+- [**Async I/O**](./java/AsyncIO) - Asynchronous I/O patterns with retries for external API calls
+- [**Custom Metrics**](./java/CustomMetrics) - Creating and publishing custom application metrics
+
+### Python Examples
+
+#### Getting Started
+- [**Getting Started**](./python/GettingStarted) - Basic PyFlink application using the Table API & SQL
+
+#### Handling Python Dependencies
+- [**Python Dependencies**](./python/PythonDependencies) - Managing Python dependencies in PyFlink applications using `requirements.txt`
+- [**Packaged Python Dependencies**](./python/PackagedPythonDependencies) - Managing Python dependencies packaged with the PyFlink application at build time
+
+#### Connectors
+- [**DataStream Kafka Connector**](./python/DatastreamKafkaConnector) - Using the Kafka connector with the PyFlink DataStream API
+- [**Kafka Config Providers**](./python/KafkaConfigProviders) - Secure configuration management for Kafka in PyFlink
+- [**S3 Sink**](./python/S3Sink) - Writing data to Amazon S3 using PyFlink
+- [**Firehose Sink**](./python/FirehoseSink) - Writing data to Amazon Kinesis Data Firehose
+- [**Iceberg Sink**](./python/IcebergSink) - Writing data to Apache Iceberg tables
+
+#### Stream Processing Patterns
+- [**Windowing**](./python/Windowing) - Time-based window aggregation examples with PyFlink and SQL
+- [**User Defined Functions (UDF)**](./python/UDF) - Creating and using custom functions in PyFlink
+
+#### Utilities
+- [**Data Generator**](./python/data-generator) - Python script generating sample data and publishing it to Kinesis Data Streams
+- [**Local Development on Apple Silicon**](./python/LocalDevelopmentOnAppleSilicon) - Setup guide for local development with Flink 1.15 on Apple Silicon Macs (not required with Flink 1.18 or later)
+
+### Scala Examples
+
+#### Getting Started
+- [**Getting Started - DataStream API**](./scala/GettingStarted) - Skeleton project for a basic Flink Scala application using the DataStream API
+
+### Infrastructure & Operations
+
+- [**Auto Scaling**](./infrastructure/AutoScaling) - Custom autoscaler for Amazon Managed Service for Apache Flink
+- [**Scheduled Scaling**](./infrastructure/ScheduledScaling) - Scaling applications up and down based on daily time schedules
+- [**Monitoring**](./infrastructure/monitoring) - Extended CloudWatch Dashboard examples for monitoring applications
+- [**Scripts**](./infrastructure/scripts) - Useful shell scripts for interacting with the Amazon Managed Service for Apache Flink control plane API
+
+---
 
 ## Security
diff --git a/java/FlinkCDC/README.md b/java/FlinkCDC/README.md
index c7145692..196b083f 100644
--- a/java/FlinkCDC/README.md
+++ b/java/FlinkCDC/README.md
@@ -1,5 +1,9 @@
-# Using Flink CDC Sources
+# Flink CDC Examples
 
-This folder contains examples showing using Flink CDC Sources as source connectors in Amazon Managed Service for Apache Flink
+Examples demonstrating Change Data Capture (CDC) using [Flink CDC source connectors](https://nightlies.apache.org/flink/flink-cdc-docs-release-3.4/docs/connectors/flink-sources/overview/)
+in Amazon Managed Service for Apache Flink.
 
-* [Flink CDC SQL Server source (SQL)](./FlinkCDCSQLServerSource), writing to JDBC sink.
\ No newline at end of file
+## Table of Contents
+
+### Database Sources
+- [**Flink CDC SQL Server Source**](./FlinkCDCSQLServerSource) - Capturing changes from a SQL Server database and writing to a JDBC sink
\ No newline at end of file
diff --git a/java/Iceberg/README.md b/java/Iceberg/README.md
new file mode 100644
index 00000000..640a3820
--- /dev/null
+++ b/java/Iceberg/README.md
@@ -0,0 +1,12 @@
+# Apache Iceberg Examples
+
+Examples demonstrating how to work with Apache Iceberg tables in Amazon Managed Service for Apache Flink using the DataStream API.
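+
+For orientation, the sketch below shows the general shape of writing a `DataStream` of `RowData` to an Iceberg table registered in the AWS Glue Data Catalog.
+It is a minimal sketch rather than code taken from these examples: the catalog name, warehouse path, database and table
+(`glue_catalog`, `s3://my-bucket/warehouse`, `mydb`, `mytable`) are hypothetical placeholders, and each project below documents its own exact setup.
+
+```java
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.table.data.RowData;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.flink.CatalogLoader;
+import org.apache.iceberg.flink.TableLoader;
+import org.apache.iceberg.flink.sink.FlinkSink;
+
+public class IcebergSinkSketch {
+
+    // Appends a stream of RowData records to an Iceberg table cataloged in AWS Glue.
+    public static void appendToIceberg(DataStream<RowData> rowDataStream) {
+        Map<String, String> catalogProperties = new HashMap<>();
+        catalogProperties.put("warehouse", "s3://my-bucket/warehouse"); // hypothetical warehouse location
+        catalogProperties.put("io-impl", "org.apache.iceberg.aws.s3.S3FileIO");
+
+        // Iceberg catalog backed by the AWS Glue Data Catalog
+        CatalogLoader catalogLoader = CatalogLoader.custom(
+                "glue_catalog", catalogProperties,
+                new Configuration(), "org.apache.iceberg.aws.glue.GlueCatalog");
+
+        // Point the sink at the target table (hypothetical db/table) and append the stream
+        TableLoader tableLoader = TableLoader.fromCatalog(
+                catalogLoader, TableIdentifier.of("mydb", "mytable"));
+
+        FlinkSink.forRowData(rowDataStream)
+                .tableLoader(tableLoader)
+                .append();
+    }
+}
+```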
+
+## Table of Contents
+
+### Iceberg Sinks
+- [**Iceberg DataStream Sink**](./IcebergDataStreamSink) - Writing data to Iceberg tables using the AWS Glue Data Catalog
+- [**S3 Table Sink**](./S3TableSink) - Writing data to Iceberg tables stored directly in Amazon S3
+
+### Iceberg Sources
+- [**Iceberg DataStream Source**](./IcebergDataStreamSource) - Reading data from Iceberg tables using the AWS Glue Data Catalog
diff --git a/java/KafkaConfigProviders/README.md b/java/KafkaConfigProviders/README.md
index 8e32abf0..db4a2bfd 100644
--- a/java/KafkaConfigProviders/README.md
+++ b/java/KafkaConfigProviders/README.md
@@ -1,10 +1,15 @@
-## Configuring Kafka connectors secrets at runtime, using Config Providers
+# Kafka Config Providers Examples
 
-This directory includes example that shows how to configure secrets for Kafka connector authentication
-scheme at runtime, using [MSK Config Providers](https://github.com/aws-samples/msk-config-providers).
+Examples demonstrating secure configuration management for Kafka connectors using MSK Config Providers in Amazon Managed Service for Apache Flink.
 
-Using Config Providers, secrets and files (TrustStore and KeyStore) required to set up the Kafka authentication
-and SSL, can be fetched at runtime and not embedded in the application JAR.
+These examples show how to configure secrets and certificates for Kafka connector authentication at runtime,
+without embedding sensitive information in the application JAR, leveraging [MSK Config Providers](https://github.com/aws-samples/msk-config-providers).
 
-* [Configuring mTLS TrustStore and KeyStore using Config Providers](./Kafka-mTLS-KeystoreWithConfigProviders)
-* [Configuring SASL/SCRAM (SASL_SSL) TrustStore and credentials using Config Providers](./Kafka-SASL_SSL-WithConfigProviders)
+## Table of Contents
+
+### mTLS Authentication
+- [**Kafka mTLS with DataStream API**](./Kafka-mTLS-Keystore-ConfigProviders) - Using Config Providers to fetch the KeyStore and passwords for mTLS authentication with the DataStream API
+- [**Kafka mTLS with Table API & SQL**](./Kafka-mTLS-Keystore-Sql-ConfigProviders) - Using Config Providers to fetch the KeyStore and passwords for mTLS authentication with the Table API & SQL
+
+### SASL Authentication
+- [**Kafka SASL/SCRAM**](./Kafka-SASL_SSL-ConfigProviders) - Using Config Providers to fetch SASL/SCRAM credentials from AWS Secrets Manager
diff --git a/java/Serialization/README.md b/java/Serialization/README.md
new file mode 100644
index 00000000..dfa39ac4
--- /dev/null
+++ b/java/Serialization/README.md
@@ -0,0 +1,8 @@
+# Serialization Examples
+
+Examples demonstrating data serialization patterns and custom type handling in Amazon Managed Service for Apache Flink.
+
+## Table of Contents
+
+### Custom Serialization
+- [**Custom TypeInfo**](./CustomTypeInfo) - Using custom TypeInformation to avoid Kryo serialization fallback
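+
+When Flink cannot fully analyze a record type, it silently falls back to generic Kryo serialization, which is slower and makes state schema evolution harder.
+One common way to avoid the fallback is to attach a `TypeInfoFactory` to the class with the `@TypeInfo` annotation.
+The sketch below illustrates the idea with a hypothetical `Order` class; it is a minimal sketch, not code taken from the example project.
+
+```java
+import java.lang.reflect.Type;
+import java.util.Map;
+
+import org.apache.flink.api.common.typeinfo.TypeInfo;
+import org.apache.flink.api.common.typeinfo.TypeInfoFactory;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.common.typeinfo.Types;
+
+// The annotation ties the class to a factory, so Flink builds explicit
+// TypeInformation for it instead of falling back to Kryo.
+@TypeInfo(Order.OrderTypeInfoFactory.class)
+public class Order {
+    public String orderId;
+    public double amount;
+
+    public static class OrderTypeInfoFactory extends TypeInfoFactory<Order> {
+        @Override
+        public TypeInformation<Order> createTypeInfo(
+                Type t, Map<String, TypeInformation<?>> genericParameters) {
+            // POJO-style type information with explicit per-field types
+            return Types.POJO(Order.class, Map.of(
+                    "orderId", Types.STRING,
+                    "amount", Types.DOUBLE));
+        }
+    }
+}
+```
+
+During development, calling `env.getConfig().disableGenericTypes()` makes any accidental Kryo fallback fail fast instead of silently degrading performance.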