-<img src="./images/fh-local-labs.png" alt="Factor House Local Labs" width="600"/>
+<img src="https://raw.githubusercontent.com/factorhouse/examples/refs/heads/main/images/fh-local-labs.png" alt="Factor House Local Labs" width="600"/>
 </a>
 
 <br>
@@ -388,15 +388,15 @@ Each lab is designed to be modular, hands-on, and production-inspired, helping y
 
 <details>
 
-<summary><b style="font-size: 1.2em;">🎮 Mobile Game Top K Analytics 🎮</b></summary>
+<summary><b style="font-size: 1.2em;">Mobile Game Top K Analytics</b></summary>
 
 <br>
 
 This project walks through how to build a complete real-time analytics pipeline for a mobile game using a modern data stack. It simulates live gameplay data, processes it in real time to calculate performance metrics, and displays the results on an interactive dashboard.
-<img src="./images/mobile-game-top-k-analytics.gif" alt="Mobile Game Top K Analytics" width="600"/>
+<img src="https://raw.githubusercontent.com/factorhouse/examples/refs/heads/main/projects/mobile-game-top-k-analytics/images/mobile-game-top-k-analytics.gif" alt="Mobile Game Top K Analytics" width="600"/>
 </a>
 
 <br>
@@ -409,15 +409,15 @@ This project walks through how to build a complete real-time analytics pipeline
 
 <details>
 
-<summary><b style="font-size: 1.2em;">🔄 CDC with Debezium on Real-Time theLook eCommerce Data 🗄️</b></summary>
+<summary><b style="font-size: 1.2em;">CDC with Debezium on Real-Time theLook eCommerce Data</b></summary>
 
 <br>
 
 This project unlocks the power of the popular [theLook eCommerce dataset](https://console.cloud.google.com/marketplace/product/bigquery-public-data/thelook-ecommerce) for modern event-driven applications. It uses a re-engineered [real-time data generator](https://github.com/factorhouse/examples/tree/main/datagen/thelook-ecomm) that transforms the original static dataset into a continuous stream of simulated user activity, writing directly to a PostgreSQL database.
-<img src="./images/thelook-datagen.gif" alt="CDC with Debezium on Real-Time theLook eCommerce Data" width="600"/>
+<img src="https://raw.githubusercontent.com/factorhouse/examples/refs/heads/main/datagen/thelook-ecomm/images/thelook-datagen.gif" alt="CDC with Debezium on Real-Time theLook eCommerce Data" width="600"/>
 </a>
 
 <br>
@@ -428,6 +428,50 @@ This project unlocks the power of the popular [theLook eCommerce dataset](https:
 
 </details>
 
+<details>
+
+<summary><b style="font-size: 1.2em;">A Practical Guide to Data Lineage on Kafka Connect with OpenLineage</b></summary>
+
+<br>
+
+The lab demonstrates how to capture real-time data lineage from Kafka Connect using a custom Single Message Transform (SMT) - `OpenLineageLifecycleSmt`. It builds a complete pipeline that tracks data from a source connector to S3 and Iceberg sinks, with the full lineage graph visualized in Marquez.
+<img src="https://raw.githubusercontent.com/factorhouse/examples/refs/heads/main/projects/data-lineage-labs/images/connector-lineage.gif" alt="A Practical Guide to Data Lineage on Kafka Connect with OpenLineage" width="600"/>
+</a>
+
+<br/>
+
+[**➡️ Click Here to Explore the Lab**](https://github.com/factorhouse/examples/blob/main/projects/data-lineage-labs/lab1_kafka-connect.md)
+
+</div>
+
+</details>
+
+<details>
+
+<summary><b style="font-size: 1.2em;">End-to-End Data Lineage from Kafka to Flink and Spark</b></summary>
+
+<br>
+
+An end-to-end tutorial for establishing data lineage across Kafka, Flink, Spark, and Iceberg. This lab begins by tracking data from a single Kafka topic through parallel pipelines: a Kafka S3 sink connector for raw data archival, a Flink job for real-time analytics, another Flink job for Iceberg ingestion, and a downstream Spark batch job that reads from the Iceberg table.
+<img src="https://raw.githubusercontent.com/factorhouse/examples/refs/heads/main/projects/data-lineage-labs/images/end-to-end-lineage.gif" alt="End-to-End Data Lineage from Kafka to Flink and Spark" width="600"/>
+</a>
+
+<br/>
+
+[**➡️ Click Here to Explore the Lab**](https://github.com/factorhouse/examples/blob/main/projects/data-lineage-labs/lab2_end-to-end.md)
+
+</div>
+
+</details>
+
+<br/>
+
_Stay tuned—more labs and projects are on the way!_