Real-time data pipeline for STEDI Human Balance Analytics using Redis, Kafka, Kafka Connect, and Spark Structured Streaming. It parses customer events and risk events, joins the two streams, and publishes enriched fall-risk insights that power the live STEDI risk analysis graph.
This project was developed for the UE20CS343 (Database Technologies) course. It builds a real-time data streaming pipeline using Apache Kafka and Spark Structured Streaming: it simulates ingesting San Francisco crime data into Kafka, processes the data with Spark, and performs aggregations and stream-table joins.
A sandbox environment designed to simulate a pseudo-distributed Hadoop cluster with integrated Apache Spark and Kafka components. It allows developers to prototype and experiment with big data workflows, test distributed computing patterns, and explore cluster behavior in a contained virtual setup.