A fully automated, PII-protected data pipeline that extracts sensitive employee data, applies compliance-grade protection (salary masking and SHA-256 password hashing), loads the result into BigQuery, and delivers insights through Tableau.
- Python + Faker (data generation)
- Google Cloud Storage (raw landing zone)
- Cloud Data Fusion (no-code transformation + PII protection)
- BigQuery (analytical warehouse)
- Cloud Composer (Apache Airflow orchestration)
- Tableau (interactive dashboard)
- Salary masked (`xxxxx`)
- Passwords SHA-256 hashed (cryptographic security)
- Daily automated execution via Airflow
- Full end-to-end orchestration with dependency management
- 100% GCP-native, production-grade design
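
The pipeline applies the two PII protections listed above inside Cloud Data Fusion, which is no-code. As a rough illustration of what those transformations do to a record, here is a minimal Python sketch. The field names `salary` and `password` and the helper names are assumptions made for this example; they are not taken from the project's code.

```python
import hashlib

# Fixed mask, matching the `xxxxx` placeholder mentioned above.
SALARY_MASK = "xxxxx"


def mask_salary(salary) -> str:
    """Return a constant mask so neither the value nor its length leaks."""
    return SALARY_MASK


def hash_password(password: str) -> str:
    """Return the SHA-256 digest of the password as a 64-character hex string."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def protect_record(record: dict) -> dict:
    """Return a copy of the record with salary masked and password hashed.

    The input dict is left unchanged. The keys 'salary' and 'password'
    are hypothetical column names.
    """
    protected = dict(record)
    protected["salary"] = mask_salary(record["salary"])
    protected["password"] = hash_password(record["password"])
    return protected


if __name__ == "__main__":
    raw = {"employee_id": 1, "salary": 85000, "password": "abc"}
    print(protect_record(raw))
```

Because the mask is a constant, the masked column carries no information about the original salary. The hash is deterministic, so the same password always produces the same digest.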
- 100% automation: no manual data handling at any stage
- 100% PII compliance: salary and passwords are never exposed downstream
- Scalable: tested with 100 records; designed to handle 100K+ with no code changes
- Cost: ~$0.50 per daily run on Data Fusion
- dags/employee_secure_daily_pipeline.py → Airflow orchestration
- dags/scripts/extract.py → Data generation + GCS upload
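
To give a sense of what the data generation and GCS upload step involves, here is a hedged sketch, not the contents of `extract.py`. It swaps Faker for the standard-library `random` module so the generation part runs without dependencies. The column names, the name lists, and the `upload_to_gcs` helper are all assumptions for illustration. The upload function needs the third-party `google-cloud-storage` package and valid credentials.

```python
import csv
import io
import random
import string

# Hypothetical column layout; the real schema is not shown in this README.
FIELDS = ["employee_id", "first_name", "last_name", "email", "salary", "password"]

FIRST_NAMES = ["Ana", "Ben", "Chloe", "Dev", "Elena"]
LAST_NAMES = ["Ng", "Okafor", "Patel", "Rossi", "Smith"]


def generate_employees(n: int, seed: int = 0) -> list[dict]:
    """Generate n synthetic employee records (stand-in for Faker output)."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits
    rows = []
    for i in range(1, n + 1):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        rows.append({
            "employee_id": i,
            "first_name": first,
            "last_name": last,
            "email": f"{first}.{last}{i}@example.com".lower(),
            "salary": rng.randrange(40_000, 150_000, 500),
            "password": "".join(rng.choice(alphabet) for _ in range(12)),
        })
    return rows


def to_csv(rows: list[dict]) -> str:
    """Serialize the records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def upload_to_gcs(bucket_name: str, blob_name: str, data: str) -> None:
    """Upload CSV text to the raw landing bucket (names are placeholders)."""
    from google.cloud import storage  # third-party: google-cloud-storage

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_string(data, content_type="text/csv")


if __name__ == "__main__":
    csv_text = to_csv(generate_employees(100))
    print(csv_text.splitlines()[0])  # header row
```

Keeping generation, serialization, and upload in separate functions lets the first two be tested locally without any GCP access.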