Fully automated GCP pipeline: Python → GCS → Cloud Data Fusion (salary masking + SHA-256 password hashing) → BigQuery → Tableau | Daily Airflow orchestration

nitesht2/gcp-secure-employee-pipeline

Google Cloud (GCP) Secured Employee Data Pipeline

A fully automated, PII-protected data pipeline. It generates sensitive employee records with Python and Faker, lands them in Google Cloud Storage, applies salary masking and SHA-256 password hashing in Cloud Data Fusion, loads the protected data into BigQuery, and delivers insights via Tableau.


Tech Stack

  • Python + Faker (data generation)
  • Google Cloud Storage (raw landing zone)
  • Cloud Data Fusion (no-code transformation + PII protection)
  • BigQuery (analytical warehouse)
  • Cloud Composer (Apache Airflow orchestration)
  • Tableau (interactive dashboard)

Key Features

  • Salary masked (xxxxx)
  • Passwords SHA-256 hashed (cryptographic security)
  • Daily automated execution via Airflow
  • Full end-to-end orchestration with dependency management
  • 100% GCP-native, production-grade design
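The two PII protections listed above are applied in Cloud Data Fusion, which is configured without code. As a rough illustration of the same logic, here is a minimal Python sketch. The field names `salary` and `password` and the helper names are assumptions for illustration; they are not taken from the repository.

```python
import hashlib


def mask_salary(salary: object, mask_char: str = "x", width: int = 5) -> str:
    """Replace any salary value with a fixed-width mask such as 'xxxxx'."""
    # The original value is discarded entirely, so nothing about it leaks.
    return mask_char * width


def hash_password(password: str) -> str:
    """Return the SHA-256 hex digest of the UTF-8 encoded password."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def protect_record(record: dict) -> dict:
    """Return a copy of the record with salary masked and password hashed.

    The 'salary' and 'password' keys are hypothetical column names.
    """
    protected = dict(record)  # copy so the caller's record is not mutated
    protected["salary"] = mask_salary(record["salary"])
    protected["password"] = hash_password(record["password"])
    return protected
```

For example, `protect_record({"name": "Ada", "salary": 85000, "password": "abc"})` keeps `name` unchanged, sets `salary` to `xxxxx`, and replaces `password` with a 64-character hex digest.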

Business Impact

  • 100% automation — eliminated manual data handling
  • 100% PII compliance — zero exposure of salary or passwords
  • Scalable — tested with 100 records; ready for 100K+ with zero code changes
  • Cost — ~$0.50 per daily run on Data Fusion

Project Structure

  • dags/employee_secure_daily_pipeline.py → Airflow orchestration
  • dags/scripts/extract.py → Data generation + GCS upload
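To give a feel for what the extract step might do, here is a minimal sketch that generates employee rows and serializes them to CSV. It is not the repository's `extract.py`: the standard library `random` module stands in for Faker so the sketch runs without third-party packages, and the column names, name lists, and salary range are all assumptions.

```python
import csv
import io
import random
import string

# Hypothetical column layout; the real schema is not shown in the README.
FIELDS = ["employee_id", "first_name", "last_name", "salary", "password"]

# Small placeholder name pools standing in for Faker's generators.
FIRST_NAMES = ["Ava", "Liam", "Noah", "Mia", "Zoe"]
LAST_NAMES = ["Shah", "Kim", "Lopez", "Chen", "Patel"]


def generate_employees(n: int, seed=None) -> list:
    """Build n synthetic employee rows; a seed makes the output repeatable."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits
    rows = []
    for employee_id in range(1, n + 1):
        rows.append({
            "employee_id": employee_id,
            "first_name": rng.choice(FIRST_NAMES),
            "last_name": rng.choice(LAST_NAMES),
            "salary": rng.randint(40000, 150000),
            "password": "".join(rng.choices(alphabet, k=12)),
        })
    return rows


def to_csv(rows: list) -> str:
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Uploading to GCS would need the google-cloud-storage package, for example:
#   from google.cloud import storage
#   blob = storage.Client().bucket("my-bucket").blob("employee_data.csv")
#   blob.upload_from_string(csv_text, content_type="text/csv")
# The bucket and object names above are placeholders.
```

Calling `to_csv(generate_employees(100, seed=1))` yields a header line followed by 100 data rows, matching the record count the README reports testing with.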
