This repo documents an end-to-end implementation of a GCP pipeline from GCS to BigQuery (BQ), deployed through CI/CD to separate Airflow DEV and PROD environments.

ViinayKumaarMamidi/GCP_Flight_Booking_Airflow_GCS_to_BQ_to_Looker_End_to_End_Project

GCP_Flight_Booking_Airflow_GCS_to_BQ_End_to_End_Project

Data Flow Details:

This data pipeline loads a flight booking CSV file from a GCS bucket into BigQuery. A GitHub Actions YAML workflow deploys the PySpark/Python files and the required variables (stored in JSON files). The Spark job runs on Serverless Dataproc and loads the data into the corresponding DEV/PROD BigQuery tables, and a Looker dashboard is built on the PROD BigQuery table.

Deepwiki documentation URL: https://deepwiki.com/ViinayKumaarMamidi/GCP_Flight_Booking_Airflow_GCS_to_BQ_to_Looker_End_to_End_Project

Project Details:

  1. Connected VS Code to my GitHub account, created the repo, and activated the connection
  2. Implemented a PySpark script that reads the flight_booking.csv file from the GCS bucket, applies transformations, and loads the results into staging and final tables in BigQuery
  3. Used Serverless Dataproc inside the Airflow script to run the Spark job, and deployed the code
  4. Created a GitHub Actions YAML file that authenticates to the GCP account and uploads the Airflow job, the Spark job, and the variables files to the GCS bucket
  5. Defined the required DEV and PROD variables in JSON files and uploaded them to the appropriate GCS bucket folders through the YAML workflow
  6. Once the DEV Airflow DAG ran successfully, created a pull request in GitHub for PROD, after which the PROD Airflow DAG also ran successfully
  7. Created a Looker dashboard on top of the PROD final table transformed_flight_data_prod
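
Steps 4 and 5 amount to copying three kinds of files from the repo into environment-specific GCS locations. The sketch below models that mapping in plain Python. The repo paths are the ones used in this project, but the bucket names and the destination folders (`dags`, `spark_job`, `variables`) are hypothetical placeholders, not the values used by the actual workflow.

```python
from pathlib import PurePosixPath

# Hypothetical bucket names; the real ones are defined in the workflow/variables files.
BUCKETS = {"dev": "example-dev-bucket", "prod": "example-prod-bucket"}


def deployment_plan(env):
    """Map each repo file to the GCS URI it would be uploaded to for `env`."""
    if env not in BUCKETS:
        raise ValueError(f"unknown environment: {env!r}")
    # Repo path -> hypothetical destination folder inside the bucket.
    sources = {
        "airflow_job/airflow_job.py": "dags",
        "spark_job/spark_transformation_job.py": "spark_job",
        f"variables/{env}/variables.json": "variables",
    }
    bucket = BUCKETS[env]
    return {
        src: f"gs://{bucket}/{folder}/{PurePosixPath(src).name}"
        for src, folder in sources.items()
    }
```

Calling `deployment_plan("dev")` and `deployment_plan("prod")` yields the same three files routed to different buckets, which is the essence of the DEV-then-PROD promotion described in step 6.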

Source Flight Booking CSV File URL:

https://github.com/ViinayKumaarMamidi/GCP_Flight_Booking_Airflow_GCS_to_BQ_End_to_End_Project/blob/main/flight_booking.csv

Airflow DAG File URL:

https://github.com/ViinayKumaarMamidi/GCP_Flight_Booking_Airflow_GCS_to_BQ_End_to_End_Project/blob/main/airflow_job/airflow_job.py
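
The DAG submits the PySpark job to Serverless Dataproc as a batch. As a rough sketch (not the contents of `airflow_job.py`), the helper below builds a batch definition in the shape of the Dataproc `Batch` resource. The GCS URI, the `--env` job argument, and the runtime version are assumptions made for illustration.

```python
def build_pyspark_batch(main_file_uri, env, runtime_version="2.2"):
    """Return a Dataproc Serverless batch definition for a PySpark job.

    All values are illustrative; a real DAG would read them from its
    environment-specific variables.
    """
    return {
        "pyspark_batch": {
            "main_python_file_uri": main_file_uri,
            # Hypothetical job argument telling the script which environment to target.
            "args": [f"--env={env}"],
        },
        # Assumed runtime version; pick whichever the project actually uses.
        "runtime_config": {"version": runtime_version},
    }
```

In a DAG, a dict like this is what `DataprocCreateBatchOperator` (from `airflow.providers.google.cloud.operators.dataproc`) takes as its `batch` argument, alongside `project_id`, `region`, and `batch_id`.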

PySpark File URL:

https://github.com/ViinayKumaarMamidi/GCP_Flight_Booking_Airflow_GCS_to_BQ_End_to_End_Project/blob/main/spark_job/spark_transformation_job.py
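
For readers who want the general shape of such a job without opening the file, here is a minimal sketch of a CSV-to-BigQuery PySpark script. It is not the project's actual code: the argument names, the placeholder transformation, and the `load_ts` column are all hypothetical, and the write step assumes the spark-bigquery connector is available on the runtime.

```python
import argparse


def parse_args(argv=None):
    """Parse the job arguments (all argument names here are hypothetical)."""
    parser = argparse.ArgumentParser(description="Flight booking CSV to BigQuery")
    parser.add_argument("--env", required=True, choices=["dev", "prod"])
    parser.add_argument("--input_path", required=True, help="gs:// URI of the CSV")
    parser.add_argument("--bq_table", required=True, help="project.dataset.table")
    parser.add_argument("--temp_bucket", required=True, help="GCS bucket used to stage the load")
    return parser.parse_args(argv)


def run(args):
    # Imported lazily so the argument parsing above works without PySpark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName(f"flight_booking_{args.env}").getOrCreate()

    # Read the CSV with a header row, letting Spark infer column types.
    df = spark.read.csv(args.input_path, header=True, inferSchema=True)

    # Placeholder transformation: drop fully empty rows and stamp the load time.
    transformed = df.dropna(how="all").withColumn("load_ts", F.current_timestamp())

    # Write through the spark-bigquery connector, staging data in a temporary GCS bucket.
    (
        transformed.write.format("bigquery")
        .option("table", args.bq_table)
        .option("temporaryGcsBucket", args.temp_bucket)
        .mode("overwrite")
        .save()
    )
    spark.stop()
```

A script built this way would end with `run(parse_args())`, so the same file can serve DEV and PROD simply by changing the arguments passed in by the DAG.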

DEV Variables JSON File Details:

https://github.com/ViinayKumaarMamidi/GCP_Flight_Booking_Airflow_GCS_to_BQ_End_to_End_Project/blob/main/variables/dev/variables.json

PROD Variables JSON File Details:

https://github.com/ViinayKumaarMamidi/GCP_Flight_Booking_Airflow_GCS_to_BQ_End_to_End_Project/blob/main/variables/prod/variables.json
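
Keeping one `variables.json` per environment lets the same DAG and Spark code run against DEV and PROD. The sketch below shows one way to parse and validate such a file before use. The key names (`env`, `gcs_bucket`, `bq_dataset`, `bq_table`) are hypothetical and may not match the files linked above.

```python
import json

# Hypothetical keys; the real variables.json files may use different names.
REQUIRED_KEYS = ("env", "gcs_bucket", "bq_dataset", "bq_table")


def parse_variables(raw_json):
    """Parse the text of a variables.json file and fail fast on missing keys."""
    data = json.loads(raw_json)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise KeyError(f"missing required keys: {missing}")
    return data
```

Validating up front means a misconfigured PROD file fails at load time rather than partway through a Dataproc run.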

Source GCS Bucket Files:

image

DEV Airflow DAG Details:

image

PROD Airflow DAG Details:

image

Composer Airflow Dev and Prod Details:

image

DEV Serverless Dataproc Cluster Log Details:

image

PROD Serverless Dataproc Cluster Log Details:

image

DEV GitHub Deployment Details:

image

PROD GitHub Deployment Details:

image

DEV BigQuery Tables Details:

image

PROD BigQuery Tables Details:

image

Looker Dashboard Details:

image
