06-cloud-deployment
Overview
In this lesson, we will deploy our weather data pipeline to Google Cloud Platform (GCP) using Google Kubernetes Engine (GKE). We'll cover the process of setting up a GKE cluster, building and pushing Docker images to Google Artifact Registry, and deploying our application components using Kubernetes.
By the end of this section, you will have:
- Set up a GKE cluster
- Built and pushed Docker images for our data pipeline and Flask app
- Deployed PostgreSQL, our data pipeline jobs, and the Flask app to Kubernetes
- Learned how to monitor and debug your deployment
This deployment process demonstrates how to take a locally developed data pipeline and deploy it to a cloud environment, showcasing the scalability and flexibility of containerized applications.
Prerequisites
Before starting this lesson, please ensure that you have:
- Completed the previous lesson (05) of this workshop
- Google Cloud SDK installed
- kubectl installed
- Docker installed
- A Google Cloud Platform account with billing enabled
Lesson Content
6.1 Accessing Google Cloud and Setting Up the Environment
- Log into your provided Google Cloud account.
- Navigate to the Google Cloud Console and open Cloud Shell by clicking the terminal icon in the top-right corner.
- Once in Cloud Shell, clone the repository containing the finished code:
  git clone https://github.com/timmanik/data-pipeline-workshop
- Change into the cloned directory:
  cd data-pipeline-workshop

This repository contains the finished version of the code you built throughout the workshop; using it ensures a smooth deployment process.
6.2 Setup and GKE Cluster Creation
- List and set your GCP project:
  gcloud projects list
  gcloud config set project <insert_name_of_project>
  export PROJECT_ID=$(gcloud config get-value project)
  echo $PROJECT_ID
- Create a GKE cluster (the trailing & runs the command in the background, since provisioning can take several minutes):
  gcloud container clusters create weather-cluster --num-nodes=2 --zone=us-central1-a --quiet > /dev/null 2>&1 &
  To check the status of the cluster deployment:
  gcloud container clusters describe weather-cluster --zone=us-central1-a
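Cluster provisioning usually takes a few minutes. Rather than re-running the describe command by hand, you can poll until the cluster reports RUNNING; this is a minimal sketch that uses gcloud's --format flag to extract just the status field:

# Poll the cluster status every 15 seconds until it is RUNNING
while [ "$(gcloud container clusters describe weather-cluster --zone=us-central1-a --format='value(status)')" != "RUNNING" ]; do
  echo "Cluster still provisioning..."
  sleep 15
done
echo "Cluster is ready."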
6.3 Create Container Repository on Artifact Registry
gcloud artifacts repositories create my-docker-repo --project=$PROJECT_ID --location=us --repository-format=docker
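Before pushing images in the next step, Docker needs to be able to authenticate to Artifact Registry. If your Cloud Shell session isn't configured for this yet, register gcloud as the credential helper for the us-docker.pkg.dev registry:

# Configure Docker to authenticate to Artifact Registry via gcloud
gcloud auth configure-docker us-docker.pkg.dev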
6.4 Build and Push Docker Images
# Navigate to the data-pipeline directory
cd data-pipeline
# Build and push data pipeline images
docker build --target extract -t us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-extract:latest .
docker build --target load -t us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-load:latest .
docker build --target transform -t us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-transform:latest .
docker push us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-extract:latest
docker push us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-load:latest
docker push us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-transform:latest
# Navigate to the flask-app directory
cd ../flask-app
# Build and push Flask app image
docker build -t us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/flask-app:latest .
docker push us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/flask-app:latest
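To confirm that all four images landed in the repository before deploying, you can list its contents (this uses the repository created in 6.3):

# List the images stored in the Artifact Registry repository
gcloud artifacts docker images list us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo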
6.5 Kubernetes Deployment
- Get cluster credentials:
  gcloud container clusters get-credentials weather-cluster --zone=us-central1-a
- Create a Kubernetes secret for database credentials:
  kubectl create secret generic db-credentials \
    --from-literal=DB_NAME=your_db_name \
    --from-literal=DB_USER=your_db_user \
    --from-literal=DB_PASSWORD=your_db_password \
    --from-literal=DB_HOST=postgres \
    --from-literal=DB_PORT=5432
- Deploy PostgreSQL (envsubst fills in environment variables such as $PROJECT_ID before the manifests are applied):
  cd ../gcp-deployment/k8s-artifacts
  envsubst < postgres-deployment.yaml | kubectl apply -f -
  envsubst < postgres-service.yaml | kubectl apply -f -
- Wait for PostgreSQL to be ready:
  kubectl wait --for=condition=ready pod -l app=postgres --timeout=300s
- Deploy the data pipeline job. The manifest defines a CronJob named data-pipeline-sequence; we trigger a one-off run from it and wait for it to complete:
  envsubst < data-pipeline-job.yaml | kubectl apply -f -
  kubectl create job --from=cronjob/data-pipeline-sequence data-pipeline-sequence
  kubectl wait --for=condition=complete job/data-pipeline-sequence --timeout=600s
- Deploy the Flask app (a verification sketch for the full rollout follows these steps):
  envsubst < flask-app-deployment.yaml | kubectl apply -f -
  envsubst < flask-app-service.yaml | kubectl apply -f -
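Once everything is applied, it's worth verifying the rollout end to end. The sketch below assumes flask-app-service.yaml exposes the app through a LoadBalancer service named flask-app; the --watch flag keeps the command running until GKE assigns an external IP:

# Confirm the secret, pods, and pipeline job all exist
kubectl get secret db-credentials
kubectl get pods
kubectl get jobs

# Watch the Flask service until an external IP appears, then press Ctrl+C
kubectl get service flask-app --watch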
6.6 Monitoring and Debugging
Here are some useful commands for monitoring and debugging your deployment:
- List all pods:
  kubectl get pods
- View logs for all containers in a pod:
  kubectl logs <pod-name> --all-containers=true
- Describe a pod to see its events and container state:
  kubectl describe pod <pod-name>
- Port-forward to access a service locally:
  kubectl port-forward service/flask-app 8080:80
- View cluster events, oldest first:
  kubectl get events --sort-by=.metadata.creationTimestamp
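These commands combine naturally when something fails. For example, a quick triage sequence for a misbehaving pipeline pod might look like this (Kubernetes automatically labels pods created by a job with job-name, and <pod-name> is whatever the first command returns):

# Find the pods created by the one-off pipeline job
kubectl get pods -l job-name=data-pipeline-sequence

# Narrow cluster events to a single problem pod
kubectl get events --field-selector involvedObject.name=<pod-name>

# Follow logs from all of the pod's containers as they run
kubectl logs <pod-name> --all-containers=true --follow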
Conclusion
In this lesson, you learned how to deploy your weather data pipeline to Google Cloud Platform using Google Kubernetes Engine. You created a GKE cluster, built and pushed Docker images to Google Artifact Registry, and deployed your application components using Kubernetes.
This deployment process demonstrates how to take a locally developed data pipeline and scale it in a cloud environment. The containerized approach ensures consistency across different environments and simplifies the deployment process.
Key Points
- GKE provides a managed Kubernetes environment, simplifying cluster setup and management
- Building and pushing Docker images to Google Artifact Registry enables easy deployment to GKE
- Kubernetes secrets provide a secure way to manage sensitive information like database credentials
- Kubernetes jobs and cronjobs allow for scheduled and one-time execution of tasks
- Monitoring and debugging tools in Kubernetes help manage and troubleshoot deployments