# 06-cloud-deployment

## Overview

In this lesson, we will deploy our weather data pipeline to Google Cloud Platform (GCP) using Google Kubernetes Engine (GKE). We'll cover setting up a GKE cluster, building and pushing Docker images to Google Artifact Registry, and deploying our application components using Kubernetes.

By the end of this lesson, you will have:

1. Set up a GKE cluster
2. Built and pushed Docker images for our data pipeline and Flask app
3. Deployed PostgreSQL, our data pipeline jobs, and the Flask app to Kubernetes
4. Learned how to monitor and debug your deployment

This deployment process demonstrates how to take a locally developed data pipeline and deploy it to a cloud environment, showcasing the scalability and flexibility of containerized applications.

## Prerequisites

Before starting this lesson, please ensure that you have:

1. Completed the [05-cloud-deployment](../05-cloud-deployment/README.md) lesson
2. Google Cloud SDK installed
3. kubectl installed
4. Docker installed
5. A Google Cloud Platform account with billing enabled

## Lesson Content

### 6.1 Accessing Google Cloud and Setting Up the Environment

1. Log in to your provided Google Cloud account.

2. Navigate to the Google Cloud Console and open the Cloud Shell by clicking on the terminal icon in the top-right corner.

3. Once in the Cloud Shell, clone the repository containing the finished code:

   ```bash
   git clone https://github.com/timmanik/data-pipeline-workshop
   ```

4. Change into the cloned directory:

   ```bash
   cd data-pipeline-workshop
   ```

This repository contains the finished version of the code you've created throughout the workshop. Using it helps ensure a smooth deployment process.

### 6.2 Setup and GKE Cluster Creation

1. List and set your GCP project:

   ```bash
   gcloud projects list
   # Use the value from the PROJECT_ID column of the list above
   gcloud config set project <insert_project_id>
   # Later commands and manifests read the project from this variable
   export PROJECT_ID=$(gcloud config get-value project)
   echo "$PROJECT_ID"
   ```

2. 
   Create a GKE cluster:

   ```bash
   # Runs in the background and discards all output, so nothing is printed here
   gcloud container clusters create weather-cluster --num-nodes=2 --zone=us-central1-a --quiet > /dev/null 2>&1 &
   ```

   To check the status of the cluster deployment:

   ```bash
   gcloud container clusters describe weather-cluster --zone=us-central1-a
   ```

   Cluster creation takes several minutes. Wait until the `status` field in the output reads `RUNNING` before fetching cluster credentials in section 6.5.

### 6.3 Create Container Repository on Artifact Registry

This creates a Docker-format repository named `my-docker-repo` in the `us` multi-region:

```bash
gcloud artifacts repositories create my-docker-repo --project=$PROJECT_ID --location=us --repository-format=docker
```

### 6.4 Build and Push Docker Images

The `--target` flag selects a named build stage, so the three data pipeline images are built from the same `Dockerfile`.

```bash
# If a push fails with an authentication error, register gcloud as a
# Docker credential helper first: gcloud auth configure-docker us-docker.pkg.dev

# Navigate to the data-pipeline directory
cd data-pipeline

# Build and push data pipeline images
docker build --target extract -t us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-extract:latest .
docker build --target load -t us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-load:latest .
docker build --target transform -t us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-transform:latest .

docker push us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-extract:latest
docker push us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-load:latest
docker push us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-transform:latest

# Navigate to the flask-app directory
cd ../flask-app

# Build and push Flask app image
docker build -t us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/flask-app:latest .
docker push us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/flask-app:latest
```

### 6.5 Kubernetes Deployment

1. Get cluster credentials:

   ```bash
   gcloud container clusters get-credentials weather-cluster --zone=us-central1-a
   ```

2. Create a Kubernetes secret for database credentials, replacing the `your_db_*` placeholders with your own values:

   ```bash
   kubectl create secret generic db-credentials \
     --from-literal=DB_NAME=your_db_name \
     --from-literal=DB_USER=your_db_user \
     --from-literal=DB_PASSWORD=your_db_password \
     --from-literal=DB_HOST=postgres \
     --from-literal=DB_PORT=5432
   ```

3. 
   Deploy PostgreSQL:

   ```bash
   cd ../gcp-deployment/k8s-artifacts
   # envsubst replaces environment variable references in the manifest before it is applied
   envsubst < postgres-deployment.yaml | kubectl apply -f -
   envsubst < postgres-service.yaml | kubectl apply -f -
   ```

4. Wait for PostgreSQL to be ready:

   ```bash
   kubectl wait --for=condition=ready pod -l app=postgres --timeout=300s
   ```

5. Deploy data pipeline job:

   ```bash
   # Apply the CronJob, then trigger one run immediately and wait for it to finish
   envsubst < data-pipeline-job.yaml | kubectl apply -f -
   kubectl create job --from=cronjob/data-pipeline-sequence data-pipeline-sequence
   kubectl wait --for=condition=complete job/data-pipeline-sequence --timeout=600s
   ```

6. Deploy Flask app:

   ```bash
   envsubst < flask-app-deployment.yaml | kubectl apply -f -
   envsubst < flask-app-service.yaml | kubectl apply -f -
   ```

### 6.6 Monitoring and Debugging

Here are some useful commands for monitoring and debugging your deployment:

1. List all pods:

   ```bash
   kubectl get pods
   ```

2. View logs for all containers in a pod:

   ```bash
   kubectl logs <pod-name> --all-containers=true
   ```

3. Describe a pod:

   ```bash
   kubectl describe pod <pod-name>
   ```

4. Port forward to access services locally:

   ```bash
   kubectl port-forward service/flask-app 8080:80
   ```

5. View cluster events:

   ```bash
   kubectl get events --sort-by=.metadata.creationTimestamp
   ```

## Conclusion

In this lesson, you learned how to deploy your weather data pipeline to Google Cloud Platform using Google Kubernetes Engine. You created a GKE cluster, built and pushed Docker images to Google Artifact Registry, and deployed your application components using Kubernetes.

This deployment process demonstrates how to take a locally developed data pipeline and scale it in a cloud environment. The containerized approach ensures consistency across different environments and simplifies the deployment process.

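
As a follow-up to section 6.4, the build and push commands repeat the same registry path for every image. The sketch below shows one way to generate that path in a single place. The helper names `image_uri` and `print_pipeline_commands` are hypothetical and not part of the workshop repository, and the fallback project ID `example-project` is a placeholder. The script only prints the commands, so it is safe to run without Docker.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the host and repository name mirror the values used in section 6.4.
REGISTRY_HOST="us-docker.pkg.dev"
REPO_NAME="my-docker-repo"

# Compose the full Artifact Registry path for one image.
# Arguments: project ID, image name.
image_uri() {
  local project_id="$1" name="$2"
  printf '%s/%s/%s/%s:latest\n' "$REGISTRY_HOST" "$project_id" "$REPO_NAME" "$name"
}

# Print (do not run) the build and push commands for each pipeline stage.
print_pipeline_commands() {
  local project_id="$1" stage uri
  for stage in extract load transform; do
    uri="$(image_uri "$project_id" "data-pipeline-${stage}")"
    echo "docker build --target ${stage} -t ${uri} ."
    echo "docker push ${uri}"
  done
}

# Fall back to a placeholder project ID when PROJECT_ID is not set.
print_pipeline_commands "${PROJECT_ID:-example-project}"
```

Review the printed commands before running them from the `data-pipeline` directory. Keeping the path in one function means a change of region or repository name only has to be made once.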
## Key Points

- GKE provides a managed Kubernetes environment, simplifying cluster setup and management
- Building and pushing Docker images to Google Artifact Registry enables easy deployment to GKE
- Kubernetes secrets provide a secure way to manage sensitive information like database credentials
- Kubernetes jobs and cronjobs allow for scheduled and one-time execution of tasks
- Monitoring and debugging tools in Kubernetes help manage and troubleshoot deployments

## Further Reading

- [Google Kubernetes Engine documentation](https://cloud.google.com/kubernetes-engine/docs)
- [Kubernetes documentation](https://kubernetes.io/docs/home/)
- [Docker documentation](https://docs.docker.com/)
- [Google Artifact Registry documentation](https://cloud.google.com/artifact-registry/docs)
- [Kubernetes best practices](https://kubernetes.io/docs/concepts/configuration/overview/)