06-cloud-deployment

Overview

In this lesson, we will deploy our weather data pipeline to Google Cloud Platform (GCP) using Google Kubernetes Engine (GKE). We'll cover the process of setting up a GKE cluster, building and pushing Docker images to Google Artifact Registry, and deploying our application components using Kubernetes.

By the end of this section, you will have:

  1. Set up a GKE cluster
  2. Built and pushed Docker images for our data pipeline and Flask app
  3. Deployed PostgreSQL, our data pipeline jobs, and the Flask app to Kubernetes
  4. Learned how to monitor and debug your deployment

This deployment process demonstrates how to take a locally developed data pipeline and deploy it to a cloud environment, showcasing the scalability and flexibility of containerized applications.

Prerequisites

Before starting this lesson, please ensure that you have:

  1. Completed the previous lesson (05)
  2. Google Cloud SDK installed
  3. kubectl installed
  4. Docker installed
  5. A Google Cloud Platform account with billing enabled
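
To confirm these tools are available in your environment, you can check their versions:

    gcloud version
    kubectl version --client
    docker --version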

Lesson Content

6.1 Accessing Google Cloud and Setting Up the Environment

  1. Log into your provided Google Cloud account.

  2. Navigate to the Google Cloud Console and open the Cloud Shell by clicking on the terminal icon in the top-right corner.

  3. Once in the Cloud Shell, clone the repository containing the finished code:

    git clone https://github.com/timmanik/data-pipeline-workshop
  4. Change into the cloned directory:

    cd data-pipeline-workshop

This repository contains the finished version of the code you've built throughout the workshop. Using it ensures a smooth deployment process.

6.2 Setup and GKE Cluster Creation

  1. List and set your GCP project:

    gcloud projects list
    gcloud config set project <insert_name_of_project>
    export PROJECT_ID=$(gcloud config get-value project)
    echo $PROJECT_ID
  2. Create a GKE cluster (the trailing & runs the command in the background, since provisioning can take several minutes):

    gcloud container clusters create weather-cluster --num-nodes=2 --zone=us-central1-a --quiet > /dev/null 2>&1 &

    To check the status of the cluster deployment:

    gcloud container clusters describe weather-cluster --zone=us-central1-a
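
    To poll just the cluster's status field, you can use gcloud's --format flag; it reads RUNNING once provisioning completes:

    gcloud container clusters describe weather-cluster --zone=us-central1-a --format="value(status)"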

6.3 Create Container Repository on Artifact Registry

gcloud artifacts repositories create my-docker-repo --project=$PROJECT_ID --location=us --repository-format=docker
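
Cloud Shell usually has Docker credential helpers for Artifact Registry pre-configured; if the pushes in the next step fail with an authentication error, register the helper for the registry host:

gcloud auth configure-docker us-docker.pkg.dev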

6.4 Build and Push Docker Images

# Navigate to the data-pipeline directory
cd data-pipeline

# Build and push data pipeline images
docker build --target extract -t us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-extract:latest .
docker build --target load -t us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-load:latest .
docker build --target transform -t us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-transform:latest .

docker push us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-extract:latest
docker push us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-load:latest
docker push us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-transform:latest

# Navigate to the flask-app directory
cd ../flask-app

# Build and push Flask app image
docker build -t us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/flask-app:latest .
docker push us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/flask-app:latest
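
To confirm all four images landed in the repository, you can list its contents:

gcloud artifacts docker images list us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo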

6.5 Kubernetes Deployment

  1. Get cluster credentials:

    gcloud container clusters get-credentials weather-cluster --zone=us-central1-a
  2. Create a Kubernetes secret for database credentials:

    kubectl create secret generic db-credentials \
      --from-literal=DB_NAME=your_db_name \
      --from-literal=DB_USER=your_db_user \
      --from-literal=DB_PASSWORD=your_db_password \
      --from-literal=DB_HOST=postgres \
      --from-literal=DB_PORT=5432
  3. Deploy PostgreSQL:

    cd ../gcp-deployment/k8s-artifacts
    envsubst < postgres-deployment.yaml | kubectl apply -f -
    envsubst < postgres-service.yaml | kubectl apply -f -
  4. Wait for PostgreSQL to be ready:

    kubectl wait --for=condition=ready pod -l app=postgres --timeout=300s
  5. Deploy data pipeline job:

    envsubst < data-pipeline-job.yaml | kubectl apply -f -
    kubectl create job --from=cronjob/data-pipeline-sequence data-pipeline-sequence
    kubectl wait --for=condition=complete job/data-pipeline-sequence --timeout=600s
  6. Deploy the Flask app (a verification sketch follows this list):

    envsubst < flask-app-deployment.yaml | kubectl apply -f -
    envsubst < flask-app-service.yaml | kubectl apply -f -
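
The envsubst calls above substitute exported environment variables (such as PROJECT_ID, used in the image paths) into the manifests before piping them to kubectl. A few quick checks, assuming the resource names used above:

    # Preview a rendered manifest without applying it
    envsubst < flask-app-deployment.yaml

    # Confirm each component is up
    kubectl get secret db-credentials
    kubectl get pods
    kubectl get jobs
    kubectl get service flask-app   # shows an EXTERNAL-IP if the service type is LoadBalancer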

6.6 Monitoring and Debugging

Here are some useful commands for monitoring and debugging your deployment:

  1. List all pods:

    kubectl get pods
  2. View logs for all containers in a pod:

    kubectl logs <pod-name> --all-containers=true
  3. Describe a pod:

    kubectl describe pod <pod-name>
  4. Port forward to access services locally:

    kubectl port-forward service/flask-app 8080:80
  5. View cluster events:

    kubectl get events --sort-by=.metadata.creationTimestamp
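
Two more commands that often come in handy (substitute a pod name from kubectl get pods):

    # Stream logs as they are written
    kubectl logs -f <pod-name>

    # Open a shell inside a running container (assumes the image includes /bin/sh)
    kubectl exec -it <pod-name> -- /bin/sh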

Conclusion

In this lesson, you learned how to deploy your weather data pipeline to Google Cloud Platform using Google Kubernetes Engine. You created a GKE cluster, built and pushed Docker images to Google Artifact Registry, and deployed your application components using Kubernetes.

This deployment process demonstrates how to take a locally developed data pipeline and scale it in a cloud environment. The containerized approach ensures consistency across different environments and simplifies the deployment process.

Key Points

  • GKE provides a managed Kubernetes environment, simplifying cluster setup and management
  • Building and pushing Docker images to Google Artifact Registry enables easy deployment to GKE
  • Kubernetes secrets provide a secure way to manage sensitive information like database credentials
  • Kubernetes jobs and cronjobs allow for scheduled and one-time execution of tasks
  • Monitoring and debugging tools in Kubernetes help manage and troubleshoot deployments

Further Reading