Added readmes for 06 and 07
tmanik committed Aug 31, 2024
1 parent 0463dda commit cd8f512
Showing 8 changed files with 280 additions and 288 deletions.
Binary file modified .DS_Store
Binary file not shown.
3 changes: 2 additions & 1 deletion .gitignore
@@ -1 +1,2 @@
arch/
arch/
.DS_Store
168 changes: 167 additions & 1 deletion 06-cloud-deployment/README.md
@@ -1 +1,167 @@
# 06-cloud-deployment
# 06-cloud-deployment

## Overview

In this lesson, we will deploy our weather data pipeline to Google Cloud Platform (GCP) using Google Kubernetes Engine (GKE). We'll cover the process of setting up a GKE cluster, building and pushing Docker images to Artifact Registry, and deploying our application components using Kubernetes.

By the end of this section, you will have:
1. Set up a GKE cluster
2. Built and pushed Docker images for our data pipeline and Flask app
3. Deployed PostgreSQL, our data pipeline jobs, and the Flask app to Kubernetes
4. Learned how to monitor and debug your deployment

This deployment process demonstrates how to take a locally developed data pipeline and deploy it to a cloud environment, showcasing the scalability and flexibility of containerized applications.

## Prerequisites

Before starting this lesson, please ensure that you have:

1. Completed the [05-cloud-deployment](../05-cloud-deployment/README.md) lesson
2. Google Cloud SDK installed
3. kubectl installed
4. Docker installed
5. A Google Cloud Platform account with billing enabled
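
A quick way to confirm the tooling is in place (exact versions will vary):

```bash
# Verify the CLI tools are installed and on your PATH
gcloud version
kubectl version --client
docker --version

# Confirm gcloud is authenticated as the account you expect
gcloud auth list
```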

## Lesson Content

### 6.1 Setup and GKE Cluster Creation

1. List and set your GCP project:
```bash
gcloud projects list
gcloud config set project <insert_name_of_project>
export PROJECT_ID=$(gcloud config get-value project)
echo $PROJECT_ID
```

2. Create a GKE cluster (the command runs in the background and suppresses its output; provisioning typically takes several minutes):
```bash
gcloud container clusters create weather-cluster --num-nodes=2 --zone=us-central1-a --quiet > /dev/null 2>&1 &
```

To check the status of the cluster deployment:
```bash
gcloud container clusters describe weather-cluster --zone=us-central1-a
```
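
Because the create command runs in the background, you can poll until the cluster reports `RUNNING`. A minimal sketch:

```bash
# Poll every 15 seconds until the cluster status is RUNNING
until [ "$(gcloud container clusters describe weather-cluster \
  --zone=us-central1-a --format='value(status)' 2>/dev/null)" = "RUNNING" ]; do
  echo "Waiting for cluster..."
  sleep 15
done
echo "Cluster is ready."
```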

### 6.2 Create a Container Repository on Artifact Registry

```bash
gcloud artifacts repositories create my-docker-repo --project=$PROJECT_ID --location=us --repository-format=docker
```
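
Before pushing images in the next step, Docker needs to authenticate to the `us-docker.pkg.dev` registry host. If you haven't configured this before:

```bash
# Register gcloud as Docker's credential helper for us-docker.pkg.dev
gcloud auth configure-docker us-docker.pkg.dev
```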

### 6.3 Build and Push Docker Images

```bash
# Navigate to the data-pipeline directory
cd data-pipeline

# Build and push data pipeline images
docker build --target extract -t us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-extract:latest .
docker build --target load -t us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-load:latest .
docker build --target transform -t us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-transform:latest .

docker push us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-extract:latest
docker push us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-load:latest
docker push us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-transform:latest

# Navigate to the flask-app directory
cd ../flask-app

# Build and push Flask app image
docker build -t us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/flask-app:latest .
docker push us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/flask-app:latest
```
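
To confirm that all four images landed in the repository:

```bash
# List the images stored in the Artifact Registry repository
gcloud artifacts docker images list us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo
```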

### 6.4 Kubernetes Deployment

1. Get cluster credentials:
```bash
gcloud container clusters get-credentials weather-cluster --zone=us-central1-a
```

2. Create a Kubernetes secret for database credentials:
```bash
kubectl create secret generic db-credentials \
--from-literal=DB_NAME=your_db_name \
--from-literal=DB_USER=your_db_user \
--from-literal=DB_PASSWORD=your_db_password \
--from-literal=DB_HOST=postgres \
--from-literal=DB_PORT=5432
```
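
Replace the `your_*` placeholders with real values. Secret data is stored base64-encoded; you can spot-check a value with:

```bash
# Decode one key from the secret to confirm it was stored correctly
kubectl get secret db-credentials -o jsonpath='{.data.DB_USER}' | base64 --decode; echo
```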

3. Deploy PostgreSQL:
```bash
cd ../gcp-deployment/k8s-artifacts
envsubst < postgres-deployment.yaml | kubectl apply -f -
envsubst < postgres-service.yaml | kubectl apply -f -
```
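
`envsubst` fills the shell-style placeholders (such as `${PROJECT_ID}`) in the manifests from your environment before `kubectl` sees them. To preview a rendered manifest without creating anything:

```bash
# Render the manifest and validate it client-side without applying
envsubst < postgres-deployment.yaml | kubectl apply --dry-run=client -f -
```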

4. Wait for PostgreSQL to be ready:
```bash
kubectl wait --for=condition=ready pod -l app=postgres --timeout=300s
```

5. Deploy data pipeline job:
```bash
envsubst < data-pipeline-job.yaml | kubectl apply -f -
kubectl create job --from=cronjob/data-pipeline-sequence data-pipeline-sequence
kubectl wait --for=condition=complete job/data-pipeline-sequence --timeout=600s
```
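
If the wait times out, inspect the job's pods and logs:

```bash
# Pods created by the job carry the job-name label
kubectl get pods -l job-name=data-pipeline-sequence

# Stream logs from all containers in the job's pods
kubectl logs job/data-pipeline-sequence --all-containers=true
```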

6. Deploy Flask app:
```bash
envsubst < flask-app-deployment.yaml | kubectl apply -f -
envsubst < flask-app-service.yaml | kubectl apply -f -
```
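
Assuming `flask-app-service.yaml` defines a `LoadBalancer` Service named `flask-app`, you can wait for an external IP and then browse to it:

```bash
# Watch until the EXTERNAL-IP column is populated, then press Ctrl+C
kubectl get service flask-app --watch
```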

### 6.5 Monitoring and Debugging

Here are some useful commands for monitoring and debugging your deployment:

1. List all pods:
```bash
kubectl get pods
```

2. View logs for all containers in a pod:
```bash
kubectl logs <pod-name> --all-containers=true
```

3. Describe a pod:
```bash
kubectl describe pod <pod-name>
```

4. Port forward to access services locally:
```bash
kubectl port-forward service/flask-app 8080:80
```

5. View cluster events:
```bash
kubectl get events --sort-by=.metadata.creationTimestamp
```
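
6. Follow a pod's logs in real time:
```bash
kubectl logs -f <pod-name>
```

7. Watch pod status changes as they happen:
```bash
kubectl get pods --watch
```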

## Conclusion

In this lesson, you learned how to deploy your weather data pipeline to Google Cloud Platform using Google Kubernetes Engine. You created a GKE cluster, built and pushed Docker images to Artifact Registry, and deployed your application components using Kubernetes.

This deployment process demonstrates how to take a locally developed data pipeline and scale it in a cloud environment. The containerized approach ensures consistency across different environments and simplifies the deployment process.

## Key Points

- GKE provides a managed Kubernetes environment, simplifying cluster setup and management
- Building and pushing Docker images to Artifact Registry enables easy deployment to GKE
- Kubernetes secrets provide a secure way to manage sensitive information like database credentials
- Kubernetes jobs and cronjobs allow for scheduled and one-time execution of tasks
- Monitoring and debugging tools in Kubernetes help manage and troubleshoot deployments

## Further Reading

- [Google Kubernetes Engine documentation](https://cloud.google.com/kubernetes-engine/docs)
- [Kubernetes documentation](https://kubernetes.io/docs/home/)
- [Docker documentation](https://docs.docker.com/)
- [Artifact Registry documentation](https://cloud.google.com/artifact-registry/docs)
- [Kubernetes best practices](https://kubernetes.io/docs/concepts/configuration/overview/)
111 changes: 111 additions & 0 deletions 07-cloud-cleanup/README.md
@@ -0,0 +1,111 @@
# 07-cloud-cleanup

## Overview

In this lesson, we will clean up the resources we created in Google Cloud Platform (GCP) during our weather data pipeline deployment. Proper cleanup is essential to avoid unnecessary costs and maintain a tidy cloud environment. We'll cover the process of deleting Kubernetes resources, the GKE cluster, container images, and other associated resources.

By the end of this section, you will have:
1. Deleted all Kubernetes resources created during the deployment
2. Removed the GKE cluster
3. Cleaned up container images from Artifact Registry
4. Removed any other associated GCP resources

This cleanup process demonstrates responsible cloud resource management and helps you avoid unexpected charges on your GCP account.

## Prerequisites

Before starting this lesson, please ensure that you have:

1. Completed the [06-cloud-deployment](../06-cloud-deployment/README.md) lesson
2. Google Cloud SDK installed and configured
3. kubectl installed and configured to work with your GKE cluster
4. Access to the Google Cloud Console

## Lesson Content

### 7.1 Delete Kubernetes Resources

First, we'll remove all the Kubernetes resources we created:

```bash
kubectl delete cronjob data-pipeline-sequence
kubectl delete job,deployment,service --all
kubectl delete secret db-credentials
```

These commands will delete the cronjob, all jobs, deployments, services, and the database credentials secret we created.
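
To verify that nothing application-related remains (only the default `kubernetes` Service should be left):

```bash
kubectl get all
```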

### 7.2 Delete GKE Cluster

Now, let's delete the GKE cluster (like cluster creation, this runs in the background and can take several minutes):

```bash
gcloud container clusters delete weather-cluster --zone=us-central1-a --quiet > /dev/null 2>&1 &
```

To check the status of the cluster deletion:

```bash
gcloud container clusters describe weather-cluster --zone=us-central1-a
```

If the cluster has been successfully deleted, this command should return an error indicating that the cluster doesn't exist.
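
Alternatively, list the clusters in the project; `weather-cluster` should no longer appear once deletion finishes:

```bash
gcloud container clusters list
```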

### 7.3 Delete Container Images

Clean up the container images you pushed to Artifact Registry:

```bash
# List images
gcloud container images list --repository=us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo

# Delete images (repeat for each image)
gcloud container images list-tags us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/IMAGE_NAME --format='get(digest)' | xargs -I {} gcloud container images delete us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/IMAGE_NAME@{} --force-delete-tags --quiet
```

Here `${PROJECT_ID}` is the environment variable exported in lesson 06 (re-export it if you are in a new shell). Replace `IMAGE_NAME` with each image name in turn (`data-pipeline-extract`, `data-pipeline-load`, `data-pipeline-transform`, `flask-app`).
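
Alternatively, since all four images live in the single `my-docker-repo` repository, you can remove everything at once by deleting the repository itself:

```bash
# Deleting the repository removes every image it contains
gcloud artifacts repositories delete my-docker-repo --project=$PROJECT_ID --location=us --quiet
```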

### 7.4 Clean Up Other Resources

Check for and delete any persistent disks that might have been created:

```bash
# List disks
gcloud compute disks list

# Delete disks if any exist
gcloud compute disks delete DISK_NAME --zone=ZONE
```

Replace `DISK_NAME` and `ZONE` with the appropriate values if any disks are listed.

### 7.5 Final Verification

After running all the cleanup commands, it's a good practice to double-check the Google Cloud Console to ensure all resources have been removed. Pay special attention to:

1. Kubernetes Engine
2. Artifact Registry
3. Compute Engine (for any lingering disks or instances)
4. VPC Network (for any created firewall rules or IP addresses)
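
The same checks can be made from the command line; each list should no longer show the resources created in these lessons:

```bash
gcloud container clusters list
gcloud artifacts repositories list --location=us
gcloud compute disks list
gcloud compute addresses list
```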

## Conclusion

In this lesson, you learned how to properly clean up the resources created during the deployment of your weather data pipeline on Google Cloud Platform. This process included deleting Kubernetes resources, removing the GKE cluster, cleaning up container images, and verifying the deletion of all associated resources.

Proper cleanup is crucial in cloud environments to avoid unnecessary costs and maintain a well-organized cloud infrastructure. The steps you've learned here can be applied to other projects and deployments, ensuring you always leave your cloud environment in a clean state after completing your work.

## Key Points

- Always clean up cloud resources when they're no longer needed to avoid unnecessary costs
- Kubernetes resources should be deleted before deleting the cluster
- GKE cluster deletion may take some time; always verify its status
- Container images in Artifact Registry should be cleaned up to save storage costs
- Double-check the Google Cloud Console to ensure all resources are properly removed

## Further Reading

- [Google Kubernetes Engine: Deleting a cluster](https://cloud.google.com/kubernetes-engine/docs/how-to/deleting-a-cluster)
- [Managing images in Artifact Registry](https://cloud.google.com/artifact-registry/docs/docker/manage-images)
- [Google Cloud resource clean-up best practices](https://cloud.google.com/blog/products/management-tools/google-cloud-resource-clean-up-best-practices)
- [Kubernetes resource management](https://kubernetes.io/docs/concepts/cluster-administration/manage-deployment/)
- [GCP billing and cost management](https://cloud.google.com/billing/docs)
43 changes: 0 additions & 43 deletions gcp-deployment/gcp-cleanup-guide.md

This file was deleted.

Binary file removed gcp-deployment/image.png
Binary file not shown.
243 changes: 0 additions & 243 deletions gcp-deployment/weather-data-pipeline-deployment-guide.md

This file was deleted.
