diff --git a/.DS_Store b/.DS_Store
index 85eb048..fa0e361 100644
Binary files a/.DS_Store and b/.DS_Store differ
diff --git a/.gitignore b/.gitignore
index 153c916..8d2ed9e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1 +1,2 @@
-arch/
\ No newline at end of file
+arch/
+.DS_Store
\ No newline at end of file
diff --git a/06-cloud-deployment/README.md b/06-cloud-deployment/README.md
index 8bb196b..6084af3 100644
--- a/06-cloud-deployment/README.md
+++ b/06-cloud-deployment/README.md
@@ -1 +1,167 @@
-# 06-cloud-deployment
\ No newline at end of file
+# 04-cloud-deployment

## Overview

In this lesson, we will deploy our weather data pipeline to Google Cloud Platform (GCP) using Google Kubernetes Engine (GKE). We'll cover the process of setting up a GKE cluster, building and pushing Docker images to Artifact Registry, and deploying our application components using Kubernetes.

By the end of this section, you will have:
1. Set up a GKE cluster
2. Built and pushed Docker images for our data pipeline and Flask app
3. Deployed PostgreSQL, our data pipeline jobs, and the Flask app to Kubernetes
4. Learned how to monitor and debug your deployment

This deployment process demonstrates how to take a locally developed data pipeline and deploy it to a cloud environment, showcasing the scalability and flexibility of containerized applications.

## Prerequisites

Before starting this lesson, please ensure that you have:

1. Completed the [05-cloud-deployment](../05-cloud-deployment/README.md) lesson
2. Google Cloud SDK installed
3. kubectl installed
4. Docker installed
5. A Google Cloud Platform account with billing enabled

## Lesson Content

### 4.1 Setup and GKE Cluster Creation

1. List and set your GCP project (identify the right project from the `PROJECT_ID` column of the listing):
   ```bash
   gcloud projects list
   gcloud config set project <PROJECT_ID>
   export PROJECT_ID=$(gcloud config get-value project)
   echo $PROJECT_ID
   ```

2. 
Create a GKE cluster:
   ```bash
   gcloud container clusters create weather-cluster --num-nodes=2 --zone=us-central1-a --quiet > /dev/null 2>&1 &
   ```

   The trailing `&` runs cluster creation in the background. It typically takes about 5-8 minutes, so you can continue with the image builds below in parallel.

   To check the status of the cluster deployment:
   ```bash
   gcloud container clusters describe weather-cluster --zone=us-central1-a
   ```

   The cluster is ready when the output reports `status: RUNNING`.

### 4.2 Create Container Repository on Artifact Registry

```bash
gcloud artifacts repositories create my-docker-repo --project=$PROJECT_ID --location=us --repository-format=docker
```

### 4.3 Build and Push Docker Images

```bash
# Navigate to the data-pipeline directory
cd data-pipeline

# Build and push data pipeline images
docker build --target extract -t us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-extract:latest .
docker build --target load -t us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-load:latest .
docker build --target transform -t us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-transform:latest .

docker push us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-extract:latest
docker push us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-load:latest
docker push us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-transform:latest

# Navigate to the flask-app directory
cd ../flask-app

# Build and push Flask app image
docker build -t us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/flask-app:latest .
docker push us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/flask-app:latest
```

### 4.4 Kubernetes Deployment

1. Get cluster credentials (wait until the cluster status is `RUNNING` before running this):
   ```bash
   gcloud container clusters get-credentials weather-cluster --zone=us-central1-a
   ```

2. Create a Kubernetes secret for database credentials:
   ```bash
   kubectl create secret generic db-credentials \
     --from-literal=DB_NAME=your_db_name \
     --from-literal=DB_USER=your_db_user \
     --from-literal=DB_PASSWORD=your_db_password \
     --from-literal=DB_HOST=postgres \
     --from-literal=DB_PORT=5432
   ```

3. 
Deploy PostgreSQL:
   ```bash
   cd ../gcp-deployment/k8s-artifacts
   envsubst < postgres-deployment.yaml | kubectl apply -f -
   envsubst < postgres-service.yaml | kubectl apply -f -
   ```

4. Wait for PostgreSQL to be ready:
   ```bash
   kubectl wait --for=condition=ready pod -l app=postgres --timeout=300s
   ```

5. Deploy data pipeline job:
   ```bash
   envsubst < data-pipeline-job.yaml | kubectl apply -f -
   kubectl create job --from=cronjob/data-pipeline-sequence data-pipeline-sequence
   kubectl wait --for=condition=complete job/data-pipeline-sequence --timeout=600s
   ```

6. Deploy Flask app:
   ```bash
   envsubst < flask-app-deployment.yaml | kubectl apply -f -
   envsubst < flask-app-service.yaml | kubectl apply -f -
   ```

### 4.5 Monitoring and Debugging

Here are some useful commands for monitoring and debugging your deployment. Replace `<POD_NAME>` with a pod name taken from `kubectl get pods`.

1. List all pods:
   ```bash
   kubectl get pods
   ```

2. View logs for all containers in a pod:
   ```bash
   kubectl logs <POD_NAME> --all-containers=true
   ```

3. Describe a pod:
   ```bash
   kubectl describe pod <POD_NAME>
   ```

4. Port forward to access services locally:
   ```bash
   kubectl port-forward service/flask-app 8080:80
   ```

5. View cluster events:
   ```bash
   kubectl get events --sort-by=.metadata.creationTimestamp
   ```

## Conclusion

In this lesson, you learned how to deploy your weather data pipeline to Google Cloud Platform using Google Kubernetes Engine. You created a GKE cluster, built and pushed Docker images to Artifact Registry, and deployed your application components using Kubernetes.

This deployment process demonstrates how to take a locally developed data pipeline and scale it in a cloud environment. The containerized approach ensures consistency across different environments and simplifies the deployment process.
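As a debugging aid for the monitoring commands in section 4.5, the `kubectl get pods` listing can be filtered down to unhealthy pods with a small shell function. This is a minimal sketch, not part of the lesson's files; the sample listing and pod names below are made up:

```shell
# Filter a `kubectl get pods` listing down to pods that are not healthy.
# Real usage would be: kubectl get pods | not_running
not_running() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Illustrative sample output (hypothetical pod names):
sample='NAME                         READY   STATUS             RESTARTS   AGE
flask-app-6d4cf56db6-abcde   1/1     Running            0          5m
postgres-7c8d9f5b4-xyz12     0/1     CrashLoopBackOff   4          10m'

printf '%s\n' "$sample" | not_running   # prints: postgres-7c8d9f5b4-xyz12 CrashLoopBackOff
```

Pairing this with `kubectl describe pod` on whatever it prints is usually the fastest route to a root cause.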
## Key Points

- GKE provides a managed Kubernetes environment, simplifying cluster setup and management
- Building and pushing Docker images to Artifact Registry enables easy deployment to GKE
- Kubernetes secrets provide a secure way to manage sensitive information like database credentials
- Kubernetes jobs and cronjobs allow for scheduled and one-time execution of tasks
- Monitoring and debugging tools in Kubernetes help manage and troubleshoot deployments

## Further Reading

- [Google Kubernetes Engine documentation](https://cloud.google.com/kubernetes-engine/docs)
- [Kubernetes documentation](https://kubernetes.io/docs/home/)
- [Docker documentation](https://docs.docker.com/)
- [Artifact Registry documentation](https://cloud.google.com/artifact-registry/docs)
- [Kubernetes best practices](https://kubernetes.io/docs/concepts/configuration/overview/)
diff --git a/07-cloud-cleanup/README.md b/07-cloud-cleanup/README.md
new file mode 100644
index 0000000..6ae5b4d
--- /dev/null
+++ b/07-cloud-cleanup/README.md
@@ -0,0 +1,111 @@
# 05-cloud-cleanup

## Overview

In this lesson, we will clean up the resources we created in Google Cloud Platform (GCP) during our weather data pipeline deployment. Proper cleanup is essential to avoid unnecessary costs and maintain a tidy cloud environment. We'll cover the process of deleting Kubernetes resources, the GKE cluster, container images, and other associated resources.

By the end of this section, you will have:
1. Deleted all Kubernetes resources created during the deployment
2. Removed the GKE cluster
3. Cleaned up container images from Artifact Registry
4. Removed any other associated GCP resources

This cleanup process demonstrates responsible cloud resource management and helps you avoid unexpected charges on your GCP account.

## Prerequisites

Before starting this lesson, please ensure that you have:

1. 
Completed the [06-cloud-deployment](../06-cloud-deployment/README.md) lesson
2. Google Cloud SDK installed and configured
3. kubectl installed and configured to work with your GKE cluster
4. Access to the Google Cloud Console

## Lesson Content

### 5.1 Delete Kubernetes Resources

First, we'll remove all the Kubernetes resources we created:

```bash
kubectl delete cronjob data-pipeline-sequence
kubectl delete job,deployment,service --all
kubectl delete secret db-credentials
```

These commands will delete the cronjob, all jobs, deployments, services, and the database credentials secret we created.

### 5.2 Delete GKE Cluster

Now, let's delete the GKE cluster:

```bash
gcloud container clusters delete weather-cluster --zone=us-central1-a --quiet > /dev/null 2>&1 &
```

The trailing `&` runs the deletion in the background; it may take several minutes. To check the status of the cluster deletion:

```bash
gcloud container clusters describe weather-cluster --zone=us-central1-a
```

If the cluster has been successfully deleted, this command should return an error indicating that the cluster doesn't exist.

### 5.3 Delete Container Images

Clean up the container images you pushed to Artifact Registry:

```bash
# List images in the repository
gcloud container images list --repository=us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo

# Delete images (repeat for each image)
gcloud container images list-tags us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/IMAGE_NAME --format='get(digest)' | xargs -I {} gcloud container images delete us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/IMAGE_NAME@{} --force-delete-tags --quiet
```

`${PROJECT_ID}` expands automatically if it is still exported from the previous lesson; otherwise re-export it first. Replace `IMAGE_NAME` with each image name (data-pipeline-extract, data-pipeline-load, data-pipeline-transform, flask-app).
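Rather than editing `IMAGE_NAME` by hand four times, the deletion can be looped over the image names. A dry-run sketch that only prints each command (the `repo_path` helper and the `my-project` fallback are illustrative, not part of the lesson files):

```shell
# repo_path builds the full Artifact Registry path for one image.
repo_path() {
  echo "us-docker.pkg.dev/${1}/my-docker-repo/${2}"
}

# Dry run: print the per-image deletion pipeline instead of executing it.
# PROJECT_ID falls back to a placeholder here so the sketch runs anywhere.
for name in data-pipeline-extract data-pipeline-load data-pipeline-transform flask-app; do
  repo=$(repo_path "${PROJECT_ID:-my-project}" "$name")
  echo "gcloud container images list-tags ${repo} --format='get(digest)' | xargs -I {} gcloud container images delete ${repo}@{} --force-delete-tags --quiet"
done
```

Once the printed commands look right, pipe them to `bash` (or drop the `echo`) to perform the actual deletions.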
+ +### 5.4 Clean Up Other Resources + +Check for and delete any persistent disks that might have been created: + +```bash +# List disks +gcloud compute disks list + +# Delete disks if any exist +gcloud compute disks delete DISK_NAME --zone=ZONE +``` + +Replace `DISK_NAME` and `ZONE` with the appropriate values if any disks are listed. + +### 5.5 Final Verification + +After running all the cleanup commands, it's a good practice to double-check the Google Cloud Console to ensure all resources have been removed. Pay special attention to: + +1. Kubernetes Engine +2. Artifact Registry +3. Compute Engine (for any lingering disks or instances) +4. VPC Network (for any created firewall rules or IP addresses) + +## Conclusion + +In this lesson, you learned how to properly clean up the resources created during the deployment of your weather data pipeline on Google Cloud Platform. This process included deleting Kubernetes resources, removing the GKE cluster, cleaning up container images, and verifying the deletion of all associated resources. + +Proper cleanup is crucial in cloud environments to avoid unnecessary costs and maintain a well-organized cloud infrastructure. The steps you've learned here can be applied to other projects and deployments, ensuring you always leave your cloud environment in a clean state after completing your work. 
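The final verification in section 5.5 can be scripted as a loop over one listing command per resource family. This is a sketch under the assumption that these four checks cover what the lesson created; the actual commands are commented out so nothing runs until you opt in:

```shell
# One listing command per resource family from the verification checklist:
# Kubernetes Engine, Artifact Registry, Compute Engine disks, IP addresses.
checks=(
  "gcloud container clusters list"
  "gcloud artifacts repositories list --location=us"
  "gcloud compute disks list"
  "gcloud compute addresses list"
)

for cmd in "${checks[@]}"; do
  echo "== ${cmd}"
  # Uncomment to run each check for real; each should return no resources:
  # ${cmd}
done
```

After cleanup, every check should come back empty; anything it still lists is a resource that may keep accruing charges.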
## Key Points

- Always clean up cloud resources when they're no longer needed to avoid unnecessary costs
- Kubernetes resources should be deleted before deleting the cluster
- GKE cluster deletion may take some time; always verify its status
- Container images in Artifact Registry should be cleaned up to save storage costs
- Double-check the Google Cloud Console to ensure all resources are properly removed

## Further Reading

- [Google Kubernetes Engine: Deleting a cluster](https://cloud.google.com/kubernetes-engine/docs/how-to/deleting-a-cluster)
- [Cleaning up Container Registry images](https://cloud.google.com/container-registry/docs/managing#deleting_images)
- [Google Cloud resource clean-up best practices](https://cloud.google.com/blog/products/management-tools/google-cloud-resource-clean-up-best-practices)
- [Kubernetes resource management](https://kubernetes.io/docs/concepts/cluster-administration/manage-deployment/)
- [GCP billing and cost management](https://cloud.google.com/billing/docs)
diff --git a/gcp-deployment/production-workflow-explanation.md b/08-closing-thoughts/production-workflow-explanation.md
similarity index 100%
rename from gcp-deployment/production-workflow-explanation.md
rename to 08-closing-thoughts/production-workflow-explanation.md
diff --git a/gcp-deployment/gcp-cleanup-guide.md b/gcp-deployment/gcp-cleanup-guide.md
deleted file mode 100644
index fe1f6cd..0000000
--- a/gcp-deployment/gcp-cleanup-guide.md
+++ /dev/null
@@ -1,43 +0,0 @@
-# GCP and Kubernetes Resource Cleanup Guide
-
-## 1. Delete Kubernetes Resources
-
-```bash
-kubectl delete cronjob data-pipeline-sequence
-kubectl delete job,deployment,service --all
-kubectl delete secret db-credentials
-```
-
-## 2. 
Delete GKE Cluster - -Deletion command: -```bash -gcloud container clusters delete weather-cluster --zone=us-central1-a --quiet > /dev/null 2>&1 & -``` - -Status check: - -```bash -gcloud container clusters describe weather-cluster --zone=us-central1-a -``` - -## 3. Delete Container Images - -```bash -# List images -gcloud container images list - -# Delete images (repeat for each image) -gcloud container images list-tags gcr.io/YOUR_PROJECT_ID/IMAGE_NAME --format='get(digest)' | xargs -I {} gcloud container images delete gcr.io/YOUR_PROJECT_ID/IMAGE_NAME@{} --force-delete-tags --quiet -``` - -## 4. Clean Up Other Resources - -```bash -# Delete disks -gcloud compute disks delete DISK_NAME --zone=ZONE - -``` - - -Replace placeholders with actual resource names. Double-check the Google Cloud Console to ensure all resources are removed. diff --git a/gcp-deployment/image.png b/gcp-deployment/image.png deleted file mode 100644 index cde3cda..0000000 Binary files a/gcp-deployment/image.png and /dev/null differ diff --git a/gcp-deployment/weather-data-pipeline-deployment-guide.md b/gcp-deployment/weather-data-pipeline-deployment-guide.md deleted file mode 100644 index 17e379c..0000000 --- a/gcp-deployment/weather-data-pipeline-deployment-guide.md +++ /dev/null @@ -1,243 +0,0 @@ -# Weather Data Pipeline Deployment Guide - -## Prerequisites -- Google Cloud SDK installed -- kubectl installed -- Docker installed - -## Setup - -1. List and set your GCP project: - ```bash - gcloud projects list - ``` - Indentify the name of the appropriate project by looking at the PROJECT_ID field. - - ```bash - gcloud config set project - export PROJECT_ID=$(gcloud config get-value project) - ``` - - Make sure that the `$PROJECT_ID` variable is set by running the command below: - - ```bash - echo $PROJECT_ID - ``` - -2. 
Create a GKE cluster: - ```bash - gcloud container clusters create weather-cluster --num-nodes=2 --zone=us-central1-a --quiet > /dev/null 2>&1 & - ``` - - The above should take about 5-8 minutes. We will run other processes that can be done in parallel and check the status of this from time to time. - - To check the status of the cluster deployment, run the command below: - - ```bash - gcloud container clusters describe weather-cluster --zone=us-central1-a - ``` -## Create container repository on artifact registery - -Create container registry. - ```bash - gcloud artifacts repositories create my-docker-repo --project=class-tmanik-dev --location=us --repository-format=docker - ``` - -## Build and Push Docker Images - -Build and push data pipeline images: - -```bash -# Navigate to the data-pipeline directory -cd data-pipeline - -# Build and push data pipeline images -docker build --target extract -t us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-extract:latest . -docker build --target load -t us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-load:latest . -docker build --target transform -t us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-transform:latest . - -docker push us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-extract:latest -docker push us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-load:latest -docker push us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/data-pipeline-transform:latest - -# Navigate to the flask-app directory -cd ../flask-app - -# Build and push Flask app image -docker build -t us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/flask-app:latest . -docker push us-docker.pkg.dev/${PROJECT_ID}/my-docker-repo/flask-app:latest - -``` - - -## Get credentials for the cluster. 
- -Before we get credentials for the cluster, we'll need to check if the cluster has finished being created by running the command: - - ```bash - gcloud container clusters describe weather-cluster --zone=us-central1-a - ``` - - -Once the cluster has finished being created, you can run the command below to create and get the cluster credentials. - -You can know if the cluster is finished running if the output of the command above is -```text -status: RUNNING -``` - -Anything other than the status of "RUNNING" means that the cluster has not finished being created. - -If the cluster is still being created, wait for a few minutes and do not proceed to the next steps till the creation process is finished. - -Create a Kubernetes secret for database credentials: - ```bash - kubectl create secret generic db-credentials \ - --from-literal=DB_NAME=your_db_name \ - --from-literal=DB_USER=your_db_user \ - --from-literal=DB_PASSWORD=your_db_password \ - --from-literal=DB_HOST=postgres \ - --from-literal=DB_PORT=5432 - ``` - Then - ```bash - gcloud container clusters get-credentials weather-cluster --zone=us-central1-a - ``` - -## Deploy to Kubernetes - -Change directory to the folder with the Kubernetes artifacts: - -```bash -cd ../gcp-deployment/k8s-artifacts -``` - -Deploy PostgreSQL: - -```bash -envsubst < postgres-deployment.yaml | kubectl apply -f - -envsubst < postgres-service.yaml | kubectl apply -f - -``` - -# Wait for PostgreSQL to be ready - -Run the command below to check if Postgres is ready. We cannot move to the next step unless it is. -``` -kubectl wait --for=condition=ready pod -l app=postgres --timeout=300s -``` - -If Postgres is ready, the output of the command above should say: - -```text -pod/postgres- condition met -``` -Once Postgres has finished being deployed, proceed to the next step. 
- -Deploy data pipeline job: - -```bash -envsubst < data-pipeline-job.yaml | kubectl apply -f - -kubectl create job --from=cronjob/data-pipeline-sequence data-pipeline-sequence - -# Wait for data pipeline job to complete -kubectl wait --for=condition=complete job/data-pipeline-sequence --timeout=600s -``` -Once completed, proceed to the next step. - -Deploy Flask app: - -```bash -envsubst < flask-app-deployment.yaml | kubectl apply -f - -envsubst < flask-app-service.yaml | kubectl apply -f - -``` - - -## Monitoring and waiting - -List all pods: - ```bash - kubectl get pods - ``` - -![alt text](image.png) - -View logs for all containers in a pod: - ```bash - kubectl logs --all-containers=true - ``` -View logs for the data-pipeline pod. You should expect to see similar logs as you saw when you ran the docker-compose deployment on your local machine. The final log should say: -```text -Data transformation and loading to final table completed. -``` - -View logs for the flask-app pod. - - -## Useful Commands for Monitoring and Debugging - -1. List all pods: - ```bash - kubectl get pods - ``` - -2. List all jobs: - ```bash - kubectl get jobs - ``` - -3. Delete a specific job: - ```bash - kubectl delete cronjob data-pipeline-sequence - ``` - -4. Delete a cronjob: - ```bash - kubectl delete cronjob data-pipeline-sequence - ``` - -5. View logs for a specific container in a pod: - ```bash - kubectl logs -c - ``` - -6. List init container names for a pod: - ```bash - kubectl get pod -o jsonpath='{.spec.initContainers[*].name}' - ``` - -7. View logs continuously: - ```bash - kubectl logs -f -c - ``` - -8. View logs for all containers in a pod: - ```bash - kubectl logs --all-containers=true - ``` - -9. Describe a pod (lists all containers and their statuses): - ```bash - kubectl describe pod - ``` - -10. Test environment variable substitution: - ```bash - envsubst < data-pipeline-job.yaml - ``` - -11. 
Port forward to access services locally: - ```bash - kubectl port-forward service/flask-app 8080:80 - ``` - -12. Scale a deployment: - ```bash - kubectl scale deployment flask-app --replicas=3 - ``` - -13. View cluster events: - ```bash - kubectl get events --sort-by=.metadata.creationTimestamp - ``` - -Remember to replace placeholders like `` and `` with actual values from your deployment.