Kubernetes Deployment for Weather Data Pipeline
Overview
This directory contains Kubernetes (k8s) YAML files for deploying the weather data pipeline and Flask application to a Kubernetes cluster. The deployment consists of the following components:
- PostgreSQL database
- Data pipeline job (Extract, Load, Transform)
- Flask web application
The YAML files define the necessary Kubernetes resources to run these components in a scalable and manageable way.
Code Explanations
postgres-deployment.yaml
This file defines a Deployment for the PostgreSQL database.
Key components:
- apiVersion and kind: Specifies this is a Deployment resource
- metadata: Names the deployment
- spec: Defines the desired state of the deployment
- replicas: Sets the number of pod replicas
- selector: Determines which pods are managed by this deployment
- template: Defines the pod template
- containers: Specifies the container(s) to run in each pod
  - Uses the postgres:13 image
  - Sets up environment variables from a Secret named db-credentials
  - Exposes port 5432
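A minimal sketch of what such a Deployment can look like is shown below. The resource name, labels, and the use of envFrom are assumptions for illustration, not the actual contents of postgres-deployment.yaml:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres                 # assumed name; the real metadata.name may differ
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:13
          ports:
            - containerPort: 5432
          envFrom:
            - secretRef:
                name: db-credentials   # assumed to hold POSTGRES_USER, POSTGRES_PASSWORD, etc.
```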
data-pipeline-job.yaml
This file defines a CronJob for the data pipeline.
Key components:
- apiVersion and kind: Specifies this is a CronJob resource
- metadata: Names the job
- spec: Defines the job's schedule and template
- schedule: Sets when the job should run (currently set to never run automatically)
- jobTemplate: Defines the job to be run
- spec.template.spec: Specifies the pod template for the job
- volumes: Defines a shared volume for data exchange
- initContainers: Specifies containers to run before the main container
- containers: Defines the main container to run
  - Uses images from a Google Cloud Container Registry
  - Sets environment variables from the db-credentials Secret
- Sets environment variables from the db-credentials Secret
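The sketch below shows one way such a CronJob could be structured. The job name, image paths, mount paths, and the specific trick used to prevent automatic runs (an impossible schedule plus suspend: true) are assumptions, not the actual manifest:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: data-pipeline              # assumed name
spec:
  schedule: "0 0 31 2 *"           # February 31 never occurs, so the job never fires on its own
  suspend: true                    # also suspend scheduling outright; trigger runs manually instead
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          volumes:
            - name: shared-data    # shared volume for data exchange between containers
              emptyDir: {}
          initContainers:
            - name: extract        # illustrative; real GCR image paths will differ
              image: gcr.io/my-project/extract:latest
              envFrom:
                - secretRef:
                    name: db-credentials
              volumeMounts:
                - name: shared-data
                  mountPath: /data
          containers:
            - name: load-transform
              image: gcr.io/my-project/load-transform:latest
              envFrom:
                - secretRef:
                    name: db-credentials
              volumeMounts:
                - name: shared-data
                  mountPath: /data
```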
flask-app-deployment.yaml
This file defines a Deployment for the Flask web application.
Key components:
- Similar structure to postgres-deployment.yaml
- spec.replicas: Specifies 2 replicas for high availability
- spec.template.spec.containers:
  - Uses an image from a Google Cloud Container Registry
  - Sets environment variables from the db-credentials Secret
  - Exposes port 5000
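A condensed sketch of this Deployment follows; the name, labels, and image path are illustrative assumptions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask-app                  # assumed name
spec:
  replicas: 2                      # two replicas for high availability
  selector:
    matchLabels:
      app: flask-app
  template:
    metadata:
      labels:
        app: flask-app
    spec:
      containers:
        - name: flask-app
          image: gcr.io/my-project/flask-app:latest   # illustrative GCR path
          ports:
            - containerPort: 5000
          envFrom:
            - secretRef:
                name: db-credentials
```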
postgres-service.yaml
This file defines a ClusterIP Service for the PostgreSQL database, making it accessible within the cluster.
Key components:
- apiVersion and kind: Specifies this is a Service resource
- spec.type: Set to ClusterIP for internal cluster access
- spec.ports: Maps the service port to the target port on the pod
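A minimal sketch of such a ClusterIP Service, assuming the database pods carry an app: postgres label:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres                   # assumed name; other pods reach the database at this hostname
spec:
  type: ClusterIP
  selector:
    app: postgres                  # must match the labels on the PostgreSQL pods
  ports:
    - port: 5432                   # service port
      targetPort: 5432             # container port on the pod
```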
flask-app-service.yaml
This file defines a LoadBalancer Service for the Flask application, making it accessible from outside the cluster.
Key components:
- Similar structure to postgres-service.yaml
- spec.type: Set to LoadBalancer for external access
- spec.ports: Maps port 80 to target port 5000 on the pod
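A minimal sketch of the LoadBalancer Service, again with an assumed name and selector label:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: flask-app                  # assumed name
spec:
  type: LoadBalancer               # the cloud provider provisions an external IP
  selector:
    app: flask-app                 # must match the labels on the Flask pods
  ports:
    - port: 80                     # externally exposed port
      targetPort: 5000             # port the Flask container listens on
```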
Docker Compose vs. Kubernetes
While both Docker Compose and Kubernetes can be used to deploy multi-container applications, they differ in several ways:
- Scale: Docker Compose is typically used for local development and small-scale deployments, while Kubernetes is designed for large-scale, production environments.
- Orchestration: Kubernetes provides more advanced orchestration features, such as automatic scaling, rolling updates, and self-healing.
- Resource Definition: Docker Compose uses a single YAML file, while Kubernetes separates concerns into multiple YAML files for different resource types.
- Networking: Kubernetes provides more sophisticated networking options, including Services and Ingress controllers.
- State Management: Kubernetes has built-in primitives for managing stateful applications, such as StatefulSets and PersistentVolumes.
In this deployment, we've translated the Docker Compose setup into Kubernetes resources, allowing for better scalability and management in a cloud environment.
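For intuition, a single Compose service roughly corresponds to two Kubernetes resources above: a Deployment (how the containers run) plus a Service (how they are reached). The excerpt below is a hypothetical Compose definition for the Flask app, shown only to illustrate the mapping; it is not the project's actual docker-compose.yml:

```yaml
# Hypothetical docker-compose.yml excerpt, for comparison only
services:
  flask-app:
    image: gcr.io/my-project/flask-app:latest   # illustrative image path
    ports:
      - "80:5000"          # maps to the LoadBalancer Service (port 80 -> targetPort 5000)
    environment:
      - DATABASE_URL       # maps to env vars sourced from the db-credentials Secret
    deploy:
      replicas: 2          # maps to spec.replicas on the Deployment
```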
Conclusion
This Kubernetes deployment configuration provides a robust, scalable setup for the weather data pipeline and Flask application. It leverages Kubernetes' features to ensure high availability, ease of management, and efficient resource utilization.
The most important YAML files in this setup are:
- postgres-deployment.yaml: Ensures the database is running and properly configured.
- data-pipeline-job.yaml: Manages the ETL process, crucial for data processing.
- flask-app-deployment.yaml: Deploys the web application that serves the processed data.
These files form the core of the application, defining how the database, data processing job, and web application are deployed and managed within the Kubernetes cluster.
By moving from Docker Compose to Kubernetes, the application gains the ability to scale more effectively and take advantage of cloud-native features, making it more suitable for production environments. The separation of concerns into different YAML files also improves maintainability and allows for more granular control over each component of the application.