September 1, 2024 15:55

Kubernetes Deployment for Weather Data Pipeline

Overview

This directory contains Kubernetes (k8s) YAML files for deploying the weather data pipeline and Flask application to a Kubernetes cluster. The deployment consists of the following components:

  1. PostgreSQL database
  2. Data pipeline job (Extract, Load, Transform)
  3. Flask web application

The YAML files define the Kubernetes resources (Deployments, a CronJob, and Services) needed to run these components.

Code Explanations

postgres-deployment.yaml

This file defines a Deployment for the PostgreSQL database.

Key components:

  • apiVersion and kind: Specify that this is a Deployment resource
  • metadata: Names the Deployment
  • spec: Defines the desired state of the Deployment
    • replicas: Sets the number of pod replicas
    • selector: Determines which pods this Deployment manages, by matching pod labels
    • template: Defines the pod template
      • containers: Specifies the container(s) to run in each pod
        • Uses the postgres:13 image
        • Sets up environment variables from a Secret named db-credentials
        • Exposes port 5432
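
To make the structure above concrete, here is a minimal sketch of what such a Deployment might look like. Only the postgres:13 image, the db-credentials Secret name, and port 5432 come from the description above; the resource name, the `app: postgres` label, the replica count, and the use of `envFrom` (rather than individual `env` entries) are assumptions for illustration and may differ from the actual file.

```yaml
# Hypothetical sketch of postgres-deployment.yaml; names and labels are assumed.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres                # assumed name
spec:
  replicas: 1                   # assumed: a single database pod
  selector:
    matchLabels:
      app: postgres             # must match the pod template labels below
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:13
          envFrom:              # assumed: load every key of the Secret as an env var
            - secretRef:
                name: db-credentials
          ports:
            - containerPort: 5432
```

With `envFrom`, the Secret's keys become environment variable names, so in this sketch db-credentials would need keys the official postgres image understands, such as POSTGRES_USER, POSTGRES_PASSWORD, and POSTGRES_DB.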

data-pipeline-job.yaml

This file defines a CronJob for the data pipeline.

Key components:

  • apiVersion and kind: Specify that this is a CronJob resource
  • metadata: Names the CronJob
  • spec: Defines the CronJob's schedule and job template
    • schedule: Sets when the job runs (currently configured so that it never runs automatically)
    • jobTemplate: Defines the job to be run
      • spec.template.spec: Specifies the pod template for the job
        • volumes: Defines a shared volume for data exchange
        • initContainers: Specifies containers to run before the main container
        • containers: Defines the main container to run
          • Uses images from a Google Cloud Container Registry
          • Sets environment variables from the db-credentials Secret
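
The nesting described above can be sketched as follows. This is an illustrative outline, not the repository's actual manifest: the resource name, the container names, the split of pipeline steps between the init container and the main container, the volume name and mount path, and the image paths (with the `PROJECT_ID` placeholder) are all assumptions. The sketch uses `suspend: true` as one possible way to keep a CronJob from running automatically; the real file may achieve this differently.

```yaml
# Hypothetical sketch of data-pipeline-job.yaml; names, images, and paths are assumed.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: data-pipeline                 # assumed name
spec:
  schedule: "0 0 1 1 *"               # placeholder cron expression
  suspend: true                       # assumed mechanism: no automatic runs while true
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never        # Job pods must use Never or OnFailure
          volumes:
            - name: shared-data       # assumed name; scratch space shared by the containers
              emptyDir: {}
          initContainers:
            - name: extract           # assumed step; runs to completion first
              image: gcr.io/PROJECT_ID/extract:latest         # placeholder image path
              volumeMounts:
                - name: shared-data
                  mountPath: /data
          containers:
            - name: load-transform    # assumed step; starts after the init container succeeds
              image: gcr.io/PROJECT_ID/load-transform:latest  # placeholder image path
              envFrom:
                - secretRef:
                    name: db-credentials
              volumeMounts:
                - name: shared-data
                  mountPath: /data
```

A CronJob that never fires on its own can still be run on demand, for example with `kubectl create job manual-run --from=cronjob/data-pipeline`, where both names follow the assumptions of this sketch.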

flask-app-deployment.yaml

This file defines a Deployment for the Flask web application.

Key components:

  • Similar structure to postgres-deployment.yaml
  • spec.replicas: Specifies 2 replicas for high availability
  • spec.template.spec.containers:
    • Uses an image from a Google Cloud Container Registry
    • Sets environment variables from the db-credentials Secret
    • Exposes port 5000
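
As a rough illustration of the points above, a Deployment of this shape might look like the sketch below. The two replicas, the db-credentials Secret, and port 5000 come from the description; the resource name, the `app: flask-app` label, the image path (with the `PROJECT_ID` placeholder), and the use of `envFrom` are assumptions.

```yaml
# Hypothetical sketch of flask-app-deployment.yaml; names and image path are assumed.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask-app               # assumed name
spec:
  replicas: 2                   # two pods, so one can fail without an outage
  selector:
    matchLabels:
      app: flask-app            # must match the pod template labels below
  template:
    metadata:
      labels:
        app: flask-app
    spec:
      containers:
        - name: flask-app
          image: gcr.io/PROJECT_ID/flask-app:latest   # placeholder image path
          envFrom:              # assumed: database settings read from the Secret
            - secretRef:
                name: db-credentials
          ports:
            - containerPort: 5000
```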

postgres-service.yaml

This file defines a ClusterIP Service for the PostgreSQL database, making it accessible within the cluster.

Key components:

  • apiVersion and kind: Specify that this is a Service resource
  • spec.type: Set to ClusterIP for internal cluster access
  • spec.ports: Maps the service port to the target port on the pod
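
A minimal sketch of such a Service is shown below. The ClusterIP type and the pod port 5432 follow from the description; the Service name, the `app: postgres` selector label, and the choice of 5432 as the Service port are assumptions.

```yaml
# Hypothetical sketch of postgres-service.yaml; name, label, and service port are assumed.
apiVersion: v1
kind: Service
metadata:
  name: postgres          # assumed; other pods would reach the database at this DNS name
spec:
  type: ClusterIP         # reachable only from inside the cluster
  selector:
    app: postgres         # assumed label; must match the database pod's labels
  ports:
    - port: 5432          # port the Service listens on (assumed)
      targetPort: 5432    # port the PostgreSQL container exposes
```

Under these assumptions, a client in the same namespace would connect to host `postgres` on port 5432.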

flask-app-service.yaml

This file defines a LoadBalancer Service for the Flask application, making it accessible from outside the cluster.

Key components:

  • Similar structure to postgres-service.yaml
  • spec.type: Set to LoadBalancer for external access
  • spec.ports: Maps port 80 to target port 5000 on the pod
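
The port mapping above could be expressed roughly as follows. The LoadBalancer type and the 80 to 5000 mapping come from the description; the Service name and the `app: flask-app` selector label are assumptions.

```yaml
# Hypothetical sketch of flask-app-service.yaml; name and label are assumed.
apiVersion: v1
kind: Service
metadata:
  name: flask-app         # assumed name
spec:
  type: LoadBalancer      # asks the cloud provider for an external load balancer
  selector:
    app: flask-app        # assumed label; must match the Flask pods' labels
  ports:
    - port: 80            # port exposed outside the cluster
      targetPort: 5000    # port the Flask container listens on
```

Once the cloud provider assigns an address, it appears in the EXTERNAL-IP column of `kubectl get service flask-app` (using the assumed name).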

Docker Compose vs. Kubernetes

While both Docker Compose and Kubernetes can be used to deploy multi-container applications, they differ in several ways:

  1. Scale: Docker Compose is typically used for local development and small-scale deployments, while Kubernetes is designed for large-scale, production environments.

  2. Orchestration: Kubernetes provides more advanced orchestration features, such as automatic scaling, rolling updates, and self-healing.

  3. Resource Definition: Docker Compose typically describes the whole application in a single YAML file, while Kubernetes models each concern as its own resource (Deployment, Service, CronJob), kept here in separate YAML files.

  4. Networking: Kubernetes provides more sophisticated networking options, including Services and Ingress controllers.

  5. State Management: Kubernetes has built-in primitives for managing stateful applications, such as StatefulSets and PersistentVolumes.
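
For comparison, a Docker Compose file for a similar stack might look like the sketch below. This is purely illustrative and is not the project's actual Compose file: the service names, the build context, and the `.env` file are assumptions. Each service here corresponds to a Deployment plus a Service on the Kubernetes side.

```yaml
# Hypothetical docker-compose.yaml for comparison; not taken from the repository.
services:
  db:
    image: postgres:13
    env_file: .env          # assumed: credentials kept in a local env file
  web:
    build: .                # assumed: Flask image built from the local Dockerfile
    env_file: .env
    ports:
      - "80:5000"           # host port 80 mapped to the Flask container's port 5000
    depends_on:
      - db                  # start the database container before the web container
```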

In this deployment, we've translated the Docker Compose setup into Kubernetes resources, allowing for better scalability and management in a cloud environment.

Conclusion

This Kubernetes configuration deploys the weather data pipeline and Flask application as separately managed components: the Flask application runs as two replicas behind a LoadBalancer Service, PostgreSQL is reachable only inside the cluster through a ClusterIP Service, and the data pipeline runs as a CronJob.

The most important YAML files in this setup are:

  1. postgres-deployment.yaml: Ensures the database is running and properly configured.
  2. data-pipeline-job.yaml: Runs the data pipeline (Extract, Load, Transform), which produces the data the web application serves.
  3. flask-app-deployment.yaml: Deploys the web application that serves the processed data.

These files form the core of the application, defining how the database, data processing job, and web application are deployed and managed within the Kubernetes cluster.

By moving from Docker Compose to Kubernetes, the application gains the ability to scale more effectively and take advantage of cloud-native features, making it more suitable for production environments. The separation of concerns into different YAML files also improves maintainability and allows for more granular control over each component of the application.