# Data Pipeline Flow

<div align="center">

```mermaid
graph TD
    subgraph AWS
        A[AWS Data Source]
    end
    subgraph "Container: Extract"
        B[extract.py]
    end
    subgraph "Container: Load"
        C[load.py]
    end
    subgraph "Container: Transform"
        D[transform.py]
    end
    subgraph "PostgreSQL Database"
        E[Loading Table]
        F[Final Table]
    end
    subgraph "Container: Visualize"
        G[Flask App]
    end
    A -->|Data| B
    B -->|Extracted Data| C
    C -->|Load Data| E
    E -->|Read Data| D
    D -->|Transformed Data| F
    F -->|Read Data| G
    classDef container fill:#e6f3ff,stroke:#333,stroke-width:2px;
    class B,C,D,G container;
    classDef scriptText fill:#e6f3ff,stroke:#333,stroke-width:2px,color:black;
    class B,C,D,G scriptText;
```

</div>
## Flow Explanation

Shell scripts (`extract.sh`, `load.sh`, `transform.sh`) orchestrate the pipeline and manage the execution of each step:
1. **Extract**: `extract.py`, running in the Extract container, pulls data from the AWS data source.
2. **Load**: `load.py`, running in the Load container, writes the extracted data to the Loading Table in the PostgreSQL database.
3. **Transform**: `transform.py`, running in the Transform container, reads from the Loading Table, transforms the data, and stores the result in the Final Table.
4. **Visualize**: A Flask app in the Visualize container reads from the Final Table to create visualizations or serve data via an API.
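
The steps above can be sketched end to end in a few lines of Python. This is only an illustration of the data flow, not the project's actual scripts: the table names `loading_table` and `final_table`, the sample records, and the cleaning rules are all hypothetical, and an in-memory SQLite database stands in for PostgreSQL so the sketch runs without a server.

```python
import sqlite3


def extract():
    """Stand-in for extract.py: return raw records (hypothetical sample data)."""
    return [
        {"id": 1, "name": " alice ", "amount": "10.50"},
        {"id": 2, "name": "BOB", "amount": "3"},
    ]


def load(conn, records):
    """Stand-in for load.py: write raw records, unchanged, to the loading table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS loading_table (id INTEGER, name TEXT, amount TEXT)"
    )
    conn.executemany(
        "INSERT INTO loading_table (id, name, amount) VALUES (:id, :name, :amount)",
        records,
    )
    conn.commit()


def transform(conn):
    """Stand-in for transform.py: read the loading table, clean, write the final table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS final_table "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    rows = conn.execute("SELECT id, name, amount FROM loading_table").fetchall()
    # Hypothetical cleaning rules: trim and title-case names, parse amounts as floats.
    cleaned = [(row_id, name.strip().title(), float(amount)) for row_id, name, amount in rows]
    conn.executemany(
        "INSERT OR REPLACE INTO final_table (id, name, amount) VALUES (?, ?, ?)",
        cleaned,
    )
    conn.commit()


def read_final(conn):
    """What the Visualize step would query before rendering or serving the data."""
    return conn.execute("SELECT id, name, amount FROM final_table ORDER BY id").fetchall()


def run_pipeline():
    # SQLite in memory is an assumption made for this sketch; the pipeline uses PostgreSQL.
    conn = sqlite3.connect(":memory:")
    try:
        load(conn, extract())
        transform(conn)
        return read_final(conn)
    finally:
        conn.close()
```

Calling `run_pipeline()` returns the rows of the final table. Keeping the raw and cleaned data in separate tables mirrors the Loading Table and Final Table split in the diagram: the transform step can be rerun without extracting again.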

This pipeline moves data from its source to a visualized or API-accessible form in a fixed sequence, with a clear separation of concerns at each stage. Orchestrating the steps with shell scripts keeps their execution flexible and controllable.
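
The contents of the orchestration scripts are not shown here, so the following is only a rough sketch of the idea behind them: run each step in order and stop as soon as one fails. It is written in Python to match the pipeline scripts rather than in shell. The `run_steps` and `placeholder` helpers and the `STEPS` list are hypothetical; in a real setup each command would presumably invoke the corresponding script or container.

```python
import subprocess
import sys


def run_steps(steps):
    """Run (name, command) pairs in order and stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, command in steps:
        result = subprocess.run(command)
        if result.returncode != 0:
            print(
                f"step {name!r} failed with exit code {result.returncode}",
                file=sys.stderr,
            )
            break
        completed.append(name)
    return completed


def placeholder(exit_code):
    """Hypothetical stand-in command that simply exits with the given code."""
    return [sys.executable, "-c", f"import sys; sys.exit({exit_code})"]


# Placeholder commands; a real list would launch extract, load, and transform.
STEPS = [
    ("extract", placeholder(0)),
    ("load", placeholder(0)),
    ("transform", placeholder(0)),
]
```

`run_steps(STEPS)` runs all three placeholders in sequence. Stopping at the first non-zero exit code means a failed extract never triggers a load against missing data.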