@tmanik
Latest commit 1b22c77 Sep 17, 2024 History
# Data Pipeline Flow
<div align="center">

```mermaid
graph TD
    subgraph AWS
        A[AWS Data Source]
    end
    subgraph "Container: Extract"
        B[extract.py]
    end
    subgraph "Container: Load"
        C[load.py]
    end
    subgraph "Container: Transform"
        D[transform.py]
    end
    subgraph "PostgreSQL Database"
        E[Loading Table]
        F[Final Table]
    end
    subgraph "Container: Visualize"
        G[Flask App]
    end
    A -->|Data| B
    B -->|Extracted Data| C
    C -->|Load Data| E
    E -->|Read Data| D
    D -->|Transformed Data| F
    F -->|Read Data| G
    classDef container fill:#e6f3ff,stroke:#333,stroke-width:2px;
    class B,C,D,G container;
    classDef scriptText fill:#e6f3ff,stroke:#333,stroke-width:2px,color:black;
    class B,C,D,G scriptText;
```

</div>

## Flow Explanation
The pipeline is orchestrated by shell scripts (`extract.sh`, `load.sh`, `transform.sh`), which manage the execution of each step:
1. **Extract**: Data is sourced from AWS and extracted using `extract.py` in the Extract container.
2. **Load**: The extracted data is then loaded into the Loading Table of the PostgreSQL database using `load.py` in the Load container.
3. **Transform**: Data from the Loading Table is read, transformed using `transform.py` in the Transform container, and then stored in the Final Table.
4. **Visualize**: Finally, a Flask app in the Visualize container reads data from the Final Table to create visualizations or serve data via an API.
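
The hand-off between the Loading Table and the Final Table in steps 2 to 4 can be sketched as follows. This is a minimal illustration, not the code in `load.py` or `transform.py`: the table names (`loading_table`, `final_table`), the columns, and the cleaning rule are hypothetical, and `sqlite3` stands in for PostgreSQL so the example runs without a database server.

```python
import sqlite3

# Hypothetical schema and cleaning rule; sqlite3 is a stand-in for PostgreSQL.


def load(conn, rows):
    """Load step: write raw extracted rows into the loading table unchanged."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS loading_table (id INTEGER, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO loading_table (id, value) VALUES (?, ?)", rows
    )
    conn.commit()


def transform(conn):
    """Transform step: read the loading table, clean rows, write the final table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS final_table (id INTEGER PRIMARY KEY, value TEXT)"
    )
    raw = conn.execute("SELECT id, value FROM loading_table").fetchall()
    # Drop missing or blank values; normalise the rest to trimmed lowercase.
    cleaned = [
        (row_id, value.strip().lower())
        for row_id, value in raw
        if value is not None and value.strip()
    ]
    conn.executemany(
        "INSERT OR REPLACE INTO final_table (id, value) VALUES (?, ?)", cleaned
    )
    conn.commit()


def read_final(conn):
    """Visualize step: the app only ever reads from the final table."""
    return conn.execute(
        "SELECT id, value FROM final_table ORDER BY id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
# Stand-in for whatever extract.py produces.
extracted = [(1, "  Alpha "), (2, None), (3, "BETA"), (4, "   ")]
load(conn, extracted)
transform(conn)
rows = read_final(conn)
```

Keeping the raw rows in the loading table means the transform can be re-run without extracting again, and the Flask app never sees uncleaned data.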

This pipeline moves data from its source to a visualized or API-accessible form in discrete stages, with a clear separation of concerns: each stage runs in its own container. Because the steps are orchestrated by shell scripts, they can be executed flexibly and controlled individually.
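
One way to picture the orchestration is a driver that runs the steps in order and stops at the first failure. The sketch below is an assumption about how the steps might be chained, not a description of the repository's scripts: `run_pipeline` is a hypothetical helper, and the demo commands are placeholder Python one-liners where a real setup would presumably invoke `extract.sh`, `load.sh`, and `transform.sh`.

```python
import subprocess
import sys


def run_pipeline(steps):
    """Run each (name, command) pair in order; stop at the first non-zero exit.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, command in steps:
        result = subprocess.run(command)
        if result.returncode != 0:
            print(
                f"step {name!r} failed with exit code {result.returncode}",
                file=sys.stderr,
            )
            break
        completed.append(name)
    return completed


# Placeholder commands so the sketch runs anywhere. In a real setup each
# entry might instead be something like ["sh", "extract.sh"] (assumption).
demo_steps = [
    ("extract", [sys.executable, "-c", "print('extracting')"]),
    ("load", [sys.executable, "-c", "import sys; sys.exit(1)"]),  # simulated failure
    ("transform", [sys.executable, "-c", "print('transforming')"]),
]

finished = run_pipeline(demo_steps)
```

Stopping on the first failure keeps a broken load from feeding the transform step, which matters here because each stage reads what the previous one wrote.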