# Data Pipeline Flow

<div align="center">

```mermaid
graph TD
    subgraph AWS
        A[AWS Data Source]
    end
    subgraph "Container: Extract"
        B[extract.py]
    end
    subgraph "Container: Load"
        C[load.py]
    end
    subgraph "Container: Transform"
        D[transform.py]
    end
    subgraph "PostgreSQL Database"
        E[Loading Table]
        F[Final Table]
    end
    subgraph "Container: Visualize"
        G[Flask App]
    end
    A -->|Data| B
    B -->|Extracted Data| C
    C -->|Load Data| E
    E -->|Read Data| D
    D -->|Transformed Data| F
    F -->|Read Data| G
    classDef container fill:#e6f3ff,stroke:#333,stroke-width:2px,color:black;
    class B,C,D,G container;
```

</div>

## Flow Explanation

The entire process is orchestrated by shell scripts (`extract.sh`, `load.sh`, `transform.sh`), which manage the execution of each step in the pipeline:

1. **Extract**: Data is sourced from AWS and extracted by `extract.py` in the Extract container.

2. **Load**: The extracted data is loaded into the Loading Table of the PostgreSQL database by `load.py` in the Load container.

3. **Transform**: `transform.py` in the Transform container reads data from the Loading Table, transforms it, and stores the result in the Final Table.

4. **Visualize**: Finally, a Flask app in the Visualize container reads data from the Final Table to render visualizations or serve the data through an API.
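
As a minimal sketch of the Load, Transform, and Visualize hand-offs described in the list above (not the project's actual `load.py` or `transform.py`), the following uses Python's built-in `sqlite3` module as a self-contained stand-in for PostgreSQL. The table names `loading_table` and `final_table`, their columns, and the cleaning rules (trim and title-case names, drop rows whose amount is not numeric) are all hypothetical.

```python
import sqlite3

# Hypothetical schema and cleaning rules; sqlite3 stands in for PostgreSQL
# so the sketch runs without a database server.


def load(conn: sqlite3.Connection, rows) -> None:
    """Load step: append raw extracted rows to the loading (staging) table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS loading_table (id INTEGER, name TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO loading_table VALUES (?, ?, ?)", rows)
    conn.commit()


def transform(conn: sqlite3.Connection) -> None:
    """Transform step: read the loading table, clean each row, upsert into the final table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS final_table "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    cleaned = []
    for row_id, name, amount in conn.execute(
        "SELECT id, name, amount FROM loading_table"
    ):
        try:
            value = float(amount)
        except (TypeError, ValueError):
            continue  # drop rows whose amount is not numeric
        cleaned.append((row_id, (name or "").strip().title(), value))
    # INSERT OR REPLACE is keyed on id, so rerunning the step does not duplicate rows.
    conn.executemany("INSERT OR REPLACE INTO final_table VALUES (?, ?, ?)", cleaned)
    conn.commit()


def read_final(conn: sqlite3.Connection) -> list:
    """What the Visualize step would query: final-table rows as dicts."""
    cursor = conn.execute("SELECT id, name, amount FROM final_table ORDER BY id")
    return [{"id": i, "name": n, "amount": a} for i, n, a in cursor]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, [(1, "  alice ", "10.5"), (2, "BOB", "n/a"), (3, "carol", "7")])
    transform(conn)
    print(read_final(conn))
    # [{'id': 1, 'name': 'Alice', 'amount': 10.5}, {'id': 3, 'name': 'Carol', 'amount': 7.0}]
```

Keeping raw values as text in the loading table means a malformed record does not make the load itself fail; the transform step decides what to keep. Against PostgreSQL, the same statements would go through a driver such as psycopg2, which uses `%s` placeholders instead of `?`.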

This pipeline moves data from its source to a visualized or API-accessible form through well-defined stages, each running in its own container so that concerns stay clearly separated. Orchestrating the steps with shell scripts keeps their execution flexible and easy to control.
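
The shell scripts are not reproduced in this README, so the sketch below only illustrates the sequencing they are described as handling, written in Python rather than shell. The step order comes from the Flow Explanation; invoking each script with `sh` from the current directory, stopping at the first failure, and the names `run_script` and `run_pipeline` are assumptions. The Visualize step is omitted because no script is listed for it.

```python
import subprocess
from typing import Callable, List, Sequence

# Script names are from this README; running them with `sh` from the
# current working directory is an assumption.
STEPS = ("extract.sh", "load.sh", "transform.sh")


def run_script(script: str) -> int:
    """Run one orchestration script and return its exit code."""
    return subprocess.run(["sh", script]).returncode


def run_pipeline(
    steps: Sequence[str] = STEPS,
    runner: Callable[[str], int] = run_script,
) -> List[str]:
    """Run the steps in order and stop at the first non-zero exit code.

    Returns the names of the steps that completed successfully, so the
    caller can tell how far the pipeline got.
    """
    completed: List[str] = []
    for step in steps:
        code = runner(step)
        if code != 0:
            print(f"{step} exited with code {code}; skipping the remaining steps")
            break
        completed.append(step)
    return completed
```

Calling `run_pipeline()` runs the three scripts in sequence. Passing a different `runner` (for example, a hypothetical one that starts each step's container) changes how a step is executed without touching the ordering logic, which also makes the sequencing testable without running the real scripts.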