Commit

Added diagram
tmanik committed Sep 17, 2024
1 parent 219a69a commit 1b22c77
Showing 1 changed file with 59 additions and 0 deletions.
59 changes: 59 additions & 0 deletions diagram.md
@@ -0,0 +1,59 @@
# Data Pipeline Flow

<div align="center">

```mermaid
graph TD
    subgraph AWS
        A[AWS Data Source]
    end
    subgraph "Container: Extract"
        B[extract.py]
    end
    subgraph "Container: Load"
        C[load.py]
    end
    subgraph "Container: Transform"
        D[transform.py]
    end
    subgraph "PostgreSQL Database"
        E[Loading Table]
        F[Final Table]
    end
    subgraph "Container: Visualize"
        G[Flask App]
    end
    A -->|Data| B
    B -->|Extracted Data| C
    C -->|Load Data| E
    E -->|Read Data| D
    D -->|Transformed Data| F
    F -->|Read Data| G
    classDef container fill:#e6f3ff,stroke:#333,stroke-width:2px,color:black;
    class B,C,D,G container;
```

</div>

## Flow Explanation

The entire process is orchestrated by shell scripts (`extract.sh`, `load.sh`, `transform.sh`), which manage the execution of each step in the pipeline:

1. **Extract**: Data is sourced from AWS and extracted using `extract.py` in the Extract container.

2. **Load**: The extracted data is then loaded into the Loading Table of the PostgreSQL database using `load.py` in the Load container.

3. **Transform**: Data from the Loading Table is read, transformed using `transform.py` in the Transform container, and then stored in the Final Table (a rough sketch of steps 1–3 follows this list).

4. **Visualize**: Finally, a Flask app in the Visualize container reads data from the Final Table to create visualizations or serve data via an API (sketched at the end of this section).
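
As a rough illustration of steps 1–3, the sketch below assumes the AWS source is an S3 bucket and that the Loading Table and Final Table map to tables named `loading_table` and `final_table`. The bucket, keys, columns, and connection settings are placeholders for illustration, not the repository's actual code.

```python
"""Hypothetical sketch of the extract -> load -> transform steps."""
import boto3
import psycopg2

S3_BUCKET = "example-pipeline-bucket"   # assumed bucket name
S3_KEY = "exports/data.csv"             # assumed object key
LOCAL_CSV = "/tmp/data.csv"


def extract() -> str:
    """Download the raw CSV from the AWS data source (extract.py's role)."""
    s3 = boto3.client("s3")
    s3.download_file(S3_BUCKET, S3_KEY, LOCAL_CSV)
    return LOCAL_CSV


def load(csv_path: str, conn) -> None:
    """Bulk-load the extracted CSV into the Loading Table (load.py's role)."""
    with conn.cursor() as cur, open(csv_path) as f:
        cur.copy_expert("COPY loading_table FROM STDIN WITH CSV HEADER", f)
    conn.commit()


def transform(conn) -> None:
    """Read from the Loading Table and populate the Final Table
    (transform.py's role); here the transformation is a simple SQL cleanup."""
    with conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO final_table (id, value)
            SELECT id, TRIM(value)
            FROM loading_table
            WHERE value IS NOT NULL;
            """
        )
    conn.commit()


if __name__ == "__main__":
    # Assumed connection settings; in practice these would likely come from
    # environment variables or container configuration.
    connection = psycopg2.connect(
        host="db", dbname="pipeline", user="postgres", password="postgres"
    )
    try:
        load(extract(), connection)
        transform(connection)
    finally:
        connection.close()
```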

This pipeline ensures a structured flow of data from its source to a visualized or API-accessible format, with clear separation of concerns at each stage. The use of shell scripts for orchestration allows for flexible and controllable execution of the pipeline steps.
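
Corresponding to the Visualize step, here is a minimal sketch of what the Flask app might look like if it exposes the Final Table as JSON. The `/data` route, table and column names, and connection settings are assumptions for illustration; the actual app may render charts or expose different endpoints.

```python
"""Hypothetical sketch of the Visualize step: a Flask app reading final_table."""
from flask import Flask, jsonify
import psycopg2

app = Flask(__name__)


def get_connection():
    # Assumed connection settings, matching the sketch above.
    return psycopg2.connect(
        host="db", dbname="pipeline", user="postgres", password="postgres"
    )


@app.route("/data")
def data():
    """Serve rows from the Final Table as JSON for a chart or API consumer."""
    conn = get_connection()
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT id, value FROM final_table ORDER BY id;")
            rows = [{"id": r[0], "value": r[1]} for r in cur.fetchall()]
    finally:
        conn.close()
    return jsonify(rows)


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```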
