From 1b22c7778720135449127a98397bdf2ceda1347c Mon Sep 17 00:00:00 2001
From: tmanik <tmanik@internet2.edu>
Date: Tue, 17 Sep 2024 10:39:20 -0400
Subject: [PATCH] Add data pipeline diagram

---
 diagram.md | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)
 create mode 100644 diagram.md

diff --git a/diagram.md b/diagram.md
new file mode 100644
index 0000000..dc5fcee
--- /dev/null
+++ b/diagram.md
@@ -0,0 +1,59 @@
+# Data Pipeline Flow
+
+<div align="center">
+
+```mermaid
+graph TD
+    subgraph AWS
+        A[AWS Data Source]
+    end
+
+    subgraph "Container: Extract"
+        B[extract.py]
+    end
+
+    subgraph "Container: Load"
+        C[load.py]
+    end
+
+    subgraph "Container: Transform"
+        D[transform.py]
+    end
+
+    subgraph "PostgreSQL Database"
+        E[Loading Table]
+        F[Final Table]
+    end
+
+    subgraph "Container: Visualize"
+        G[Flask App]
+    end
+
+    A -->|Data| B
+    B -->|Extracted Data| C
+    C -->|Load Data| E
+    E -->|Read Data| D
+    D -->|Transformed Data| F
+    F -->|Read Data| G
+
+    %% Shared styling for the container nodes
+    %% (light-blue fill, dark border, black label text)
+    classDef container fill:#e6f3ff,stroke:#333,stroke-width:2px,color:black;
+    class B,C,D,G container;
+```
+
+</div>
+
+## Flow Explanation
+
+The entire process is orchestrated by shell scripts (`extract.sh`, `load.sh`, `transform.sh`), which manage the execution of each step in the pipeline:
+
+1. **Extract**: Data is sourced from AWS and extracted using `extract.py` in the Extract container.
+
+2. **Load**: The extracted data is then loaded into the Loading Table of the PostgreSQL database using `load.py` in the Load container.
+
+3. **Transform**: Data from the Loading Table is read, transformed using `transform.py` in the Transform container, and then stored in the Final Table.
+
+4. **Visualize**: Finally, a Flask app in the Visualize container reads data from the Final Table to create visualizations or serve data via an API.
+
+This pipeline moves data in a structured flow from its source to a visualized or API-accessible format, with a clear separation of concerns at each stage. Orchestrating the steps with shell scripts keeps execution flexible and easy to control.
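The shell-script orchestration described in the flow explanation can be sketched as a small driver that runs each stage in order and stops at the first failure. This is a hypothetical sketch, not code from the repository: the `run_pipeline` helper and its injectable `runner` parameter are assumptions made for testability; only the stage script names come from the explanation above.

```python
import subprocess

# Stage scripts named in the flow explanation (paths are assumed).
STAGES = ["extract.sh", "load.sh", "transform.sh"]

def run_pipeline(runner=subprocess.run):
    """Run each pipeline stage in order.

    Returns the name of the first failing stage, or None if every
    stage exited successfully.
    """
    for stage in STAGES:
        result = runner(["bash", stage], check=False)
        if result.returncode != 0:
            return stage
    return None
```

Injecting `runner` lets the ordering and fail-fast logic be exercised without the actual scripts or containers being present.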