swf-testbed/docs/proposal_diagrams.md

# AI-enabled WFMS proposal — draft diagrams

Mermaid prototypes to support the "Why us?" section. Finished versions will
be redrawn as hand-authored SVG in the style of `swf-testbed/docs/images/`.

Open this file's Markdown preview (`Ctrl+Shift+V` in VS Code) to render the diagrams.

---

## Diagram 1 — Three Contexts

The thesis picture: three LLM-integrated systems running today, ordered
left-to-right by increasing LLM autonomy. A shared MCP ecosystem feeds all
three. The top banner carries the 6-month claim.

```mermaid
flowchart TB
    classDef llm fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#000
    classDef human fill:#fff3e0,stroke:#e65100,color:#000
    classDef tool fill:#f1f8e9,stroke:#33691e,color:#000
    classDef delta fill:#fce4ec,stroke:#ad1457,stroke-dasharray:4 3,color:#000

    Thesis["<b>Today</b>: LLMs inform humans &nbsp;&nbsp;━━ 6-month step ━━▶&nbsp;&nbsp; <b>Tomorrow</b>: LLMs act within workflows"]:::delta

    subgraph P1["① Real-time bot — Mattermost"]
        direction TB
        U1["ePIC users"]:::human
        B1["AI bot<br/>Haiku · cross-session memory<br/>context harness"]:::llm
        O1["Q&A, diagnostics, on-the-fly analysis"]
        U1 --> B1 --> O1 --> U1
    end

    subgraph P2["② Research orchestrator — corun-ai"]
        direction TB
        U2["Expert evaluators<br/>(production, user learning)"]:::human
        S2["Scheduler<br/>model × sysprompt × MCP set<br/>config compare & annotate"]
        B2["Long-latency worker<br/>Opus / Sonnet / Gemini / Gemma<br/>minutes–tens of minutes"]:::llm
        O2["Deep research entry<br/>(e.g. Perlmutter performance)"]
        U2 --> S2 --> B2 --> O2 --> U2
    end

    subgraph P3["③ Active workflow orchestrator — swf-testbed"]
        direction TB
        U3["Testbed users"]:::human
        B3["LLM orchestrator<br/>launch · run · monitor<br/>assess · summarize"]:::llm
        W3["Hybrid workflow<br/>LLM steps ⇄ deterministic agents<br/>DAQ sim → PanDA workers"]
        O3["Completed run + summary"]
        U3 --> B3 --> W3 --> B3
        W3 --> O3 --> U3
    end

    subgraph MCP["Shared MCP tool ecosystem"]
        direction LR
        IH["<b>In-house</b><br/>AskPanDA · PanDA Monitor · Streaming Workflow"]:::tool
        AD["<b>Adopted</b><br/>Rucio · XRootD · uproot · LXR · GitHub · Zenodo"]:::tool
    end

    Thesis -.-> P1
    Thesis -.-> P2
    Thesis -.-> P3
    P1 --> MCP
    P2 --> MCP
    P3 --> MCP
```

Legend: blue = LLM, orange = human, green = MCP/tool surface, pink-dashed = thesis / 6-month delta.

---

## Diagram 3 — Hybrid Workflow Anatomy

One real swf-testbed streaming run as a pipeline of alternating LLM and
deterministic steps, with the MCP tools each LLM step actually calls.
The human-in-the-loop gate between ⑨ and ⑩ is where the 6-month scope lands.

```mermaid
flowchart TB
    classDef llm fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#000
    classDef det fill:#eeeeee,stroke:#555,color:#000
    classDef hitl fill:#fff3e0,stroke:#e65100,stroke-dasharray:5 3,color:#000
    classDef mcp fill:#f1f8e9,stroke:#33691e,font-size:11px,color:#000

    U["User prompt<br/>'run a fast-processing test<br/>and summarize results'"]:::hitl

    L1["① Prepare<br/>select config, check prior runs"]:::llm
    L1t["swf_list_workflow_executions<br/>pcs_list_tags · swf_get_system_state"]:::mcp

    L2["② Start testbed"]:::llm
    L2t["swf_start_user_testbed"]:::mcp

    L3["③ Start workflow"]:::llm
    L3t["swf_start_workflow<br/>(stf_count, config, …)"]:::mcp

    D1["④ DAQ simulator<br/>emits STF files"]:::det
    D2["⑤ Data agent<br/>STF registration"]:::det
    D3["⑥ FastMon agent<br/>samples Time Frames"]:::det
    D4["⑦ Fast processing agent<br/>TF slices → PanDA"]:::det
    D5["⑧ PanDA workers<br/>EICrecon reconstruction"]:::det

    L4["⑨ Monitor in-flight<br/>errors, throughput, stragglers"]:::llm
    L4t["swf_list_logs(level='ERROR')<br/>swf_list_workflow_executions<br/>panda_get_activity"]:::mcp

    G1{"human-in-loop<br/>gate<br/>(scope of 6-mo work)"}:::hitl

    L5["⑩ Assess & summarize<br/>narrative run report,<br/>anomaly notes,<br/>comparison to prior runs"]:::llm
    L5t["swf_get_workflow_execution<br/>panda_study_job · lxr_ident"]:::mcp

    O["Run entry + summary<br/>annotated, searchable"]:::hitl

    U --> L1 --> L2 --> L3 --> D1 --> D2 --> D3 --> D4 --> D5 --> L4
    L4 --> G1 --> L5 --> O

    L1 -.- L1t
    L2 -.- L2t
    L3 -.- L3t
    L4 -.- L4t
    L5 -.- L5t
```

Legend: blue = LLM step, grey = deterministic agent, dashed orange = human-in-loop / user edge, green captions = MCP tool calls.
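
The alternation of LLM and deterministic steps can also be written down as plain data, which may help when checking the figure against the real run. The sketch below is illustrative only: the step names and MCP tool names are copied from the diagram, while the `Step` type, `PIPELINE`, `GATE_AFTER_STEP`, and the helper functions are hypothetical names invented for this example and are not part of swf-testbed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    """One box in the diagram (hypothetical type, for illustration)."""
    number: int
    name: str
    kind: str                    # "llm" or "det"
    tools: Tuple[str, ...] = ()  # MCP tools called; empty for deterministic steps


# Step order and tool names are transcribed from the diagram above.
PIPELINE: List[Step] = [
    Step(1, "Prepare", "llm",
         ("swf_list_workflow_executions", "pcs_list_tags", "swf_get_system_state")),
    Step(2, "Start testbed", "llm", ("swf_start_user_testbed",)),
    Step(3, "Start workflow", "llm", ("swf_start_workflow",)),
    Step(4, "DAQ simulator", "det"),
    Step(5, "Data agent", "det"),
    Step(6, "FastMon agent", "det"),
    Step(7, "Fast processing agent", "det"),
    Step(8, "PanDA workers", "det"),
    Step(9, "Monitor in-flight", "llm",
         ("swf_list_logs", "swf_list_workflow_executions", "panda_get_activity")),
    Step(10, "Assess & summarize", "llm",
         ("swf_get_workflow_execution", "panda_study_job", "lxr_ident")),
]

# The human-in-the-loop gate sits between step 9 and step 10.
GATE_AFTER_STEP = 9


def llm_steps(pipeline: List[Step]) -> List[Step]:
    """Return only the steps driven by the LLM."""
    return [s for s in pipeline if s.kind == "llm"]


def distinct_tools(pipeline: List[Step]) -> List[str]:
    """Return the sorted set of MCP tools used anywhere in the run."""
    return sorted({t for s in pipeline for t in s.tools})


def execution_order(pipeline: List[Step], gate_after: int) -> List[str]:
    """List 'number:kind' labels in run order, inserting the human gate."""
    order: List[str] = []
    for s in pipeline:
        order.append(f"{s.number}:{s.kind}")
        if s.number == gate_after:
            order.append("gate:hitl")
    return order
```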

---

## Diagram 2 — MCP Tool Ecosystem

One LLM reaches into the experiment's operational stack through a two-tier
tool set. Counters "everyone has MCP now" by showing depth into production
systems.

```mermaid
flowchart TB
    classDef llm fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#000
    classDef ih fill:#f1f8e9,stroke:#33691e,color:#000
    classDef ad fill:#fff8e1,stroke:#f57f17,color:#000
    classDef sys fill:#fafafa,stroke:#888,color:#444

    LLM["<b>LLM</b><br/>Opus · Sonnet · Haiku · Gemini · Gemma<br/>sysprompt · effort level · context harness"]:::llm

    subgraph IH["In-house — purpose-built on our production WFMS"]
        direction LR
        T1["AskPanDA<br/><i>job diagnostics</i>"]:::ih
        T2["PanDA Monitor MCP<br/><i>operational state</i>"]:::ih
        T3["Streaming Workflow MCP<br/><i>active testbed control</i>"]:::ih
    end

    subgraph AD["3rd-party MCP — 6+ community/standard tools"]
        direction LR
        T4["Rucio MCP"]:::ad
        T5["XRootD MCP"]:::ad
        T6["uproot MCP"]:::ad
        T7["LXR XREF MCP"]:::ad
        T8["GitHub MCP"]:::ad
        T9["Zenodo MCP"]:::ad
    end

    subgraph SYS["Reaches into"]
        direction LR
        S1["PanDA DB<br/>monitor · testbed"]:::sys
        S2["Rucio<br/>data catalogs"]:::sys
        S3["XRootD<br/>remote I/O"]:::sys
        S4["ePIC codebase<br/>(55+ repos)"]:::sys
        S5["Zenodo<br/>official repo"]:::sys
    end

    LLM --> IH
    LLM --> AD
    IH --> SYS
    AD --> SYS
```
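
For keeping the figure's tool counts honest, the two-tier tool set can be held as a small registry and grouped by tier. This is a minimal sketch under assumptions: the tool names are the ones in the diagram, but `TOOL_TIERS`, `by_tier`, and the tier labels `"in-house"` and `"adopted"` are hypothetical and do not describe how any MCP server is actually configured.

```python
from collections import defaultdict
from typing import Dict, List

# Tool names transcribed from the diagram; the tier labels are illustrative.
TOOL_TIERS: Dict[str, str] = {
    "AskPanDA": "in-house",
    "PanDA Monitor MCP": "in-house",
    "Streaming Workflow MCP": "in-house",
    "Rucio MCP": "adopted",
    "XRootD MCP": "adopted",
    "uproot MCP": "adopted",
    "LXR XREF MCP": "adopted",
    "GitHub MCP": "adopted",
    "Zenodo MCP": "adopted",
}


def by_tier(tool_tiers: Dict[str, str]) -> Dict[str, List[str]]:
    """Group tool names by tier, preserving the registry's insertion order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for name, tier in tool_tiers.items():
        grouped[tier].append(name)
    return dict(grouped)
```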

---

## Diagram 4 — 6-month Delta (before / after)

Same boxes, one arrow moves, one audit loop added. Makes the project
feel like a bounded increment on an operational system, not a research
leap.

```mermaid
flowchart LR
    classDef llm fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#000
    classDef human fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#000
    classDef wfms fill:#f3e5f5,stroke:#6a1b9a,stroke-width:2px,color:#000
    classDef audit fill:#fffde7,stroke:#f9a825,stroke-dasharray:4 3,color:#000
    classDef delta fill:#fce4ec,stroke:#ad1457,color:#000

    subgraph TODAY["<b>Today</b> — LLM informs, human decides"]
        direction TB
        T_LLM["LLM"]:::llm
        T_MCP["MCP tools<br/><i>reads · analyzes</i>"]:::llm
        T_HUM["<b>Human decides</b>"]:::human
        T_WFMS["WFMS acts"]:::wfms
        T_LLM --> T_MCP --> T_HUM --> T_WFMS
    end

    subgraph NEXT["<b>Proposed (6 months)</b> — LLM decides on pre-defined classes"]
        direction TB
        N_LLM["LLM"]:::llm
        N_MCP["MCP tools<br/><i>reads · analyzes · <b>decides</b></i>"]:::llm
        N_WFMS["WFMS acts"]:::wfms
        N_AUD["HITL audit trail<br/><i>async human review</i>"]:::audit
        N_LLM --> N_MCP --> N_WFMS
        N_WFMS -.-> N_AUD
        N_AUD -.-> N_LLM
    end

    DELTA["<b>Delta:</b><br/>• 'human decides' → 'LLM decides'<br/>• HITL audit loop added<br/>• Scope gated by decision-class allowlist"]:::delta

    TODAY -.-> DELTA -.-> NEXT
```
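
One way to picture the "decision-class allowlist" plus audit loop is a gate that lets the LLM decide only for allowlisted classes, leaves everything else with a human, and records every routed decision for asynchronous review. The sketch below is an assumption-laden illustration, not the proposed implementation: `DecisionGate`, `AuditEntry`, and the decision-class names used with it (for example `"retry_failed_job"`) are hypothetical and do not appear in the proposal.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass(frozen=True)
class AuditEntry:
    """One record in the audit trail (hypothetical shape)."""
    decision_class: str
    action: str
    decided_by: str  # "llm" or "human"


@dataclass
class DecisionGate:
    """Routes a decision to the LLM only if its class is allowlisted."""
    allowlist: FrozenSet[str]
    audit_log: List[AuditEntry] = field(default_factory=list)

    def route(self, decision_class: str, action: str) -> str:
        # Allowlisted classes follow the proposed path (LLM decides);
        # anything else stays on today's path (human decides).
        decided_by = "llm" if decision_class in self.allowlist else "human"
        self.audit_log.append(AuditEntry(decision_class, action, decided_by))
        return decided_by

    def pending_review(self) -> List[AuditEntry]:
        """LLM-made decisions a human would review asynchronously."""
        return [e for e in self.audit_log if e.decided_by == "llm"]


# Example usage with made-up decision classes.
gate = DecisionGate(allowlist=frozenset({"retry_failed_job"}))
gate.route("retry_failed_job", "resubmit job to another site")  # LLM decides
gate.route("cancel_campaign", "stop all running tasks")         # human decides
```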

---

## Diagram 5 — PanDA Scale Provenance

The "why 6 months is plausible" anchor: we're layering on a production
WFMS with roughly two decades of operational history (in operation since
2005), not starting from zero.

```mermaid
flowchart BT
    classDef app fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#000
    classDef ai fill:#f1f8e9,stroke:#33691e,stroke-width:2px,color:#000
    classDef panda fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#000

    subgraph L1["<b>Foundation</b> — PanDA production WFMS (operational since 2005)"]
        direction LR
        P1["ATLAS @ LHC<br/>O(million) jobs/day<br/>200+ institutions"]:::panda
        P2["PanDA monitor<br/>deep drill-down<br/>refined 15+ years"]:::panda
        P3["ePIC production<br/>(monthly campaigns, OSG, HPC)"]:::panda
        P4["ePIC streaming<br/>workflow testbed<br/>(this team, 2025+)"]:::panda
    end

    subgraph L2["AI instrumentation today — this team (2024–)"]
        direction LR
        I1["AskPanDA MCP"]:::ai
        I2["PanDA Monitor MCP"]:::ai
        I3["VectorDB RAG"]:::ai
        I4["Streaming Workflow MCP"]:::ai
        I5["3rd-party MCP (6+)"]:::ai
    end

    subgraph L3["<b>New application layer</b> — LLM-driven orchestration (proposed, 6 months)"]
        direction LR
        A1["LLM workflow<br/>orchestrator"]:::app
        A2["Hybrid workflows<br/>LLM + deterministic"]:::app
        A3["Harnessed autonomous<br/>LLM action"]:::app
        A4["LLM research assistant<br/><i>evolution of Mattermost<br/>bot + codoc-ai</i>"]:::app
    end

    L1 --> L2
    L2 --> L3
```

---

## Diagram 6 — corun-ai Research Loop

Shows corun-ai as an orchestrated research system, not a chatbot.
Config-compare in annotation threads is the R&D-testbed feature.

```mermaid
sequenceDiagram
    autonumber
    participant U as Expert evaluator
    participant S as Scheduler
    participant W as Worker LLM
    participant M as MCP tools
    participant E as Research entry

    U->>S: submit research prompt<br/>+ config (model · sysprompt · MCP set)
    S->>W: spawn worker with config
    loop deep analysis — minutes to tens of minutes
        W->>M: tool call (PanDA / LXR / Rucio / ...)
        M-->>W: results
        W->>W: reason · refine · iterate
    end
    W-->>S: completed analysis
    S->>E: write research entry
    E-->>U: notify + surface result
    U->>E: annotate · thread comments
    Note over U,E: config variants compared<br/>side-by-side in threads —<br/>an R&D testbed, not a product
```
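
The config axes named in step 1 (model · sysprompt · MCP set) form a grid of variants that can be compared side by side. The sketch below shows one plausible way to enumerate such a grid with `itertools.product`. It is an illustration only: the function `config_grid`, the sysprompt labels, and the MCP-set groupings are hypothetical, and nothing here reflects how the corun-ai scheduler actually stores or schedules configs.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple


def config_grid(
    models: Sequence[str],
    sysprompts: Sequence[str],
    mcp_sets: Sequence[Tuple[str, ...]],
) -> List[Dict[str, object]]:
    """Enumerate every model x sysprompt x MCP-set combination."""
    return [
        {"model": m, "sysprompt": s, "mcp_set": t}
        for m, s, t in product(models, sysprompts, mcp_sets)
    ]


# Made-up axis values for illustration; model names come from the diagrams.
MODELS = ["Opus", "Sonnet"]
SYSPROMPTS = ["baseline", "terse"]                    # hypothetical labels
MCP_SETS = [("PanDA",), ("PanDA", "LXR", "Rucio")]    # hypothetical groupings

variants = config_grid(MODELS, SYSPROMPTS, MCP_SETS)
for v in variants:
    print(v["model"], v["sysprompt"], "+".join(v["mcp_set"]))
```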

---

Fill in concrete numbers (PanDA jobs/day, testbed run count, corun-ai prompt count, etc.) before these go into proposal figures.