# Model Context Protocol (MCP) Integration

## Overview

The SWF Monitor implements the [Model Context Protocol](https://modelcontextprotocol.io/) (MCP), an open standard for connecting LLM applications to external tools and data. This lets MCP-compatible LLM clients query and control the testbed in natural language.

**Endpoint:** `/swf-monitor/mcp/`

**Package:** [django-mcp-server](https://github.com/omarbenhamid/django-mcp-server)

## Design Philosophy

MCP tools are **data access primitives** with filtering capabilities. The LLM synthesizes, summarizes, and aggregates information across multiple tool calls. This approach:

- Provides flexibility for unanticipated queries
- Leverages the LLM's reasoning capabilities
- Keeps tools simple and composable
- Supports complex analysis through multiple calls

**Date Range Convention:** The `swf_*` list tools that filter by time accept `start_time` and `end_time` parameters (ISO datetime strings). If these are omitted, each tool falls back to a recent default window (for example, the last hour for messages and the last 24 hours for logs). The PanDA tools use a `days` parameter instead.

**Pagination Metadata:** List tools return pagination metadata to help the LLM manage its context:
- `items`: The returned records (capped at a per-tool `MAX_ITEMS` limit)
- `total_count`: Total number of matching records in the database
- `has_more`: Boolean indicating whether more records exist beyond those returned
- `monitor_urls`: Links to the web UI for human review

This tells the LLM when results are truncated, so it can decide whether to narrow its filters.
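The envelope above can be illustrated with a minimal sketch. It assumes a plain Python list of records, a hypothetical `MAX_ITEMS` of 200 (the cap several tools below document), and a dict shape for `monitor_urls`; it is not the server's actual code.

```python
from typing import Any

MAX_ITEMS = 200  # hypothetical default; the real cap is set per tool


def paginate(records: list[dict[str, Any]], monitor_urls: dict[str, str],
             max_items: int = MAX_ITEMS) -> dict[str, Any]:
    """Wrap matching records in the pagination envelope described above."""
    items = records[:max_items]           # never return more than max_items
    return {
        "items": items,
        "total_count": len(records),      # all matches, not just this page
        "has_more": len(records) > len(items),
        "monitor_urls": monitor_urls,     # shape assumed: name -> URL
    }


result = paginate([{"id": i} for i in range(250)],
                  {"agents": "https://example.org/swf-monitor/agents/"})
print(result["total_count"], len(result["items"]), result["has_more"])  # 250 200 True
```

In a Django-backed tool, the total would more likely come from `queryset.count()` and the page from slicing the queryset, so the full result set never has to be loaded into memory.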

## Client Configuration

### Claude Desktop

Add to `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "swf-monitor": {
      "url": "https://pandaserver02.sdcc.bnl.gov/swf-monitor/mcp/",
      "transport": "http"
    }
  }
}
```

### Claude Code

Add the server with `/mcp add`, or create `.mcp.json` in the project root:

```json
{
  "mcpServers": {
    "swf-monitor": {
      "type": "http",
      "url": "https://pandaserver02.sdcc.bnl.gov/swf-monitor/mcp/"
    }
  }
}
```

## Authentication

The MCP endpoint supports two authentication modes, selected by HTTP method:

### Claude Code (Local)

POST requests pass through without authentication. This lets local clients (Claude Code, the PanDA Mattermost bot, scripts) use MCP without any OAuth setup.

### Claude.ai (Remote)

GET requests require an OAuth 2.1 Bearer token issued via Auth0. This allows Claude.ai to make properly authorized remote MCP connections.

**OAuth Flow:**
1. Claude.ai discovers the OAuth metadata via `/.well-known/oauth-protected-resource`
2. The user authenticates with Auth0
3. Claude.ai includes the Bearer token in its requests
4. The MCP middleware validates the JWT against the Auth0 JWKS

**Configuration (production):**
```bash
# In .env or the environment
AUTH0_DOMAIN=your-tenant.us.auth0.com
AUTH0_CLIENT_ID=your-client-id
AUTH0_CLIENT_SECRET=your-client-secret
AUTH0_API_IDENTIFIER=https://your-server/swf-monitor/mcp
```

Leave `AUTH0_DOMAIN` empty to disable OAuth; all requests are then allowed through.
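The method-based split above can be summarized as a small decision function. This is a sketch of the documented policy only, not the actual middleware; the function names are hypothetical, and real JWT verification against the Auth0 JWKS is marked as a separate step rather than performed.

```python
from typing import Optional


def bearer_token(authorization: Optional[str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' header, if any."""
    if not authorization:
        return None
    scheme, _, token = authorization.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def route_request(method: str, authorization: Optional[str], auth0_domain: str) -> str:
    """Classify a request as 'allow', 'reject' (HTTP 401), or 'validate'.

    'validate' means a Bearer token is present but its JWT signature, issuer,
    audience (AUTH0_API_IDENTIFIER) and expiry must still be checked against
    the Auth0 JWKS; that step is deliberately left out of this sketch.
    """
    if not auth0_domain:              # OAuth disabled: everything passes
        return "allow"
    if method.upper() != "GET":       # POST (local clients) is not gated
        return "allow"
    return "validate" if bearer_token(authorization) else "reject"


print(route_request("GET", "Bearer eyJ...", "your-tenant.us.auth0.com"))  # validate
```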

**Network Requirements:**
Claude.ai connects from Anthropic's servers, so the MCP endpoint must be reachable from the public internet. If the server sits on an internal network (e.g., behind a lab firewall), that network must be configured to allow this external access.

---

### Claude Code Settings Example

A complete `~/.claude/settings.json` with the swf-monitor MCP server, permissions, and a status line:

```json
{
  "mcpServers": {
    "swf-monitor": {
      "type": "http",
      "url": "https://pandaserver02.sdcc.bnl.gov/swf-monitor/mcp/"
    }
  },
  "statusLine": {
    "type": "command",
    "command": "~/.claude/statusline.sh"
  },
  "permissions": {
    "allow": [
      "Bash(ls:*)",
      "Bash(wc:*)",
      "Bash(grep:*)",
      "mcp__swf-monitor__get_server_instructions",
      "mcp__swf-monitor__swf_list_agents",
      "mcp__swf-monitor__swf_get_agent",
      "mcp__swf-monitor__swf_list_workflow_executions",
      "mcp__swf-monitor__swf_get_workflow_execution",
      "mcp__swf-monitor__swf_list_logs",
      "mcp__swf-monitor__swf_get_system_state",
      "WebSearch",
      "WebFetch"
    ],
    "defaultMode": "default"
  },
  "alwaysThinkingEnabled": true
}
```

**Status line script** (`~/.claude/statusline.sh`):

```bash
#!/bin/bash
# Claude Code passes session info as JSON on stdin; print a one-line summary.
input=$(cat)
MODEL=$(echo "$input" | jq -r '.model.display_name')
USED=$(echo "$input" | jq -r '.context_window.used_percentage // 0')
REMAINING=$(echo "$input" | jq -r '.context_window.remaining_percentage // 100')
echo "[$MODEL] ${USED}% used | ${REMAINING}% remaining"
```

---

## Available Tools

### Tool Discovery

| Tool | Parameters | Description |
|------|------------|-------------|
| `swf_list_available_tools` | - | List all available MCP tools with descriptions. Use it to discover capabilities. |

---

### System State

| Tool | Parameters | Description |
|------|------------|-------------|
| `swf_get_system_state` | `username` | Comprehensive system state for a user: context from testbed.toml, agent manager status, workflow runner readiness, agent counts, execution stats. |

**Parameters:**
- `username`: Optional. The user whose context to load (reads that user's testbed.toml). If omitted, it is inferred from the `SWF_HOME` environment variable.

**Returns:**
- `timestamp`: Current server time
- `user_context`: Namespace and workflow defaults from the user's testbed.toml
- `agent_manager`: Status of the user's agent manager daemon (healthy/unhealthy/missing/exited)
- `workflow_runner`: Status of a healthy DAQ_Simulator that can accept `swf_start_workflow`
- `ready_to_run`: Boolean; True if the workflow runner is healthy and can accept commands
- `last_execution`: Most recent workflow execution in the user's namespace
- `errors_last_hour`: Count of ERROR logs in the user's namespace during the last hour
- `agents`: Total, active, exited, healthy, and unhealthy counts
- `executions`: Running count and number completed in the last hour
- `messages_last_10min`: Number of messages in the last 10 minutes
- `run_states`: Current fast processing run states
- `persistent_state`: System-wide persistent state (next IDs, etc.)
- `recent_events`: Last 10 system state events

---

### Agents

| Tool | Parameters | Description |
|------|------------|-------------|
| `swf_list_agents` | `namespace`, `agent_type`, `status`, `execution_id`, `start_time`, `end_time` | List agents with filtering. **Excludes EXITED agents by default.** |
| `swf_get_agent` | `name` (required) | Full details for a specific agent, including metadata. |

**`swf_list_agents` filters:**
- `namespace`: Filter to agents in this namespace
- `agent_type`: Filter by type (daqsim, data, processing, fastmon, workflow_runner, etc.)
- `status`: Filter by status. Special values:
  - `None` (default): Excludes EXITED agents
  - `'EXITED'`: Show only exited agents
  - `'all'`: Show all agents regardless of status
  - `'OK'`, `'WARNING'`, `'ERROR'`: Filter to that specific status
- `execution_id`: Filter to agents that participated in this execution
- `start_time`, `end_time`: Filter by last heartbeat within the date range
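The special `status` values can be expressed as plain-Python filtering over agent dicts. This is a sketch of the documented semantics under an assumed record shape; the real tool presumably filters a Django queryset (for example with `.exclude(status='EXITED')`) rather than a list.

```python
from typing import Any, Optional


def filter_by_status(agents: list[dict[str, Any]],
                     status: Optional[str] = None) -> list[dict[str, Any]]:
    """Apply the swf_list_agents status semantics to a list of agent dicts."""
    if status is None:        # default: hide EXITED agents
        return [a for a in agents if a["status"] != "EXITED"]
    if status == "all":       # no status filtering at all
        return list(agents)
    # 'EXITED', 'OK', 'WARNING', 'ERROR': exact match on status
    return [a for a in agents if a["status"] == status]


agents = [
    {"name": "a1", "status": "OK"},
    {"name": "a2", "status": "EXITED"},
    {"name": "a3", "status": "ERROR"},
]
print([a["name"] for a in filter_by_status(agents)])  # ['a1', 'a3']
```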

**Returns per agent:**
- `name`, `agent_type`, `status`, `operational_state`, `namespace`
- `last_heartbeat` (ISO timestamp)
- `workflow_enabled`, `total_stf_processed`

---

### Namespaces

| Tool | Parameters | Description |
|------|------------|-------------|
| `swf_list_namespaces` | - | List all testbed namespaces with their owners. |
| `swf_get_namespace` | `namespace` (required), `start_time`, `end_time` | Details for a namespace, including activity counts. |

**`swf_get_namespace` returns:**
- `name`, `owner`, `description`
- `agent_count`: Agents registered in the namespace
- `execution_count`: Workflow executions (in the date range, if specified)
- `message_count`: Messages (in the date range, if specified)
- `active_users`: Users who ran executions (in the date range, if specified)

---

### Workflow Definitions

| Tool | Parameters | Description |
|------|------------|-------------|
| `swf_list_workflow_definitions` | `workflow_type`, `created_by` | List available workflow definitions. |

**Returns per definition:**
- `workflow_name`, `version`, `workflow_type`
- `description`, `created_by`, `created_at`
- `execution_count`: Number of times executed

---

### Workflow Executions

| Tool | Parameters | Description |
|------|------------|-------------|
| `swf_list_workflow_executions` | `namespace`, `status`, `executed_by`, `workflow_name`, `currently_running`, `start_time`, `end_time` | List workflow executions with filtering. |
| `swf_get_workflow_execution` | `execution_id` (required) | Full details for a specific execution. |

**`swf_list_workflow_executions` filters:**
- `namespace`: Filter to executions in this namespace
- `status`: Filter by status (pending, running, completed, failed, terminated)
- `executed_by`: Filter by the user who started the execution
- `workflow_name`: Filter by workflow definition name
- `currently_running`: If True, return all running executions, ignoring the date range. Use this for "What's running?"
- `start_time`, `end_time`: Filter by execution start time

**Returns per execution:**
- `execution_id`, `workflow_name`, `namespace`
- `status`, `executed_by`
- `start_time`, `end_time` (ISO timestamps)
- `parameter_values`: Execution configuration

---

### Messages

| Tool | Parameters | Description |
|------|------------|-------------|
| `swf_list_messages` | `namespace`, `execution_id`, `agent`, `message_type`, `start_time`, `end_time` | List workflow messages with filtering. |
| `swf_send_message` | `message` (required), `message_type`, `metadata` | Send a message to the monitoring stream. |

**Diagnostic use cases:**
- Track workflow progress: `swf_list_messages(execution_id='stf_datataking-user-0044')`
- See what an agent sent: `swf_list_messages(agent='daq_simulator-agent-user-123')`
- Debug message flow: `swf_list_messages(namespace='torre1', start_time='2026-01-13T11:00:00')`
- For workflow failures, use `swf_list_logs(level='ERROR')` instead

**Common message types:** `run_imminent`, `start_run`, `stf_gen`, `end_run`, `pause_run`, `resume_run`

**Filters:**
- `namespace`: Filter to messages in this namespace
- `execution_id`: Filter to messages for this execution
- `agent`: Filter by sender agent name
- `message_type`: Filter by type (stf_gen, start_run, etc.)
- `start_time`, `end_time`: Filter by sent time (default: last 1 hour)

**Returns per message (max 200):**
- `message_type`, `sender_agent`, `namespace`
- `sent_at` (ISO timestamp)
- `execution_id`, `run_id`
- `payload_summary`: Truncated message content

**`swf_send_message` parameters:**
- `message` (required): The message text to send
- `message_type`: Type of message (default: 'announcement')
  - `'test'`: The namespace is omitted (for pipeline testing)
  - `'announcement'`, `'status'`, etc.: Uses the configured namespace from testbed.toml
- `metadata`: Optional dict of additional key-value data

**`swf_send_message` behavior:**
- The sender is automatically identified as `{username}-personal-agent`
- Messages are sent to `/topic/epictopic` and captured by the monitor
- Use it for testing the message pipeline, announcements to colleagues, or any other broadcast purpose
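The sender and namespace rules above can be condensed into a small builder. This sketch mirrors the documented return fields (`sender`, `message_type`, `namespace`, `content`); the actual wire format published to `/topic/epictopic` is not documented here, so the dict shape and the `build_message` name are assumptions.

```python
from typing import Any, Optional


def build_message(username: str, message: str, configured_namespace: str,
                  message_type: str = "announcement",
                  metadata: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Assemble a message following the swf_send_message rules above."""
    return {
        "sender": f"{username}-personal-agent",
        "message_type": message_type,
        # 'test' messages omit the namespace; all others use testbed.toml's
        "namespace": None if message_type == "test" else configured_namespace,
        "content": message,
        "metadata": metadata or {},
    }


msg = build_message("wenauseic", "pipeline check", "torre1", message_type="test")
print(msg["sender"], msg["namespace"])  # wenauseic-personal-agent None
```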

**Returns:**
- `success`: Whether the message was sent
- `sender`: The sender identifier (e.g., 'wenauseic-personal-agent')
- `message_type`: The type of message sent
- `namespace`: The namespace used (null for test messages)
- `content`: The message content

---

### Runs

| Tool | Parameters | Description |
|------|------------|-------------|
| `swf_list_runs` | `start_time`, `end_time` | List simulation runs with timing and file counts. |
| `swf_get_run` | `run_number` (required) | Full details for a specific run. |

**`swf_list_runs` returns per run:**
- `run_number`
- `start_time`, `end_time`, `duration_seconds`
- `stf_file_count`: Number of STF files in this run

**`swf_get_run` returns:**
- All fields above, plus:
- `run_conditions`: JSON metadata
- `file_stats`: STF file counts by status (registered, processing, done, failed)

---

### STF Files

| Tool | Parameters | Description |
|------|------------|-------------|
| `swf_list_stf_files` | `run_number`, `status`, `machine_state`, `start_time`, `end_time` | List STF files with filtering. |
| `swf_get_stf_file` | `file_id` or `stf_filename` (one required) | Full details for a specific STF file. |

**`swf_list_stf_files` filters:**
- `run_number`: Filter to files from this run
- `status`: Filter by processing status (registered, processing, processed, done, failed)
- `machine_state`: Filter by detector state (physics, cosmics, etc.)
- `start_time`, `end_time`: Filter by creation time

**Returns per STF file:**
- `file_id`, `stf_filename`, `run_number`
- `status`, `machine_state`
- `file_size_bytes`, `created_at`
- `tf_file_count`: Number of TF files derived from this STF file

**`swf_get_stf_file` returns:**
- All fields above, plus:
- `checksum`, `metadata`
- `workflow_id`, `daq_state`, `daq_substate`, `workflow_status`

---

### TF Slices (Fast Processing)

| Tool | Parameters | Description |
|------|------------|-------------|
| `swf_list_tf_slices` | `run_number`, `stf_filename`, `tf_filename`, `status`, `assigned_worker`, `start_time`, `end_time` | List TF slices for the fast processing workflow. |
| `swf_get_tf_slice` | `tf_filename`, `slice_id` (both required) | Full details for a specific TF slice. |

**`swf_list_tf_slices` filters:**
- `run_number`: Filter to slices from this run
- `stf_filename`: Filter to slices from this STF file
- `tf_filename`: Filter to slices from this TF sample
- `status`: Filter by status (queued, processing, completed, failed)
- `assigned_worker`: Filter by assigned worker
- `start_time`, `end_time`: Filter by creation time

**Returns per slice (max 200):**
- `slice_id`, `tf_filename`, `stf_filename`, `run_number`
- `tf_first`, `tf_last`, `tf_count` (TF range)
- `status`, `assigned_worker`
- `created_at`, `completed_at`

**`swf_get_tf_slice` returns:**
- All fields above, plus:
- `retries`, `assigned_at`
- `metadata`

---

### Logs

| Tool | Parameters | Description |
|------|------------|-------------|
| `swf_list_logs` | `app_name`, `instance_name`, `execution_id`, `level`, `search`, `start_time`, `end_time` | List log entries from all agents. |
| `swf_get_log_entry` | `log_id` (required) | Full details for a specific log entry. |

**Diagnostic use cases:**
- Workflow logs: `swf_list_logs(execution_id='stf_datataking-user-0044')`
- Debug a specific agent: `swf_list_logs(instance_name='daq_simulator-agent-user-123')`
- Find all errors: `swf_list_logs(level='ERROR')`
- Search for specific issues: `swf_list_logs(search='connection failed')`

**`swf_list_logs` filters:**
- `app_name`: Filter by application type (e.g., 'daq_simulator', 'data_agent')
- `instance_name`: Filter by agent instance name
- `execution_id`: Filter by workflow execution ID (e.g., 'stf_datataking-wenauseic-0044')
- `level`: Minimum level threshold; returns entries at this level and higher severity:
  - `DEBUG` -> all logs
  - `INFO` -> INFO, WARNING, ERROR, CRITICAL
  - `WARNING` -> WARNING, ERROR, CRITICAL
  - `ERROR` -> ERROR, CRITICAL
  - `CRITICAL` -> CRITICAL only
- `search`: Case-insensitive text search in the message
- `start_time`, `end_time`: Filter by timestamp (default: last 24 hours)
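The `level` threshold maps naturally onto the standard library's numeric logging levels (DEBUG=10 up to CRITICAL=50). A sketch of expanding a threshold into the set of included level names, assuming the stored `level` values are these standard names; the helper is illustrative, and the resulting list could feed a query such as a Django `level__in=` filter.

```python
import logging

# Standard numeric levels: DEBUG=10, INFO=20, WARNING=30, ERROR=40, CRITICAL=50
LEVELS = {name: getattr(logging, name)
          for name in ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")}


def levels_at_or_above(threshold: str) -> list[str]:
    """Return the level names that a `level=threshold` query would include."""
    floor = LEVELS[threshold.upper()]
    return [name for name, value in LEVELS.items() if value >= floor]


print(levels_at_or_above("ERROR"))  # ['ERROR', 'CRITICAL']
```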

**Returns per entry (max 200):**
- `id`, `timestamp`, `app_name`, `instance_name`
- `level`, `message`, `module`, `funcname`, `lineno`
- `extra_data`: Additional context (execution_id, run_id, etc.)

---

### Workflow Control

| Tool | Parameters | Description |
|------|------------|-------------|
| `swf_start_workflow` | `workflow_name`, `namespace`, `config`, `realtime`, `duration`, `stf_count`, `physics_period_count`, `physics_period_duration`, `stf_interval` | Start a workflow by sending a command to the DAQ Simulator agent. |
| `swf_stop_workflow` | `execution_id` (required) | Stop a running workflow gracefully. |
| `swf_end_execution` | `execution_id` (required) | Mark a stuck execution as terminated (database state change only). |

**`swf_start_workflow` parameters:**

All parameters are optional; defaults are read from the user's `testbed.toml`:
- `workflow_name`: Name of the workflow (default: from config, typically 'stf_datataking')
- `namespace`: Testbed namespace (default: from config)
- `config`: Workflow config name (default: from config, e.g., 'fast_processing_default')
- `realtime`: Run in real-time mode (default: from config, typically True)
- `duration`: Maximum duration in seconds (0 = run until complete)
- `stf_count`: Number of STF files to generate (overrides config)
- `physics_period_count`: Number of physics periods (overrides config)
- `physics_period_duration`: Duration of each physics period in seconds (overrides config)
- `stf_interval`: Interval between generated STF files in seconds (overrides config)
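The "defaults from testbed.toml, explicit arguments win" rule can be sketched as a dictionary merge. The config below stands in for what `tomllib.load` would return, but its section and key names (`[testbed]`, `[workflow]`, `name`, `config`, `realtime`) are hypothetical; the real file layout may differ.

```python
from typing import Any

# Hypothetical testbed.toml contents, as tomllib.load() would return them.
TESTBED_CFG = {
    "testbed": {"namespace": "torre1"},
    "workflow": {"name": "stf_datataking", "config": "fast_processing_default",
                 "realtime": True},
}


def resolve_start_params(cfg: dict[str, Any], **overrides: Any) -> dict[str, Any]:
    """Start from testbed.toml defaults, then apply explicitly passed arguments.

    Arguments left as None count as 'not given' and keep the config default.
    """
    wf = cfg.get("workflow", {})
    params: dict[str, Any] = {
        "workflow_name": wf.get("name", "stf_datataking"),
        "namespace": cfg.get("testbed", {}).get("namespace"),
        "config": wf.get("config"),
        "realtime": wf.get("realtime", True),
        "duration": 0,  # 0 = run until complete
    }
    params.update({k: v for k, v in overrides.items() if v is not None})
    return params


params = resolve_start_params(TESTBED_CFG, stf_count=5, realtime=False, stf_interval=None)
```

Treating `None` as "not given" keeps an explicit `realtime=False` distinct from an omitted argument.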

**Returns:** Success/failure status with execution details. The workflow runs asynchronously.

**After starting, ACTIVELY POLL; do not just sleep:**
- Poll `swf_get_workflow_monitor(execution_id)` every 10-15 s until the workflow completes
- Report progress to the user as it evolves
- Check `swf_list_logs(level='ERROR')` after completion
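The polling guidance above amounts to a loop that reports each change and stops at a terminal status. In this sketch, `get_monitor` is a hypothetical stand-in for however a client invokes `swf_get_workflow_monitor` (an LLM agent would use tool calls instead); the 12 s default sits in the 10-15 s range, and canned snapshots replace real responses.

```python
import time
from typing import Any, Callable

TERMINAL_STATUSES = {"completed", "failed", "terminated"}


def poll_until_done(get_monitor: Callable[[str], dict[str, Any]], execution_id: str,
                    interval: float = 12.0, max_polls: int = 300,
                    report: Callable[[str], None] = print,
                    sleep: Callable[[float], None] = time.sleep) -> dict[str, Any]:
    """Poll get_monitor(execution_id), reporting each change, until a terminal status."""
    last_summary = None
    for _ in range(max_polls):
        snap = get_monitor(execution_id)
        summary = f"{snap['status']}/{snap.get('phase')} stf={snap.get('stf_count')}"
        if summary != last_summary:  # report progress only when it changes
            report(summary)
            last_summary = summary
        if snap["status"] in TERMINAL_STATUSES:
            return snap
        sleep(interval)
    raise TimeoutError(f"{execution_id} not finished after {max_polls} polls")


# Usage with canned snapshots instead of real tool calls:
_snapshots = iter([
    {"status": "running", "phase": "imminent", "stf_count": 0},
    {"status": "running", "phase": "running", "stf_count": 3},
    {"status": "completed", "phase": "ended", "stf_count": 5},
])
seen: list[str] = []
final = poll_until_done(lambda _eid: next(_snapshots), "stf_datataking-user-0044",
                        report=seen.append, sleep=lambda _s: None)
```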

**`swf_stop_workflow`:** Sends a stop command to the DAQ Simulator agent. The workflow stops gracefully at the next checkpoint. Use `swf_list_workflow_executions(currently_running=True)` to find running execution IDs.

**`swf_end_execution`:** Use to clean up stale or stuck executions that are still marked as 'running' in the database. This is a state change only; no message is sent to any agent.

---

### Agent Process Management

| Tool | Parameters | Description |
|------|------------|-------------|
| `swf_kill_agent` | `name` (required) | Kill an agent process by sending SIGKILL to its PID. |

**`swf_kill_agent` behavior:**
- Looks up the agent by `instance_name`
- Retrieves its `pid` and `hostname`
- Sends SIGKILL if the agent runs on the current host
- Always marks the agent's `status` and `operational_state` as `EXITED`
- The agent then no longer appears in default `swf_list_agents` results

**Returns:**
- `success`: Whether the operation completed
- `killed`: Whether the process was actually killed (may be False if it was already dead or runs on a different host)
- `kill_error`: Error message if the kill failed (permission denied, process not found, remote host)
- `old_state`, `new_state`: State transition

---

### User Agent Manager

The User Agent Manager is a per-user daemon that enables MCP-driven testbed control. It listens for commands on a user-specific queue and manages supervisord-controlled agents.

| Tool | Parameters | Description |
|------|------------|-------------|
| `swf_check_agent_manager` | `username` | Check whether a user's agent manager daemon is alive. |
| `swf_get_testbed_status` | `username` | Comprehensive testbed status: agent manager, agents, running workflows, readiness. |
| `swf_start_user_testbed` | `username`, `config_name` | Start a user's testbed via their agent manager. |
| `swf_stop_user_testbed` | `username` | Stop a user's testbed via their agent manager. |

**`swf_check_agent_manager` returns:**
- `alive`: True if the agent manager has sent a heartbeat within the last 5 minutes
- `username`: The user being checked
- `instance_name`: The agent manager's instance name (e.g., 'agent-manager-wenauseic')
- `last_heartbeat`: When it last checked in
- `operational_state`: Current state (READY, EXITED, etc.)
- `control_queue`: The queue to send commands to (e.g., '/queue/agent_control.wenauseic')
- `agents_running`: Whether testbed agents are currently running
- `how_to_start`: Instructions for starting it if it is not alive
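The `alive` rule is a freshness check on the last heartbeat. A minimal sketch, assuming `last_heartbeat` is an ISO-8601 string and that naive timestamps are UTC (both assumptions); the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

HEARTBEAT_WINDOW = timedelta(minutes=5)


def manager_alive(last_heartbeat: Optional[str], now: Optional[datetime] = None) -> bool:
    """True if the ISO-8601 heartbeat timestamp is at most five minutes old."""
    if not last_heartbeat:
        return False
    beat = datetime.fromisoformat(last_heartbeat)
    if beat.tzinfo is None:             # assumption: naive timestamps are UTC
        beat = beat.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now - beat <= HEARTBEAT_WINDOW


now = datetime(2026, 1, 13, 11, 6, tzinfo=timezone.utc)
print(manager_alive("2026-01-13T11:02:00+00:00", now))  # True (4 minutes old)
```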

**`swf_get_testbed_status` returns:**
- `agent_manager`: alive, namespace, operational_state, status, last_heartbeat
- `agents`: List of workflow agents with running/stopped status
- `summary`: Running and stopped agent counts
- `running_workflows`: Count of currently executing workflows
- `ready`: True when the agent manager is alive, agents are running, and no workflow is executing
- `note`: Human-readable status summary

**`swf_start_user_testbed`:**
- Sends a `start_testbed` command to the user's agent manager
- The agent manager must already be running (use `swf_check_agent_manager` to verify)
- `config_name`: Optional config name (e.g., 'fast_processing'); the default is used if not specified
- Agents start asynchronously; use `swf_list_agents` to verify

**`swf_stop_user_testbed`:**
- Sends a `stop_testbed` command to the user's agent manager
- If the agent manager is not running, use `swf_kill_agent` to stop agents directly

**Starting the agent manager:**
```bash
cd /data/<username>/github/swf-testbed
source .venv/bin/activate && source ~/.env
testbed agent-manager
```

---

### Workflow Monitoring

| Tool | Parameters | Description |
|------|------------|-------------|
| `swf_get_workflow_monitor` | `execution_id` (required) | Get aggregated status and events for a workflow execution. |
| `swf_list_workflow_monitors` | - | List recent executions that can be monitored. |

**`swf_get_workflow_monitor` returns:**
- `execution_id`: The execution being monitored
- `status`: Current workflow status (running/completed/failed/terminated)
- `phase`: Current phase (imminent/running/ended/unknown)
- `run_id`: The run number for this execution
- `stf_count`: Number of STF files generated
- `events`: List of key events with timestamps (run_imminent, start_run, end_run)
- `errors`: List of any errors encountered (from messages and logs)
- `start_time`, `end_time`: Execution timestamps
- `duration_seconds`: How long the workflow ran (if completed)
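One plausible way to derive `phase` from `events` is to take the most recent lifecycle event and map it to a phase. The mapping below (run_imminent to imminent, start_run to running, end_run to ended) and the `type`/`timestamp` event keys are assumptions inferred from the names above, not the tool's confirmed logic.

```python
from typing import Any

# Assumed mapping from lifecycle message type to phase.
PHASE_BY_EVENT = {"run_imminent": "imminent", "start_run": "running", "end_run": "ended"}


def derive_phase(events: list[dict[str, Any]]) -> str:
    """Return the phase implied by the latest recognised event, else 'unknown'."""
    # ISO timestamps in one consistent format sort correctly as strings.
    for event in sorted(events, key=lambda e: e["timestamp"], reverse=True):
        phase = PHASE_BY_EVENT.get(event["type"])
        if phase:
            return phase
    return "unknown"


events = [
    {"type": "run_imminent", "timestamp": "2026-01-13T11:00:00"},
    {"type": "start_run", "timestamp": "2026-01-13T11:00:05"},
    {"type": "stf_gen", "timestamp": "2026-01-13T11:00:10"},
]
print(derive_phase(events))  # running
```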

This tool aggregates workflow messages and logs into a single-call summary of workflow progress, so there is no need to poll multiple tools.

**`swf_list_workflow_monitors` returns:**
- A list of executions from the last 24 hours with `execution_id`, `status`, `start_time`, `end_time`, `stf_count`
- Use it to pick an execution for detailed monitoring with `swf_get_workflow_monitor`

---

### PanDA Production Monitoring

Tools for querying the ePIC PanDA production database (`doma_panda` schema): jobs and JEDI tasks, plus PanDA queue configuration and Harvester worker counts. All access is read-only.

| Tool | Parameters | Description |
|------|------------|-------------|
| `panda_get_activity` | `days`, `username`, `site`, `workinggroup` | Pre-digested PanDA activity overview: aggregate counts only, no individual records. Use it first for "What is PanDA doing?" |
| `panda_list_jobs` | `days`, `status`, `username`, `site`, `taskid`, `reqid`, `limit`, `before_id` | List PanDA jobs with summary stats (default 200 jobs, 14 fields). Cursor-based pagination via `before_id`. |
| `panda_diagnose_jobs` | `days`, `username`, `site`, `taskid`, `reqid`, `error_component`, `limit`, `before_id` | Diagnose failed/faulty PanDA jobs with full error details (7 error components). Cursor-based pagination via `before_id`. |
| `panda_list_tasks` | `days`, `status`, `username`, `taskname`, `reqid`, `workinggroup`, `taskid`, `processingtype`, `limit`, `before_id` | List JEDI tasks with summary stats (default 25 tasks). Cursor-based pagination via `before_id`. |
| `panda_error_summary` | `days`, `username`, `site`, `taskid`, `error_source`, `limit` | Aggregate error summary across failed jobs, ranked by frequency. |
| `panda_study_job` | `pandaid` | Deep study of a single job: full record, files, errors, log URLs, Harvester info, parent task. |
| `panda_list_queues` | `cloud`, `site`, `resource_type`, `status`, `limit` | List EIC PanDA queues from live schedconfig: site, status, corecount, resource type, capability flags. |
| `panda_get_queue` | `panda_queue` (required) | Full details for a single PanDA queue. |
| `panda_resource_usage` | `days`, `username`, `site`, `workinggroup` | Allocated vs. used core-hours by queue/resource, rolled up over the time window. |
| `panda_harvester_workers` | `status`, `site`, `resourcetype`, `days` | Live Harvester pilot/worker counts (via bamboo `askpanda_atlas`): totals plus breakdowns by status, site, and resourcetype. |
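Cursor-based pagination via `before_id` can be driven by a simple loop. This sketch assumes that results are ordered by descending ID, that `before_id` returns only records with a strictly smaller ID, and that `fetch` returns just the record list (the real tools also return summary stats). `fetch` and the fake job data are hypothetical.

```python
from typing import Any, Callable, Iterator, Optional


def iter_pages(fetch: Callable[..., list[dict[str, Any]]], id_field: str,
               limit: int = 200, **filters: Any) -> Iterator[list[dict[str, Any]]]:
    """Yield successive pages, passing the smallest ID seen so far as before_id."""
    before_id: Optional[int] = None
    while True:
        page = fetch(limit=limit, before_id=before_id, **filters)
        if not page:
            return
        yield page
        if len(page) < limit:  # short page: nothing older remains
            return
        before_id = min(row[id_field] for row in page)


# Fake backend: 12 jobs, newest (highest pandaid) first.
JOBS = [{"pandaid": i} for i in range(1000, 1012)]


def fake_fetch(limit: int, before_id: Optional[int] = None, **filters: Any):
    rows = [j for j in JOBS if before_id is None or j["pandaid"] < before_id]
    return sorted(rows, key=lambda j: j["pandaid"], reverse=True)[:limit]


pages = list(iter_pages(fake_fetch, "pandaid", limit=5))
print([len(p) for p in pages])  # [5, 5, 2]
```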

**`panda_get_activity`**: pre-digested overview with no individual records. Parameters:
- `days`: Time window in days (default 1)
- `username`: Filter by job owner / task owner (supports SQL LIKE with %)
- `site`: Filter by computing site (supports SQL LIKE with %)
- `workinggroup`: Filter tasks by working group (e.g. 'EIC')

Returns:
- `jobs`: `{total, by_status, by_user, by_site}`, each with a status breakdown
- `tasks`: `{total, by_status, by_user}`, each with a status breakdown

Use cases:
- What's PanDA doing right now? `panda_get_activity()`
- EIC activity this week? `panda_get_activity(days=7, workinggroup='EIC')`
- Activity for a user? `panda_get_activity(username='Dmitrii Kalinkin')`

**`panda_list_tasks` filters:**
- `days`: Time window in days (default 7)
- `status`: Task status (done, failed, running, ready, broken, aborted, pending, finished)
- `username`: Task owner (supports SQL LIKE with %)
- `taskname`: Task name (supports SQL LIKE with %)
- `reqid`: Request ID
- `workinggroup`: Experiment affiliation (e.g. 'EIC', 'Rubin'); NULL for iDDS automation tasks
- `processingtype`: Processing type (e.g. 'epicproduction'); supports SQL LIKE with %
- `taskid`: Specific JEDI task ID
- `limit`: Maximum number of tasks to return (default 25)
- `before_id`: Pagination cursor

**Returns per task:**
- `jeditaskid`, `taskname`, `status`, `username`
- `creationdate`, `starttime`, `endtime`, `modificationtime`
- `reqid`, `processingtype`, `transpath`
- `progress`, `failurerate`, `errordialog`
- `site`, `corecount`, `taskpriority`, `currentpriority`
- `gshare`, `attemptnr`, `parent_tid`, `workinggroup`

**Diagnostic use cases:**
- Task overview: `panda_list_tasks(days=7)`
- Failed tasks: `panda_list_tasks(status='failed')`
- Tasks for a user: `panda_list_tasks(username='Dmitrii Kalinkin')`
- EIC experiment tasks: `panda_list_tasks(workinggroup='EIC')`
- Search by name pattern: `panda_list_tasks(taskname='%workflow%')`

**`panda_error_summary` filters:**
- `days`: Time window in days (default 10)
- `username`: Filter by job owner (supports SQL LIKE with %)
- `site`: Filter by computing site (supports SQL LIKE with %)
- `taskid`: Filter by JEDI task ID
- `error_source`: Filter to one component (pilot, executor, ddm, brokerage, dispatcher, supervisor, taskbuffer)
- `limit`: Maximum number of error patterns to return (default 20)

**Returns per error pattern:**
- `error_source`: Component name (pilot, executor, ddm, etc.)
- `error_code`: Numeric error code
- `error_diag`: Diagnostic message (truncated to 256 characters)
- `count`: Number of affected jobs
- `task_count`: Number of affected tasks
- `users`: List of affected users
- `sites`: List of affected sites
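The ranking above is essentially a group-by over (source, code, truncated diagnostic). The tool presumably does this in SQL; this in-memory sketch uses a simplified, hypothetical per-job shape (`pandaid`, `jeditaskid`, `user`, `site`, and an `errors` list) with invented illustrative values, and is not the tool's implementation.

```python
from collections import defaultdict
from typing import Any


def summarize_errors(jobs: list[dict[str, Any]], limit: int = 20) -> list[dict[str, Any]]:
    """Group per-job errors by (source, code, truncated diag); rank by job count."""
    groups: dict = defaultdict(
        lambda: {"jobs": set(), "tasks": set(), "users": set(), "sites": set()})
    for job in jobs:
        for err in job["errors"]:
            g = groups[(err["source"], err["code"], err["diag"][:256])]
            g["jobs"].add(job["pandaid"])
            g["tasks"].add(job["jeditaskid"])
            g["users"].add(job["user"])
            g["sites"].add(job["site"])
    ranked = sorted(groups.items(), key=lambda kv: len(kv[1]["jobs"]), reverse=True)
    return [
        {"error_source": src, "error_code": code, "error_diag": diag,
         "count": len(g["jobs"]), "task_count": len(g["tasks"]),
         "users": sorted(g["users"]), "sites": sorted(g["sites"])}
        for (src, code, diag), g in ranked[:limit]
    ]


jobs = [  # illustrative values only
    {"pandaid": 1, "jeditaskid": 10, "user": "alice", "site": "BNL",
     "errors": [{"source": "pilot", "code": 1001, "diag": "payload failed"}]},
    {"pandaid": 2, "jeditaskid": 10, "user": "bob", "site": "JLAB",
     "errors": [{"source": "pilot", "code": 1001, "diag": "payload failed"}]},
    {"pandaid": 3, "jeditaskid": 11, "user": "alice", "site": "BNL",
     "errors": [{"source": "ddm", "code": 2002, "diag": "stage-in failed"}]},
]
top = summarize_errors(jobs)
```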
0618
0619 **Diagnostic use cases:**
0620 - Top errors this week: `panda_error_summary(days=7)`
0621 - Errors for a specific user: `panda_error_summary(username='Dmitrii Kalinkin')`
0622 - Pilot errors only: `panda_error_summary(error_source='pilot')`
0623 - Errors for a specific task: `panda_error_summary(taskid=33824)`
0624
0625 **`panda_study_job`** — Deep study of a single job:
0626 - `pandaid`: PanDA job ID (required)
0627
0628 Returns:
0629 - `job`: Full record (~40 fields, nulls stripped) with structured `errors` list
0630 - `files`: All associated files from `filestable4` (log, output, input) with lfn, guid, scope, status
0631 - `log_urls`: Harvester log URLs — `pilot_stdout`, `pilot_stderr`, `batch_log` (require CILogon auth)
0632 - `log_file`: Log tarball metadata if registered (lfn, guid, scope for future rucio retrieval)
0633 - `harvester`: Condor worker details (workerid, status, error info)
0634 - `task`: Parent JEDI task context (name, status, error dialog)
0635 - `monitor_url`: Link to PanDA monitoring page
0636
0637 Use cases:
0638 - Study a failed job: `panda_study_job(pandaid=130497)`
0639 - After `panda_diagnose_jobs` identifies failures, drill into specific jobs
0640
0641 ---
0642
0643 ### PanDA Mattermost Bot
0644
0645 The PanDA bot (`monitor_app/panda/bot.py`) is an MCP **client**. It answers production-monitoring questions in Mattermost by selecting and calling tools across multiple MCP servers.
0646
0647 **Architecture:**
0648 - Listens on a Mattermost channel via WebSocket (`mattermostdriver`)
0649 - Holds connections to the local swf-monitor MCP (HTTP — the `swf_*`, `pcs_*`, `panda_*` tools) plus seven stdio-launched external servers: **LXR** (EIC code browser cross-reference), **uproot** (ROOT file analysis), **GitHub**, **Zenodo**, **XRootD**, **JLab-Rucio**, **BNL-Rucio**
0650 - Registers in-process **epicdoc** tools (`epic_doc_search`, `epic_doc_contents`) backed by a ChromaDB vector store of ePIC docs — runs inside the bot process, not as a separate MCP server
0651 - **Bamboo** log analysis is used via the `panda_study_job` and `panda_harvester_workers` swf-monitor MCP tools, not as a separate MCP server
0652 - System prompt is externalized to a file and re-read per message, so prompt iteration doesn't require a bot restart
0653 - **3-tier tool awareness**: every tool is visible by name+one-liner in the system prompt so the LLM knows the full catalog; detailed schemas are fetched only for tools the LLM explicitly selects via `select_tools`; the bot preserves server and suggestion context across thread turns so follow-ups don't re-select from scratch
0654 - **Progressive tool loading via semantic similarity**: for each user question the bot embeds the question and ranks tools by server-prefixed cosine similarity, auto-truncating at a score cliff — the LLM sees a small, relevant set rather than all hundreds of tools
0655 - **DPID (Data Provenance ID) anti-fabrication**: for questions about specific jobs/tasks, the bot verifies the LLM cited a real DPID from tool output, strips the DPID from the user-facing reply, and warns if verification fails
0656 - Remembers recent Q&A exchanges (via `swf_record_ai_memory`) to improve responses over time. Memory is collective — the bot does not track or remember who asked what
0657 - `/panda` slash commands for direct queries without LLM involvement (status, errors, jobs/tasks by filter, site detail)
- Renders matplotlib plots server-side and posts them to Mattermost
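
The ranking and DPID bullets can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code in `bot.py`. Embeddings are passed in as plain vectors, and no embedding model is shown. Tool keys are assumed to look like `server:tool_name` so that the server prefix is part of what gets embedded. The cliff threshold and `min_keep` floor are arbitrary, and the `[DPID:...]` marker format is hypothetical.

```python
import math
import re


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_tools(question_vec: list[float], tool_vecs: dict[str, list[float]],
               min_keep: int = 3, cliff: float = 0.15) -> list[str]:
    """Rank tools by similarity and truncate at the first large score drop.

    tool_vecs maps a server-prefixed key such as "panda:panda_list_jobs" to the
    embedding of that tool's prefixed name and description (assumed layout).
    """
    scored = sorted(((cosine(question_vec, vec), key) for key, vec in tool_vecs.items()),
                    reverse=True)
    keep = len(scored)
    # Never cut before min_keep tools; after that, stop at the first drop >= cliff
    for i in range(max(min_keep, 1), len(scored)):
        if scored[i - 1][0] - scored[i][0] >= cliff:
            keep = i
            break
    return [key for _, key in scored[:keep]]


# Hypothetical provenance marker: tools emit it, and the LLM must cite it
DPID_RE = re.compile(r"\[DPID:([A-Za-z0-9_-]+)\]")
DPID_STRIP_RE = re.compile(r"\s*\[DPID:[A-Za-z0-9_-]+\]")


def verify_and_strip_dpids(reply: str, tool_outputs: list[str]) -> tuple[str, bool]:
    """Check that every DPID cited in the reply came from tool output, then hide it."""
    cited = set(DPID_RE.findall(reply))
    seen = {dpid for text in tool_outputs for dpid in DPID_RE.findall(text)}
    verified = bool(cited) and cited <= seen
    cleaned = DPID_STRIP_RE.sub("", reply).strip()
    if not verified:
        cleaned += "\n\n(Warning: this answer could not be verified against tool output.)"
    return cleaned, verified
```

Cutting at the first large drop instead of a fixed top-k means a narrow question keeps only a handful of tools, while a broad question keeps more.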
0659
0660 **Running:** `manage.py panda_bot`
0661
0662 **Environment variables:**
0663 - `MATTERMOST_URL` (default: `chat.epic-eic.org`)
0664 - `MATTERMOST_TOKEN` (required)
0665 - `MATTERMOST_TEAM` (default: `main`)
0666 - `MATTERMOST_CHANNEL` (default: `pandabot`)
0667 - `MCP_URL` (default: `https://pandaserver02.sdcc.bnl.gov/swf-monitor/mcp/`)
0668 - `ANTHROPIC_API_KEY` (required, used by the Anthropic SDK)
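
This section does not say whether the bot validates these variables at startup. The sketch below shows one way to load them with the documented defaults and fail fast when either required variable is missing. `BotSettings` and `load_bot_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = ("MATTERMOST_TOKEN", "ANTHROPIC_API_KEY")


@dataclass(frozen=True)
class BotSettings:
    mattermost_url: str
    mattermost_token: str
    mattermost_team: str
    mattermost_channel: str
    mcp_url: str


def load_bot_settings(env: Mapping[str, str] = os.environ) -> BotSettings:
    """Read the bot's environment, applying the defaults listed above."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    # ANTHROPIC_API_KEY is only checked here; the Anthropic SDK reads it itself
    return BotSettings(
        mattermost_url=env.get("MATTERMOST_URL", "chat.epic-eic.org"),
        mattermost_token=env["MATTERMOST_TOKEN"],
        mattermost_team=env.get("MATTERMOST_TEAM", "main"),
        mattermost_channel=env.get("MATTERMOST_CHANNEL", "pandabot"),
        mcp_url=env.get("MCP_URL", "https://pandaserver02.sdcc.bnl.gov/swf-monitor/mcp/"),
    )
```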
0669
0670 **MCP transport:** The bot uses a minimal HTTP POST client (`MCPClient`) that sends JSON-RPC requests to the MCP endpoint — the same transport Claude Code uses. Each user question gets a fresh MCP session (initialize, discover tools, tool-use loop, close).
0671
0672 ---
0673
0674 ## Tool Summary
0675
0676 | Category | Tools | Count |
0677 |----------|-------|-------|
0678 | Tool Discovery | `swf_list_available_tools` | 1 |
0679 | System State | `swf_get_system_state` | 1 |
0680 | Agents | `swf_list_agents`, `swf_get_agent` | 2 |
0681 | Namespaces | `swf_list_namespaces`, `swf_get_namespace` | 2 |
0682 | Workflow Definitions | `swf_list_workflow_definitions` | 1 |
0683 | Workflow Executions | `swf_list_workflow_executions`, `swf_get_workflow_execution` | 2 |
0684 | Messages | `swf_list_messages`, `swf_send_message` | 2 |
0685 | Runs | `swf_list_runs`, `swf_get_run` | 2 |
0686 | STF Files | `swf_list_stf_files`, `swf_get_stf_file` | 2 |
0687 | TF Slices | `swf_list_tf_slices`, `swf_get_tf_slice` | 2 |
0688 | Logs | `swf_list_logs`, `swf_get_log_entry` | 2 |
0689 | Workflow Control | `swf_start_workflow`, `swf_stop_workflow`, `swf_end_execution` | 3 |
0690 | Agent Management | `swf_kill_agent` | 1 |
0691 | User Agent Manager | `swf_check_agent_manager`, `swf_get_testbed_status`, `swf_start_user_testbed`, `swf_stop_user_testbed` | 4 |
0692 | Workflow Monitoring | `swf_get_workflow_monitor`, `swf_list_workflow_monitors` | 2 |
0693 | AI Memory | `swf_record_ai_memory`, `swf_get_ai_memory` | 2 |
0694 | PCS Tags | `pcs_list_tags`, `pcs_get_tag`, `pcs_search_tags` | 3 |
0695 | PanDA Production | `panda_get_activity`, `panda_list_jobs`, `panda_diagnose_jobs`, `panda_list_tasks`, `panda_error_summary`, `panda_study_job`, `panda_list_queues`, `panda_get_queue`, `panda_resource_usage`, `panda_harvester_workers` | 10 |
0696 | **Total** | | **44** |
0697
0698 ---
0699
0700 ## Quick Reference - Example Prompts
0701
**System Readiness**
0703 - "What's the state of the testbed?"
0704 - "Am I ready to run a workflow?"
0705 - "Is my agent manager running?"
0706 - "Are there any errors in the system?"
0707
**Starting the Testbed**
0709 - "Start my testbed"
0710 - "Start my testbed with the fast_processing config"
0711 - "Check if my agents are running"
0712
**Running Workflows**
0714 - "Start a workflow"
0715 - "Run a workflow with 5 STF files"
0716 - "Start a workflow with 3 physics periods"
0717 - "What's running right now?"
0718
**Monitoring**
0720 - "What's the status of my workflow?"
0721 - "Show me the progress of execution stf_datataking-wenauseic-0045"
0722 - "How many STF files have been generated?"
0723 - "Are there any errors in my workflow?"
0724
**Stopping**
0726 - "Stop my running workflow"
0727 - "Stop the testbed"
0728
**Troubleshooting**
0730 - "Why did my workflow fail?"
0731 - "Show me the logs for the DAQ simulator"
0732 - "What errors happened in the last hour?"
0733 - "Kill the stuck daq_simulator agent"
0734
**Combined Operations**
0736 - "Start my testbed and run a workflow with 10 STF files"
0737 - "Check if I'm ready to run, and if so, start a workflow"
0738
0739 ---
0740
0741 ## Example Prompts - Detailed
0742
0743 ### What's Running?
0744
0745 > "What's running in the testbed?"
0746
0747 LLM calls `swf_list_workflow_executions(currently_running=True)` and summarizes the running executions by namespace and workflow type.
0748
0749 > "What's the state of my running workflow?"
0750
LLM calls `swf_get_workflow_monitor(execution_id='...')` for aggregated status, or `swf_list_workflow_executions(currently_running=True, namespace="user_namespace")`.
0752
0753 ### System Health
0754
0755 > "What's the current state of the testbed?"
0756
0757 LLM calls `swf_get_system_state(username='wenauseic')` and summarizes user context, agent health, running workflows, and system state.
0758
0759 > "Am I ready to run a workflow?"
0760
LLM calls `swf_get_system_state(username='...')` and checks the `ready_to_run` field. If it is `False`, the LLM explains what's missing (agent manager, workflow runner).
0762
0763 ### Starting and Stopping Workflows
0764
0765 > "Start a workflow with 5 STF files"
0766
LLM calls `swf_start_workflow(stf_count=5)`; the other parameters take their defaults from `testbed.toml`.
0768
0769 > "Stop my running workflow"
0770
0771 LLM calls `swf_list_workflow_executions(currently_running=True)` to find the execution_id, then `swf_stop_workflow(execution_id='...')`.
0772
0773 ### Error Discovery
0774
0775 > "Are there any errors in the system?"
0776
0777 LLM calls `swf_list_logs(level='ERROR')` to find error and critical log entries, then summarizes the issues found.
0778
0779 > "Why did my workflow fail?"
0780
0781 LLM calls:
0782 1. `swf_list_workflow_executions(status='failed', namespace="user_namespace")` - find failed executions
2. `swf_get_workflow_monitor(execution_id='...')` - get aggregated errors
0784 3. `swf_list_logs(execution_id='...', level='ERROR')` - detailed error logs
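
For scripted use outside an LLM, the same three-step chain can be written directly. This sketch makes several assumptions. It takes a `call_tool(name, arguments)` callable that returns the parsed tool result. It expects list results to carry the `items` key described under Pagination Metadata, and each execution record to have an `execution_id` field. It also takes the first failed execution returned, without checking order. The function name is hypothetical.

```python
from typing import Any, Callable

# call_tool(name, arguments) -> parsed tool result, supplied by whatever MCP client is in use
CallTool = Callable[[str, dict], dict]


def why_did_it_fail(call_tool: CallTool, namespace: str) -> dict[str, Any]:
    """Chain the three calls above for the first failed execution returned."""
    failed = call_tool("swf_list_workflow_executions",
                       {"status": "failed", "namespace": namespace})
    if not failed["items"]:
        return {"execution_id": None}
    execution_id = failed["items"][0]["execution_id"]  # assumed field name
    monitor = call_tool("swf_get_workflow_monitor", {"execution_id": execution_id})
    logs = call_tool("swf_list_logs", {"execution_id": execution_id, "level": "ERROR"})
    return {"execution_id": execution_id, "monitor": monitor, "error_logs": logs["items"]}
```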
0785
0786 ### Activity Summary
0787
0788 > "Summarize testbed activity for the past week."
0789
0790 LLM makes multiple calls:
0791 1. `swf_list_workflow_executions(start_time="2026-01-06T00:00:00", end_time="2026-01-13T00:00:00")` - all executions
0792 2. `swf_list_agents()` - registered agents
0793 3. `swf_list_namespaces()` - active namespaces
0794 4. Synthesizes: "In the past week, 47 workflow executions ran across 3 namespaces..."
0795
0796 ### Investigating a Run
0797
0798 > "Show me details about run 100042 and its STF files."
0799
0800 LLM calls:
0801 1. `swf_get_run(run_number=100042)` - run details
2. `swf_list_stf_files(run_number=100042)` - associated STF files
0803
0804 ### Agent Troubleshooting
0805
0806 > "The fast_processing agent seems unresponsive. What's happening?"
0807
0808 LLM calls:
0809 1. `swf_get_agent(name="fast_processing-agent-wenauseic-123")` - agent status
0810 2. `swf_list_logs(instance_name="fast_processing-agent-wenauseic-123", level='WARNING')` - recent issues
3. If needed: `swf_kill_agent(name="...")` to terminate the unresponsive agent
0812
0813 ### Managing User Testbed
0814
0815 > "Start my testbed"
0816
0817 LLM calls:
1. `swf_check_agent_manager(username='wenauseic')` - verify the agent manager is running
2. If alive: `swf_start_user_testbed(username='wenauseic')`
3. If not: instructs the user to run `testbed agent-manager`
0821
0822 ### Namespace Activity
0823
0824 > "What's happening in namespace torre1 today?"
0825
0826 LLM calls:
0827 1. `swf_get_namespace(namespace="torre1", start_time="2026-01-13T00:00:00")` - activity counts
0828 2. `swf_list_workflow_executions(namespace="torre1", start_time="2026-01-13T00:00:00")` - executions
0829 3. `swf_list_agents(namespace="torre1")` - agents
0830
0831 ### Fast Processing Status
0832
0833 > "What's the status of TF slice processing for run 100042?"
0834
0835 LLM calls:
1. `swf_list_tf_slices(run_number=100042)` - all slices
0837 2. Summarizes by status: "Run 100042 has 150 slices: 120 completed, 25 processing, 5 queued."
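
For a scripted check, the same summary can be computed directly from the tool result. This sketch assumes that each item carries a `status` field and that the result includes the `total_count` and `has_more` keys described under Pagination Metadata. When `has_more` is true, the status counts cover only the returned page, and the summary says so. The function name is hypothetical.

```python
from collections import Counter


def summarize_slices(run_number: int, result: dict) -> str:
    """One-line status summary of a TF-slice listing."""
    items = result["items"]
    counts = Counter(item["status"] for item in items)  # assumes a "status" field
    parts = ", ".join(f"{n} {status}" for status, n in counts.most_common())
    total = result.get("total_count", len(items))
    summary = f"Run {run_number} has {total} slices: {parts}."
    if result.get("has_more"):
        # Only a page was returned; don't present partial counts as totals
        summary += f" (status counts cover only the {len(items)} returned)"
    return summary
```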
0838
0839 ---
0840
0841 ## Technical Reference
0842
0843 ### File Locations
0844
0845 The MCP service spans multiple files in the `swf-monitor` repository:
0846
```
swf-monitor/
├── docs/
│   └── MCP.md                    # This documentation
├── src/
│   ├── monitor_app/
│   │   ├── mcp/                  # MCP tool definitions (package)
│   │   │   ├── __init__.py       # Tool registration
│   │   │   ├── system.py         # System, agent, namespace tools
│   │   │   ├── workflows.py      # Workflow, message, run, STF tools
│   │   │   ├── ai_memory.py      # AI memory tools
│   │   │   └── common.py         # Shared utilities
│   │   ├── panda/                # PanDA Mattermost bot (MCP client)
│   │   │   └── bot.py            # Bot logic, MCPClient, Claude integration
│   │   ├── management/commands/
│   │   │   └── panda_bot.py      # `manage.py panda_bot` management command
│   │   ├── auth0.py              # JWT validation with Auth0 JWKS
│   │   ├── middleware.py         # MCPAuthMiddleware for OAuth
│   │   └── views.py              # OAuth protected resource metadata
│   └── swf_monitor_project/
│       ├── settings.py           # Server config, Auth0 settings
│       └── urls.py               # Route registration (/mcp/, /.well-known/)
```
0870
0871 | File | Purpose |
0872 |------|---------|
0873 | `src/monitor_app/mcp/` | **Tool definitions** - MCP tool package (system.py, workflows.py, ai_memory.py, common.py) |
0874 | `src/monitor_app/panda/bot.py` | **PanDA Mattermost bot** - MCP client using HTTP POST, Claude Haiku for responses |
0875 | `src/monitor_app/auth0.py` | **Auth0 integration** - JWT validation, JWKS caching |
0876 | `src/monitor_app/middleware.py` | **Authentication middleware** - MCPAuthMiddleware for OAuth 2.1 |
0877 | `src/swf_monitor_project/settings.py` | **Server config** - MCP config, Auth0 settings |
0878 | `src/swf_monitor_project/urls.py` | **Route registration** - mounts MCP at `/mcp/`, OAuth metadata at `/.well-known/` |
0879 | `docs/MCP.md` | **Documentation** - this file |
0880
0881 ### Architecture
0882
MCP is built into the Django project (same codebase, same URL tree) rather than run as a separate application. However, the `/mcp/` endpoint is served by a **separate ASGI worker** (uvicorn) behind an Apache `ProxyPass`, while the rest of swf-monitor stays on mod_wsgi. The reason is thread usage. `django-mcp-server` relies on the MCP Python SDK's Starlette-based `StreamableHTTPSessionManager`, which holds one thread per streaming session. Under WSGI, a handful of concurrent MCP clients can therefore saturate the thread pool. Running MCP on its own ASGI worker keeps that failure mode out of the main app. See `swf-monitor-mcp-asgi.service` and the `ProxyPass` block in `apache-swf-monitor.conf`.
0884
0885 - **Django** exposes the MCP endpoint alongside the REST API (same codebase, same URL tree)
- **ASGI (uvicorn)** serves `/mcp/` on `127.0.0.1:8001`; Apache proxies to it via `ProxyPass` with streaming-safe settings (`timeout=3600`, `disablereuse=On`, `no-gzip`, `CacheDisable`)
0887 - **django-mcp-server** provides MCP spec compliance and tool registration
- **Auth0 OAuth 2.1** authentication for Claude.ai remote connections (optional; disabled if `AUTH0_DOMAIN` is not set)
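
A toy model makes the thread-pool argument concrete. This is not mod_wsgi or uvicorn. A `ThreadPoolExecutor` stands in for a WSGI worker's thread pool, and each "streaming session" is a task that holds its thread for a fixed time. The pool size and hold time are illustrative. Once every thread is pinned by a stream, an ordinary request waits in the queue until one is released.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def queued_wait(workers: int, streaming_sessions: int, hold: float) -> float:
    """Seconds an ordinary request waits for a thread while streams are open."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(streaming_sessions):
            pool.submit(time.sleep, hold)  # each toy stream pins one thread
        submitted = time.monotonic()
        started = pool.submit(time.monotonic).result()  # runs once a thread is free
    return started - submitted


print(f"2 streams, 4 threads: waited {queued_wait(4, 2, 0.5):.2f}s")  # close to zero
print(f"4 streams, 4 threads: waited {queued_wait(4, 4, 0.5):.2f}s")  # roughly the hold time
```

Under an ASGI server, an async streaming response is a suspended coroutine on the event loop rather than a blocked thread, which is why moving `/mcp/` to uvicorn avoids the queueing.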
0889
0890 ### Tool Registration
0891
0892 Tools are defined in `monitor_app/mcp/` using the `@mcp.tool()` decorator on async functions. Each function becomes an MCP tool that LLMs can discover and call.
0893
0894 The package is auto-discovered by django-mcp-server via `monitor_app/mcp/__init__.py`.
0895
0896 **Tool docstrings are critical** - they are the only documentation the LLM sees when deciding which tool to use and how to call it.
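
A toy version of what registration captures shows why docstrings matter: a tool's name, docstring, and parameter signature are all the client receives. The `tool` decorator, `TOOL_REGISTRY`, and `example_get_run` below are hypothetical and greatly simplified. django-mcp-server's real registration, which produces a JSON Schema `inputSchema` for each tool as the MCP specification requires, is not reproduced here.

```python
import inspect

TOOL_REGISTRY: dict[str, dict] = {}


def tool():
    """Toy stand-in for @mcp.tool(): record what an LLM client would be shown."""
    def register(fn):
        params = {
            name: {
                "type": getattr(p.annotation, "__name__", str(p.annotation)),
                "required": p.default is inspect.Parameter.empty,
            }
            for name, p in inspect.signature(fn).parameters.items()
        }
        # The docstring becomes the only description the LLM sees
        TOOL_REGISTRY[fn.__name__] = {"description": inspect.getdoc(fn) or "",
                                      "parameters": params}
        return fn
    return register


@tool()
async def example_get_run(run_number: int, include_files: bool = False) -> dict:
    """Get details for one run. Set include_files to also list its STF files."""
    return {"run_number": run_number}
```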
0897
0898 ### Transport
0899
The endpoint speaks Streamable HTTP, the MCP specification's HTTP transport. `django-mcp-server` uses the MCP Python SDK's `StreamableHTTPSessionManager`, so the server can stream responses and long-lived SSE-style session events when a client requests them. The Django view (`MCPServerStreamableHttpView`) dispatches GET, POST, and DELETE requests to the session manager. Sessions are keyed by the `mcp-session-id` header and persisted in the Django session store.
0901
0902 Because streaming sessions tie up a thread for the session's lifetime, the endpoint runs under an ASGI worker (uvicorn) rather than mod_wsgi — see the Architecture section above.
0903
0904 ### Settings
0905
```python
from decouple import config  # assumed: python-decouple supplies config()

# Django MCP Server configuration
DJANGO_MCP_GLOBAL_SERVER_CONFIG = {
    "name": "swf-monitor",
    "instructions": "ePIC Streaming Workflow Testbed monitoring and control server",
}

# Auth0 OAuth 2.1 configuration (optional - leave AUTH0_DOMAIN empty to disable)
AUTH0_DOMAIN = config("AUTH0_DOMAIN", default="")
AUTH0_CLIENT_ID = config("AUTH0_CLIENT_ID", default="")
AUTH0_CLIENT_SECRET = config("AUTH0_CLIENT_SECRET", default="")
AUTH0_API_IDENTIFIER = config("AUTH0_API_IDENTIFIER", default="")
AUTH0_ALGORITHMS = ["RS256"]
```
0920
0921 ### Adding New Tools
0922
```python
# In monitor_app/mcp/system.py, workflows.py, or a new module in the mcp/ package

from asgiref.sync import sync_to_async
from django.utils.dateparse import parse_datetime
from mcp_server import mcp_server as mcp

from monitor_app.models import MyModel  # placeholder: use the model your tool queries

MAX_ITEMS = 100  # illustrative cap; choose a limit per tool


@mcp.tool()
async def my_new_tool(
    param: str, start_time: str | None = None, end_time: str | None = None
) -> dict:
    """
    Tool description shown to the LLM.

    Args:
        param: Parameter description
        start_time: Optional ISO datetime for range start
        end_time: Optional ISO datetime for range end

    Returns:
        Dict with items, total_count, and has_more
    """

    @sync_to_async
    def do_work():
        # The Django ORM is synchronous; sync_to_async runs this in a worker thread
        queryset = MyModel.objects.order_by("-created_at")

        # Apply date range filter
        if start_time:
            queryset = queryset.filter(created_at__gte=parse_datetime(start_time))
        if end_time:
            queryset = queryset.filter(created_at__lte=parse_datetime(end_time))

        # Return pagination metadata so the LLM can tell when results are truncated
        total = queryset.count()
        items = [{"field": obj.field} for obj in queryset[:MAX_ITEMS]]
        return {"items": items, "total_count": total, "has_more": total > len(items)}

    return await do_work()
```
0958
0959 **IMPORTANT:** After adding a new `@mcp.tool()`, you MUST also:
0960 1. Add the tool to the hardcoded list in `swf_list_available_tools()` in `mcp/common.py`
0961 2. Update the server instructions in `settings.py` (`DJANGO_MCP_GLOBAL_SERVER_CONFIG`)
0962 3. Update this documentation (`docs/MCP.md`)
0963
The hardcoded list in `swf_list_available_tools()` is what LLMs see when discovering available tools; if your tool is not in that list, LLMs won't know it exists.
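
Because that list is maintained by hand, a small check can catch drift. The sketch below statically scans module source for functions decorated with `@mcp.tool()` and reports any that are missing from a catalog you supply. The helper names are hypothetical. The check recognizes only the decorator spelled `mcp.tool`, as in the example above. How you extract the names from `swf_list_available_tools()` depends on how that list is structured, which this section does not show, so the catalog is passed in as a set.

```python
import ast
from typing import Iterable


def registered_tools(source: str) -> set[str]:
    """Names of functions decorated with @mcp.tool() (or bare @mcp.tool) in a module."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for dec in node.decorator_list:
            target = dec.func if isinstance(dec, ast.Call) else dec
            if (isinstance(target, ast.Attribute) and target.attr == "tool"
                    and isinstance(target.value, ast.Name) and target.value.id == "mcp"):
                names.add(node.name)
    return names


def missing_from_catalog(sources: Iterable[str], catalog: set[str]) -> list[str]:
    """Tools defined in the given module sources but absent from the catalog."""
    found: set[str] = set()
    for source in sources:
        found |= registered_tools(source)
    return sorted(found - catalog)
```

In a test, you could pass `(p.read_text() for p in Path("src/monitor_app/mcp").glob("*.py"))` as `sources` and assert that the result is empty.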
0965
0966 ### References
0967
0968 - [MCP Specification](https://modelcontextprotocol.io/specification/2025-11-25/server/tools)
0969 - [django-mcp-server GitHub](https://github.com/omarbenhamid/django-mcp-server)
0970 - [OAuth2 Provider](https://django-oauth-toolkit.readthedocs.io/)