# API Reference Guide

Complete reference for the swf-monitor REST API and WebSocket services.

## API Documentation

The API is documented using OpenAPI (Swagger). View interactive documentation:

* **Swagger UI**: `https://pandaserver02.sdcc.bnl.gov/swf-monitor/api/schema/swagger-ui/`
* **ReDoc**: `https://pandaserver02.sdcc.bnl.gov/swf-monitor/api/schema/redoc/`

## Database Schema

Auto-generated schema diagram: **[testbed-schema.dbml](../testbed-schema.dbml)**

### Core Models

The system uses Django models to track agents, runs, data files, and messaging:

- **SystemAgent**: Agent instances with status and heartbeat tracking
- **AppLog**: Centralized logging from all agents and services
- **Run**: Experimental runs containing multiple STF files
- **StfFile**: Super Time Frame files with processing status
- **FastMonFile**: Fast monitoring time frame sample files metadata
- **MessageQueueDispatch**: Message queue operations and delivery tracking
- **Subscriber**: Message queue subscribers and their configurations
- **PersistentState**: System state persistence for workflow tracking
- **PandaQueue**: PanDA queue configuration for job submission
- **RucioEndpoint**: Rucio data management endpoint definitions

### PCS Models (Physics Configuration System)

Production task configuration for Monte Carlo simulation campaigns:

- **PhysicsCategory**: Physics areas (DVCS, DIS, SIDIS) with digit-based tag numbering
- **PhysicsTag**: Physics process parameter sets (p3001, p3002...) with draft/locked lifecycle
- **EvgenTag**: Event generation configurations (e1, e2...)
- **SimuTag**: Simulation configurations (s1, s2...)
- **RecoTag**: Reconstruction configurations (r1, r2...)
- **Dataset**: Production datasets composed from locked tags with automatic block management
- **ProdConfig**: Production configuration templates (background mixing, output control, software stack, PanDA/Rucio overrides)

See **[PCS documentation](PCS.md)** for full details.

## ActiveMQ Integration

### Automatic Connection Management

The monitor includes built-in ActiveMQ integration that starts automatically when Django launches. This integration:

- **Automatic Startup**: Connects to ActiveMQ when `python manage.py runserver` starts
- **Smart Initialization**: Only connects during normal operation, not during management commands like `migrate` or `test`
- **Configuration-Driven**: Requires `ACTIVEMQ_HOST` environment variable to be set
- **SSL Support**: Handles SSL certificate configuration for secure connections
- **Graceful Cleanup**: Automatically disconnects when Django shuts down
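
The startup decision described in the list above can be sketched as a small predicate. This is an illustration only, not the monitor's actual code: the function name `should_connect` and the `SKIP_COMMANDS` set are hypothetical, and the real check may differ.

```python
import os
import sys

# Hypothetical list of management commands that should not open a broker connection.
SKIP_COMMANDS = {"migrate", "makemigrations", "test", "collectstatic", "shell"}


def should_connect(argv=None, environ=None):
    """Return True when an ActiveMQ connection should be opened at startup."""
    argv = sys.argv if argv is None else argv
    environ = os.environ if environ is None else environ
    # Configuration-driven: without ACTIVEMQ_HOST there is nothing to connect to.
    if not environ.get("ACTIVEMQ_HOST"):
        return False
    # argv[1] is the management command when Django is launched via manage.py.
    command = argv[1] if len(argv) > 1 else ""
    return command not in SKIP_COMMANDS
```

With `ACTIVEMQ_HOST` set, `should_connect(["manage.py", "runserver"])` is true while `should_connect(["manage.py", "migrate"])` is false.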

### Implementation Details

- **Connection Manager**: `ActiveMQConnectionManager` (singleton) in `monitor_app/activemq_connection.py`
- **App Integration**: Initialized via `MonitorAppConfig.ready()` in `monitor_app/apps.py`
- **Message Processing**: Handles agent heartbeats and workflow messages
- **Thread Safety**: Uses threading locks for safe singleton operation
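
A lock-protected singleton of the kind listed above could look roughly like the following. `ConnectionManagerSketch` is a hypothetical stand-in written for illustration; it does not reproduce `ActiveMQConnectionManager`, whose internals are not shown in this document.

```python
import threading


class ConnectionManagerSketch:
    """Hypothetical stand-in for a lock-protected singleton."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        # Double-checked locking: take the lock only while no instance exists yet.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    instance = super().__new__(cls)
                    instance.connected = False
                    cls._instance = instance
        return cls._instance

    def connect(self):
        """Mark the manager as connected; a real manager would open the broker link."""
        with self._lock:
            if not self.connected:
                self.connected = True
        return self.connected
```

Every call to `ConnectionManagerSketch()` returns the same object, so state set through one reference is visible through all others.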

### Configuration

Set these environment variables for ActiveMQ integration:

```bash
export ACTIVEMQ_HOST='your-activemq-host'
export ACTIVEMQ_PORT=61612
export ACTIVEMQ_USER='username'
export ACTIVEMQ_PASSWORD='password'
export ACTIVEMQ_USE_SSL=True
export ACTIVEMQ_SSL_CA_CERTS='/path/to/ca-cert.pem'
```

No separate management command is needed - the integration is fully automatic.
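
As a rough sketch of how these variables might be read on the Python side, the snippet below collects them into a dataclass. The names `ActiveMQSettings` and `load_activemq_settings`, the fallback port, and the truthy-string rule for `ACTIVEMQ_USE_SSL` are assumptions for illustration, not the monitor's settings code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ActiveMQSettings:
    host: str
    port: int
    user: Optional[str]
    password: Optional[str]
    use_ssl: bool
    ssl_ca_certs: Optional[str]


def load_activemq_settings(environ: Mapping[str, str] = os.environ) -> Optional[ActiveMQSettings]:
    """Build settings from the environment, or return None if ACTIVEMQ_HOST is unset."""
    host = environ.get("ACTIVEMQ_HOST")
    if not host:
        return None
    return ActiveMQSettings(
        host=host,
        # Assumed fallback: the port used in the example above.
        port=int(environ.get("ACTIVEMQ_PORT", "61612")),
        user=environ.get("ACTIVEMQ_USER"),
        password=environ.get("ACTIVEMQ_PASSWORD"),
        # Assumed parsing rule: common truthy spellings enable SSL.
        use_ssl=environ.get("ACTIVEMQ_USE_SSL", "").strip().lower() in {"1", "true", "yes"},
        ssl_ca_certs=environ.get("ACTIVEMQ_SSL_CA_CERTS"),
    )
```

Returning `None` when the host is missing mirrors the configuration-driven behavior: no host, no connection attempt.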

## Authentication

### Token-Based Authentication

For programmatic access, the API uses token-based authentication for write operations.

#### Generate a Token

```bash
# Get token for existing user
python manage.py get_token <username>

# Create new user and token
python manage.py get_token <new_username> --create-user
```

#### Use the Token

Include the token in the `Authorization` header using the `Token` scheme: `Authorization: Token YOUR_TOKEN`.
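
A minimal sketch of attaching the token from Python, using only the standard library. The helper `build_request` and the example payload are hypothetical; the real field names are defined by the API schema. Nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

BASE_URL = "https://pandaserver02.sdcc.bnl.gov/swf-monitor"


def build_request(method, path, token, body=None):
    """Return a urllib Request carrying the token header (nothing is sent here)."""
    headers = {"Authorization": f"Token {token}", "Accept": "application/json"}
    data = None
    if body is not None:
        # JSON-encode the body and declare its content type.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


# Hypothetical payload: consult the Swagger UI for the actual agent fields.
request = build_request("PATCH", "/api/systemagents/1/", "YOUR_TOKEN", {"status": "OK"})
```

Building the request separately from sending it keeps the header logic easy to check in isolation.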

### Production HTTPS Access

When connecting to the production monitor at `https://pandaserver02.sdcc.bnl.gov/swf-monitor/`, clients need the SSL certificate chain for verification.

#### SSL Certificate Setup

Set the certificate bundle path before making requests:

```bash
export REQUESTS_CA_BUNDLE=/opt/swf-monitor/current/full-chain.pem
```
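
`REQUESTS_CA_BUNDLE` is read by the `requests` library; the standard-library `urllib` does not consult it. For clients that do not use `requests`, one possible approach is to build an SSL context from the same variable. The helper name `make_ssl_context` is hypothetical.

```python
import os
import ssl


def make_ssl_context(environ=os.environ):
    """Create a verifying SSL context, using the bundle named in REQUESTS_CA_BUNDLE if set."""
    ca_bundle = environ.get("REQUESTS_CA_BUNDLE")
    if ca_bundle:
        # Trust the certificates in the configured bundle file.
        return ssl.create_default_context(cafile=ca_bundle)
    # Otherwise rely on the system trust store.
    return ssl.create_default_context()
```

The resulting context can be passed as `context=` to `urllib.request.urlopen`.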

**Note**: The certificate bundle is deployed automatically by the production deployment script and contains the InCommon RSA IGTF Server CA chain required to validate the certificate of pandaserver02.sdcc.bnl.gov.

## REST API Endpoints

### System Agents
- `GET /api/systemagents/` - List all agents
- `POST /api/systemagents/` - Create new agent
- `GET /api/systemagents/{id}/` - Get specific agent
- `PATCH /api/systemagents/{id}/` - Update agent
- `DELETE /api/systemagents/{id}/` - Delete agent

### Application Logs
- `GET /api/logs/` - List logs with filtering
- `POST /api/logs/` - Create log entry
- `GET /api/logs/summary/` - Get log summary by app/instance

### Runs
- `GET /api/runs/` - List experimental runs
- `POST /api/runs/` - Create new run
- `GET /api/runs/{id}/` - Get specific run
- `PATCH /api/runs/{id}/` - Update run

### STF Files
- `GET /api/stf-files/` - List STF files
- `POST /api/stf-files/` - Register new STF file
- `GET /api/stf-files/{id}/` - Get specific file
- `PATCH /api/stf-files/{id}/` - Update file status

### Message Queue Dispatches
- `GET /api/message-dispatches/` - List dispatches
- `POST /api/message-dispatches/` - Create dispatch record

### Subscribers
- `GET /api/subscribers/` - List subscribers
- `POST /api/subscribers/` - Create subscriber
- `PATCH /api/subscribers/{id}/` - Update subscriber

### Fast Monitoring Files
- `GET /api/fastmon-files/` - List fast monitoring files
- `POST /api/fastmon-files/` - Register new fast monitoring file
- `GET /api/fastmon-files/{id}/` - Get specific file
- `PATCH /api/fastmon-files/{id}/` - Update file metadata

### Workflows
- `GET /api/workflows/` - List STF workflows
- `POST /api/workflows/` - Create new workflow
- `GET /api/workflows/{id}/` - Get specific workflow
- `PATCH /api/workflows/{id}/` - Update workflow status

### Workflow Stages
- `GET /api/workflow-stages/` - List agent workflow stages
- `POST /api/workflow-stages/` - Create workflow stage
- `GET /api/workflow-stages/{id}/` - Get specific stage
- `PATCH /api/workflow-stages/{id}/` - Update stage status

### Workflow Messages
- `GET /api/workflow-messages/` - List workflow messages
- `POST /api/workflow-messages/` - Create workflow message
- `GET /api/workflow-messages/{id}/` - Get specific message

### System State
- `GET /api/state/next-run-number/` - Get next available run number

### PCS - Physics Configuration System

All PCS endpoints are under `/pcs/api/`. See **[PCS documentation](PCS.md)** for full API reference with examples.

- `GET/POST /pcs/api/physics-categories/` - List/create physics categories
- `GET/POST /pcs/api/physics-tags/` - List/create physics tags (number auto-assigned)
- `GET/PATCH /pcs/api/physics-tags/{N}/` - Get/update physics tag (draft only)
- `POST /pcs/api/physics-tags/{N}/lock/` - Lock physics tag (one-way)
- `GET/POST /pcs/api/evgen-tags/` - List/create evgen tags
- `GET/POST /pcs/api/simu-tags/` - List/create simu tags
- `GET/POST /pcs/api/reco-tags/` - List/create reco tags
- `GET/POST /pcs/api/datasets/` - List/create datasets (all tags must be locked)
- `POST /pcs/api/datasets/{id}/add-block/` - Add next block to dataset
- `GET/POST /pcs/api/prod-configs/` - List/create production configs
- `GET/PATCH/DELETE /pcs/api/prod-configs/{id}/` - Get/update/delete production config

## Server-Sent Events (SSE) Streaming

### Overview
The monitor provides real-time message streaming via Server-Sent Events by forwarding ActiveMQ messages to receivers via HTTPS REST. This allows receivers to be geographically distributed anywhere with internet access, without requiring distributed ActiveMQ infrastructure - only HTTPS connectivity is needed.

### Endpoints

#### Stream Messages
- **URL**: `GET /api/messages/stream/`
- **Authentication**: Token required
- **Protocol**: HTTPS (port 443)
- **Content-Type**: `text/event-stream`

#### Query Parameters
- `msg_types`: Comma-separated message types to filter (e.g., `stf_gen,data_ready`)
- `agents`: Comma-separated agent names to filter (e.g., `daq-simulator,data-agent`)
- `run_ids`: Comma-separated run IDs to filter (e.g., `run-001,run-002`)

#### Example Usage
```bash
curl -H "Authorization: Token YOUR_TOKEN" \
  "https://pandaserver02.sdcc.bnl.gov/swf-monitor/api/messages/stream/?msg_types=stf_gen,data_ready&agents=daq-simulator"
```
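
The same URL can be assembled in Python from lists of filter values. This is a small sketch; the helper name `build_stream_url` is hypothetical.

```python
from urllib.parse import urlencode

STREAM_URL = "https://pandaserver02.sdcc.bnl.gov/swf-monitor/api/messages/stream/"


def build_stream_url(msg_types=(), agents=(), run_ids=()):
    """Return the stream URL with each non-empty filter joined by commas."""
    params = {}
    for name, values in (("msg_types", msg_types), ("agents", agents), ("run_ids", run_ids)):
        if values:
            params[name] = ",".join(values)
    # safe="," keeps the commas readable instead of encoding them as %2C.
    query = urlencode(params, safe=",")
    return f"{STREAM_URL}?{query}" if query else STREAM_URL


url = build_stream_url(msg_types=["stf_gen", "data_ready"], agents=["daq-simulator"])
```

Omitting all filters yields the bare stream URL, which subscribes to every message.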

#### Stream Status
- **URL**: `GET /api/messages/stream/status/`
- **Authentication**: Token required
- **Returns**: Current broadcaster status and connected client count

```json
{
    "connected_clients": 2,
    "client_ids": ["uuid1", "uuid2"],
    "client_filters": {...}
}
```

### Message Format
SSE events use the following format:
```
event: message_type
data: {"msg_type": "stf_gen", "processed_by": "daq-simulator", "run_id": "run-001", ...}

event: heartbeat
data: {"timestamp": 1640995200.0}

event: connected
data: {"client_id": "uuid", "status": "connected"}
```
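
A receiver has to split this stream into events. The following simplified parser is a sketch: it handles only the `event:` and `data:` fields shown above plus comment lines, and the function name `parse_sse` is hypothetical. An event is emitted when its terminating blank line arrives.

```python
import json


def parse_sse(lines):
    """Yield (event, data) tuples from an iterable of decoded SSE lines."""
    event, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield event, json.loads("\n".join(data_lines))
            event, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, ignored
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
```

In practice `lines` would be the decoded lines of the HTTP response body; any iterable of strings works, which makes the parser easy to exercise offline.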

### Architecture
- **Message Routing**: ActiveMQ messages are relayed to SSE clients via Redis channel layer
- **Client Management**: Each client gets a dedicated message queue with configurable filtering
- **Scalability**: Redis-backed channel layer supports multiple Django processes
- **Reliability**: Automatic client cleanup and connection management
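
The per-client queue with filtering can be pictured with an in-process sketch. Everything here is hypothetical (`ClientSketch`, `broadcast`, the queue size, the drop-on-full policy), the Redis channel layer is left out, and the mapping of the `agents` filter to the `processed_by` field is an assumption.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class ClientSketch:
    """Hypothetical per-client state: optional filters plus a bounded queue."""
    msg_types: frozenset = frozenset()
    agents: frozenset = frozenset()
    run_ids: frozenset = frozenset()
    messages: queue.Queue = field(default_factory=lambda: queue.Queue(maxsize=100))

    def matches(self, message):
        # An empty filter set means "accept everything" for that field.
        checks = (
            (self.msg_types, message.get("msg_type")),
            (self.agents, message.get("processed_by")),
            (self.run_ids, message.get("run_id")),
        )
        return all(not allowed or value in allowed for allowed, value in checks)


def broadcast(clients, message):
    """Queue the message for every matching client; return how many received it."""
    delivered = 0
    for client in clients:
        if client.matches(message):
            try:
                client.messages.put_nowait(message)
                delivered += 1
            except queue.Full:
                pass  # assumed policy: drop rather than block the broadcaster
    return delivered
```

A bounded queue keeps one slow client from growing memory without limit, at the cost of dropped messages for that client.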

## Model Control Protocol (MCP)

### REST Endpoints

- `POST /api/mcp/heartbeat/` - Process agent heartbeat
- `POST /api/mcp/discover-capabilities/` - Get available commands
- `POST /api/mcp/agent-liveness/` - Get agent liveness status

### Message Format

```json
{
    "mcp_version": "1.0",
    "message_id": "unique-uuid",
    "command": "command_name",
    "payload": {
        "key": "value"
    }
}
```
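
A message in this format might be assembled as follows. The helper name `build_mcp_message` is hypothetical, and using a random UUID4 for `message_id` is an assumption; the format above only asks for a unique identifier.

```python
import json
import uuid


def build_mcp_message(command, payload=None, mcp_version="1.0"):
    """Return an MCP envelope with a freshly generated message_id."""
    return {
        "mcp_version": mcp_version,
        "message_id": str(uuid.uuid4()),
        "command": command,
        "payload": payload if payload is not None else {},
    }


message = build_mcp_message("discover_capabilities")
body = json.dumps(message)  # JSON text suitable for a POST body
```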

### Available Commands

#### discover_capabilities
Returns available MCP commands and descriptions.
- **Request payload**: `{}`
- **Response**: Dictionary of command names and descriptions

#### get_agent_liveness
Reports agent liveness based on recent heartbeats.
- **Request payload**: `{}`
- **Response**: Dictionary mapping agent names to 'alive'/'dead' status
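
One way such a liveness map could be derived from heartbeat timestamps is sketched below. The function name `agent_liveness` and the 60-second threshold are assumptions; this document does not state the server's actual cutoff.

```python
from datetime import datetime, timedelta, timezone


def agent_liveness(last_heartbeats, now=None, max_age=timedelta(seconds=60)):
    """Map each agent name to 'alive' or 'dead' based on its last heartbeat time."""
    now = now or datetime.now(timezone.utc)
    return {
        # An agent is alive if its last heartbeat is no older than max_age.
        name: "alive" if now - seen <= max_age else "dead"
        for name, seen in last_heartbeats.items()
    }
```

`last_heartbeats` is expected to map agent names to timezone-aware `datetime` objects.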

#### heartbeat (Notification)
Sent by an agent to signal that it is active (no response expected).
- **Payload**: `{"name": "agent-name", "timestamp": "iso-8601-timestamp", "status": "OK"}`
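
A payload of this shape can be produced with the standard library. The helper name `build_heartbeat_payload` is hypothetical, and using UTC for the timestamp is an assumption.

```python
from datetime import datetime, timezone


def build_heartbeat_payload(name, status="OK", now=None):
    """Return a heartbeat payload with an ISO 8601 timestamp (UTC by default)."""
    now = now or datetime.now(timezone.utc)
    return {"name": name, "timestamp": now.isoformat(), "status": status}
```

The returned dictionary goes in the `payload` field of an MCP message whose `command` is `heartbeat`.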

## Agent Integration

### REST Logging

Agents can send logs using the `swf-common-lib` package:

```python
import logging
from swf_common_lib.rest_logging import setup_rest_logging

# Setup - the infrastructure handles URLs automatically
logger = setup_rest_logging(
    app_name='my_agent',
    instance_name='agent_001'
)

# Use standard Python logging
logger.info("Agent started")
logger.error("Processing failed")
```

### Logging Endpoint

- **URL**: `/api/logs/`
- **Method**: POST
- **Authentication**: None required
- **Content-Type**: `application/json`

Example log entry:
```json
{
    "app_name": "data_agent",
    "instance_name": "agent_001",
    "timestamp": "2025-01-15T10:30:00.000Z",
    "level": 20,
    "levelname": "INFO",
    "message": "Processing file batch 1/10",
    "module": "data_processor",
    "funcname": "process_batch",
    "lineno": 45,
    "process": 1234,
    "thread": 5678
}
```
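
For agents that cannot use `swf-common-lib`, an entry of this shape can be derived from a standard `logging.LogRecord`. This is a sketch: `record_to_entry` is a hypothetical helper and is not how `setup_rest_logging` is implemented.

```python
import logging
from datetime import datetime, timezone


def record_to_entry(record, app_name, instance_name):
    """Convert a logging.LogRecord into a dict shaped like the example log entry."""
    timestamp = datetime.fromtimestamp(record.created, tz=timezone.utc)
    return {
        "app_name": app_name,
        "instance_name": instance_name,
        # Millisecond precision with a trailing Z, matching the example format.
        "timestamp": timestamp.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
        "level": record.levelno,
        "levelname": record.levelname,
        "message": record.getMessage(),
        "module": record.module,
        "funcname": record.funcName,
        "lineno": record.lineno,
        "process": record.process,
        "thread": record.thread,
    }


# Build a record by hand to show the mapping; handlers normally receive these.
record = logging.LogRecord(
    name="data_agent", level=logging.INFO, pathname="data_processor.py", lineno=45,
    msg="Processing file batch %d/%d", args=(1, 10), exc_info=None, func="process_batch",
)
entry = record_to_entry(record, "data_agent", "agent_001")
```

A custom `logging.Handler` could call this helper in `emit()` and POST the result to `/api/logs/`.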

## Management Commands

- `createsuperuser` - Create admin user
- `get_token <username> [--create-user]` - Generate API token

## Error Handling

### HTTP Status Codes
- `200` - Success
- `201` - Created
- `400` - Bad Request
- `401` - Unauthorized
- `403` - Forbidden
- `404` - Not Found
- `500` - Internal Server Error

### MCP Error Codes
- `4000` - Invalid request format
- `4001` - Missing mcp_version field
- `4002` - Unsupported MCP version
- `4004` - Unknown command
- `5000` - Internal server error
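
Client code may find it convenient to name these codes. The enum below mirrors the list above; the member names, the `validate_envelope` helper, and the order of its checks are hypothetical and may not match the server's validation.

```python
from enum import IntEnum


class MCPError(IntEnum):
    INVALID_REQUEST = 4000
    MISSING_VERSION = 4001
    UNSUPPORTED_VERSION = 4002
    UNKNOWN_COMMAND = 4004
    INTERNAL_ERROR = 5000


def validate_envelope(message, known_commands, supported_versions=("1.0",)):
    """Return None if the envelope looks valid, else the matching MCPError."""
    if not isinstance(message, dict):
        return MCPError.INVALID_REQUEST
    if "mcp_version" not in message:
        return MCPError.MISSING_VERSION
    if message["mcp_version"] not in supported_versions:
        return MCPError.UNSUPPORTED_VERSION
    if message.get("command") not in known_commands:
        return MCPError.UNKNOWN_COMMAND
    return None
```

Because `MCPError` is an `IntEnum`, its members compare equal to the plain integer codes returned by the server.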

For detailed API schemas and examples, see the interactive documentation at `https://pandaserver02.sdcc.bnl.gov/swf-monitor/api/schema/swagger-ui/`.