# Production Deployment Guide

**Complete guide for deploying SWF Monitor to the production Apache environment on pandaserver02.**

## Overview

This guide covers the complete production deployment process for the SWF Monitor Django application, including initial infrastructure setup and ongoing deployment updates. The production environment uses Apache with a Python 3.11 build of mod_wsgi to serve the Django application.

Note on usage modes:
- Standalone development mode runs local services under supervisord for convenience.
- Production platform mode relies on central, system-managed services (PostgreSQL, ActiveMQ, Redis, Apache).

Use the system status reporter to see which mode your host supports and which services are available:

```bash
python /eic/u/wenauseic/github/swf-testbed/report_system_status.py
```

## Architecture

**Production Structure:**
```
/opt/swf-monitor/
├── releases/                    # Versioned deployments
│   ├── branch-main/
│   └── branch-infra-baseline-vX/
├── current/                     # Symlink to active release
├── shared/
│   ├── logs/                    # Application logs
│   ├── static/                  # Django static files
│   └── uploads/                 # File uploads
├── config/
│   ├── apache/                  # Apache configuration
│   └── env/
│       └── production.env       # Production environment variables
└── bin/
    └── deploy-swf-monitor.sh    # Deployment automation
```

**Key Components:**
- **Apache HTTP Server**: Serves static files, runs Django through mod_wsgi for most paths, and proxies `/swf-monitor/mcp/` to the ASGI worker via `ProxyPass`
- **Python 3.11 mod_wsgi**: WSGI interface for the Django application (all paths except `/swf-monitor/mcp/`)
- **ASGI worker (uvicorn)**: `swf-monitor-mcp-asgi.service` on `127.0.0.1:8001` serves `/swf-monitor/mcp/`. Streaming MCP (StreamableHTTPSessionManager) holds one thread per session, which under WSGI saturates the daemon thread pool. Running it in a separate ASGI worker isolates that failure mode from the rest of the app.
- **PostgreSQL**: Production database (system-managed)
- **ActiveMQ**: Message broker (system-managed via artemis.service)
- **Redis (Channels layer)**: Inter-process relay used by the SSE forwarder. Redis/Channels-backed SSE is required whenever remote recipients of ActiveMQ events are supported.
- **Mattermost bots**: `swf-panda-bot.service` and `swf-testbed-bot.service`, Claude-backed chatbots for the `#pandabot` and `#testbed-bot` channels
- **Release Management**: Automated deployment with Apache-conf sync and ASGI-worker recycle

## Prerequisites

Before starting production deployment, ensure:

1. **System Services Running:**
   - PostgreSQL (postgresql-16.service or equivalent)
   - ActiveMQ/Artemis (artemis.service)
   - Apache HTTP Server (httpd.service)
   - Redis (redis.service): required for the SSE relay via Django Channels. This is integral to production operation, because it supports remote recipients of ActiveMQ events over HTTPS (SSE).

2. **Development Environment Ready:**
   - SWF testbed development environment set up (see [Development Environment Setup](#development-environment-setup) below)
   - Virtual environment with all dependencies installed
   - All repositories updated to desired branch/tag

3. **System Access:**
   - Root access (sudo) for Apache configuration and deployment
   - Database credentials for production PostgreSQL instance

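The service prerequisites above can be checked in one pass. The following is a minimal sketch, assuming the unit names listed in this guide; `is_active` and `missing_units` are illustrative helper names, not part of the repository.

```bash
#!/usr/bin/env bash
# Sketch: report which prerequisite units are not active.
# is_active and missing_units are illustrative names, not part of the repo.

# Returns 0 when systemd reports the unit as active.
is_active() {
    systemctl is-active --quiet "$1"
}

# Prints each unit from the argument list that is not active.
missing_units() {
    local unit
    for unit in "$@"; do
        is_active "$unit" || echo "$unit"
    done
}

# Example (run on the production host):
#   missing_units postgresql-16 artemis httpd redis
```

An empty result means every listed unit is active. Adjust `postgresql-16` if your host uses a different PostgreSQL unit name.
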
### Development Environment Setup

To set up an equivalent development environment in any user account:

1. **Clone all repositories as siblings:**
   ```bash
   cd /path/to/your/workspace
   git clone https://github.com/BNLNPPS/swf-testbed.git
   git clone https://github.com/BNLNPPS/swf-monitor.git
   git clone https://github.com/BNLNPPS/swf-common-lib.git
   # Clone other swf-* agent repositories as needed
   ```

2. **Set up the testbed environment:**
   ```bash
   cd swf-testbed
   source install.sh  # Creates .venv and installs all dependencies
   ```

3. **Configure environment variables:**
   ```bash
   # Copy and customize environment template
   cp ../swf-monitor/.env.example ~/.env
   # Edit ~/.env with your specific configuration
   ```

4. **Set up database (if using local PostgreSQL):**
   ```bash
   # Still in swf-testbed from step 2, so the sibling repo is one level up
   cd ../swf-monitor/src
   source /path/to/swf-testbed/.venv/bin/activate
   python manage.py migrate
   python manage.py createsuperuser
   ```

The production deployment copies the virtual environment from your development setup, so make sure all required packages are installed in your development `.venv`.

## Initial Production Setup

**⚠️ This setup is performed ONCE, when the production environment is first installed.**

### Step 1: Run Apache Deployment Setup

```bash
# From the swf-monitor repository root
sudo ./setup-apache-deployment.sh
```

This automated setup script:

1. **Creates deployment structure** at `/opt/swf-monitor/`
2. **Installs Apache development headers** (httpd-devel)
3. **Compiles Python 3.11 mod_wsgi** in the project virtual environment
4. **Disables system mod_wsgi** to avoid Python version conflicts
5. **Generates LoadModule config** into `/etc/httpd/conf.modules.d/20-swf-monitor-wsgi.conf` (LoadModule + WSGIPythonHome only; this directory loads before `conf.d/`, so the module is available when WSGIDaemonProcess is parsed)
6. **Installs the Apache vhost config** by copying the repo canonical `apache-swf-monitor.conf` to `/etc/httpd/conf.d/swf-monitor.conf`. The repo file is the source of truth; `deploy-swf-monitor.sh` keeps the live file in sync with it on every deploy.
7. **Copies deployment automation** script to `/opt/swf-monitor/bin/`
8. **Creates production environment** template
9. **Tests Apache configuration** (`httpd -t`) and restarts Apache

Note: you must also install the `swf-monitor-mcp-asgi.service` systemd unit from the repo. The setup script does not automate this; it is a one-time bootstrap per host:

```bash
sudo install -o root -g root -m 644 swf-monitor-mcp-asgi.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now swf-monitor-mcp-asgi.service
```
0134
0135 ### Step 2: Configure Production Environment
0136
0137 Review and update the production environment configuration:
0138
0139 ```bash
0140 sudo nano /opt/swf-monitor/config/env/production.env
0141 ```
0142
0143 **Key Configuration Categories:**
0144
0145 - **Security Settings**: `DEBUG=False`, `SECRET_KEY`, `SWF_ALLOWED_HOSTS`
0146 - **Host Configuration**: URLs and hostnames for the production server
0147 - **Database Configuration**: PostgreSQL connection details
0148 - **ActiveMQ Configuration**: Message broker settings and SSL certificates
0149 - **Redis/Channels Configuration**: `REDIS_URL` for Channels channel layer powering SSE relay. Required for remote ActiveMQ recipients; without it, only single-process dev streaming works and is not suitable for production.
0150 - **API Authentication**: Tokens for agent authentication
0151 - **Proxy Settings**: Network proxy configuration
0152
0153 **Note**: The production.env file contains sensitive configuration values that must be customized for your environment.
0154
0155 **⚠️ IMPORTANT: .env is NOT deployed from git.** The `.env` file is in `.gitignore` for security reasons. The deploy script symlinks the release `.env` to `/opt/swf-monitor/config/env/production.env`. When you need to change environment settings (e.g., `ACTIVEMQ_HEARTBEAT_TOPIC`), you must edit the production.env file directly:
0156
0157 ```bash
0158 sudo nano /opt/swf-monitor/config/env/production.env
0159 # Then restart Apache to pick up changes:
0160 sudo systemctl restart httpd
0161 ```
0162
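Before restarting Apache it can help to confirm that the settings named above are actually present. This is a minimal sketch, assuming production.env uses plain `KEY=value` lines (the file format is not shown in this guide); `missing_env_keys` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: list required keys that have no non-empty KEY=value line in an env file.
# missing_env_keys is an illustrative helper, not part of the deploy tooling.
# Assumption: the file uses plain KEY=value lines without an "export" prefix.
missing_env_keys() {
    local file=$1 key
    shift
    for key in "$@"; do
        # Match KEY=<at least one character> at the start of a line.
        grep -Eq "^${key}=.+" "$file" || echo "$key"
    done
}

# Example (as a user who can read the file):
#   missing_env_keys /opt/swf-monitor/config/env/production.env \
#       DEBUG SECRET_KEY SWF_ALLOWED_HOSTS REDIS_URL
```

The check only reports key names, so it does not print secret values to the terminal.
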
### Step 3: Deploy First Release

Deploy your first production release:

```bash
# Deploy main branch
sudo /opt/swf-monitor/bin/deploy-swf-monitor.sh branch main

# OR deploy specific infrastructure branch
sudo /opt/swf-monitor/bin/deploy-swf-monitor.sh branch infra/baseline-v18
```

### Step 4: Verify Deployment

Test the production deployment:

```bash
# Test HTTP access
curl https://pandaserver02.sdcc.bnl.gov/swf-monitor/

# Check Apache status
systemctl status httpd

# Check deployment status
ls -la /opt/swf-monitor/current
```

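Right after a reload the application may need a moment before it answers, so a single `curl` can fail spuriously. A minimal sketch of a retrying check follows; `wait_healthy` and `probe_monitor` are illustrative names, and the attempt count and delay are arbitrary choices, not values taken from the deploy script.

```bash
#!/usr/bin/env bash
# Sketch: retry a probe command until it succeeds or attempts run out.
# wait_healthy and probe_monitor are illustrative names.

# Usage: wait_healthy <attempts> <delay-seconds> <probe-command> [args...]
wait_healthy() {
    local attempts=$1 delay=$2 i
    shift 2
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            echo "healthy after $i attempt(s)"
            return 0
        fi
        # Sleep between attempts, but not after the final one.
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    echo "unhealthy after $attempts attempt(s)"
    return 1
}

# Probe the API path; curl -f exits non-zero on HTTP status 400 and above.
probe_monitor() {
    curl -fsS -o /dev/null "https://pandaserver02.sdcc.bnl.gov/swf-monitor/api/"
}

# Example:
#   wait_healthy 5 2 probe_monitor
```
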
## Ongoing Deployment Updates

**For regular updates to the production environment.**

### Standard Update Process

When your repositories are ready for a production update:

```bash
# Deploy main branch (most common)
sudo /opt/swf-monitor/bin/deploy-swf-monitor.sh branch main

# Deploy specific branch
sudo /opt/swf-monitor/bin/deploy-swf-monitor.sh branch infra/baseline-v19

# Deploy specific tag
sudo /opt/swf-monitor/bin/deploy-swf-monitor.sh tag v1.2.3
```

### What Happens During Deployment

The deployment script automatically:

1. **Validates** that the branch/tag exists in the GitHub repository
2. **Creates** a new release directory (for example `/opt/swf-monitor/releases/branch-main/`)
3. **Clones** the specified Git reference to the release directory
4. **Copies** the development virtual environment from the configured development path
5. **Links** shared resources (logs, production.env, SSL certificates) and ensures the shared HuggingFace cache exists with open permissions
6. **Installs** the WSGI LoadModule config from the release's `config/apache/20-swf-monitor-wsgi.conf` into `/etc/httpd/conf.modules.d/`
7. **Collects** Django static files with `python manage.py collectstatic`
8. **Syncs** static files to the shared Apache location
9. **Runs** database migrations with `python manage.py migrate`
10. **Updates** the `current` symlink to point to the new release and sets proper ownership
11. **Syncs Apache vhost conf**: compares the release's `apache-swf-monitor.conf` with the live `/etc/httpd/conf.d/swf-monitor.conf`; if they differ, it makes a timestamped backup, installs the new file, validates with `httpd -t`, and rolls back on failure
12. **Reloads** Apache (`systemctl reload httpd`): required on every deploy to recycle the mod_wsgi daemon processes so they pick up new Python code; any conf change from step 11 rides along on the same reload
13. **Restarts** the ASGI worker (`systemctl restart swf-monitor-mcp-asgi.service`) so uvicorn picks up new code (uvicorn loads code once at startup and does not re-read on file change)
14. **Conditionally restarts bots** (`swf-panda-bot`, `swf-testbed-bot`): only if bot-specific code changed relative to the previous release
15. **Health-checks** the deployment by hitting `/swf-monitor/api/`
16. **Cleans up** old releases (keeps the last 5)

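The vhost sync in step 11 follows a common compare, back up, install, validate, roll back pattern. The following is a sketch of that pattern only, not the deploy script's actual code; `sync_conf` and `validate_httpd` are hypothetical names, and it assumes the live file already exists (as it does after the initial setup).

```bash
#!/usr/bin/env bash
# Sketch of the compare / back up / install / validate / roll back sequence.
# sync_conf and validate_httpd are illustrative names, not the deploy script's own.

# Usage: sync_conf <release-conf> <live-conf> <validate-command>
sync_conf() {
    local src=$1 live=$2 validate=$3 backup
    # Skip everything when the live file already matches the release copy.
    if cmp -s "$src" "$live"; then
        echo "unchanged"
        return 0
    fi
    # Keep a timestamped copy of the live file before overwriting it.
    backup="${live}.bak.$(date +%Y%m%d-%H%M%S)"
    cp -p "$live" "$backup"
    cp "$src" "$live"
    if "$validate"; then
        echo "installed"
        return 0
    fi
    # Validation failed: put the previous file back.
    cp -p "$backup" "$live"
    echo "rolled back"
    return 1
}

# On the production host the validator would wrap Apache's syntax check.
validate_httpd() {
    httpd -t
}

# Example (as root):
#   sync_conf /opt/swf-monitor/current/apache-swf-monitor.conf \
#       /etc/httpd/conf.d/swf-monitor.conf validate_httpd
```

Validating after the install but before the reload means a broken conf never reaches the running server.
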
### Deployment Output

Successful deployment shows:

```
[2025-01-13 14:30:15] Deployment completed successfully!
[2025-01-13 14:30:15] Active release: branch-main
[2025-01-13 14:30:15] Git commit: a1b2c3d

Current deployment status:
Release: branch-main
Path: /opt/swf-monitor/releases/branch-main
Current: /opt/swf-monitor/releases/branch-main
```

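The release name in this output is derived from the deploy arguments. The sketch below shows one plausible mapping, assuming that `/` in a ref becomes `-` (as the `branch-infra-baseline-vX` directory in the production structure suggests). The `tag-` prefix is a guess by analogy, and `release_name` is a hypothetical helper, not a function from the deploy script.

```bash
#!/usr/bin/env bash
# Sketch: derive a release directory name from a ref type and ref name.
# Assumption: "/" in the ref becomes "-"; the tag- prefix is a guess by analogy.
release_name() {
    local ref_type=$1 ref=$2
    # ${ref//\//-} replaces every "/" in the ref with "-".
    echo "${ref_type}-${ref//\//-}"
}

# Example:
#   release_name branch infra/baseline-v18
```
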
## Apache Configuration

**Source of truth:** `apache-swf-monitor.conf` in the repo root. On every release the deploy script copies it to `/etc/httpd/conf.d/swf-monitor.conf` whenever it differs from the live file (with `httpd -t` validation and rollback on failure). Editing the live file directly is safe for emergency triage, but the next deploy will re-install the repo canonical, so durable changes belong in the repo file.

**Two-backend layout:**
- mod_wsgi (`WSGIDaemonProcess swf-monitor`) serves `/swf-monitor/*` **except** `/swf-monitor/mcp/`
- mod_proxy → ASGI (uvicorn on `127.0.0.1:8001`) serves `/swf-monitor/mcp/` only

Key directives (abridged; see `apache-swf-monitor.conf` for the full file):

```apache
# WSGI tuning: threads absorb bursty concurrency; listen-backlog absorbs retry
# bursts; queue/inactivity/graceful timeouts bound failure modes. No
# request-timeout because it would truncate /api/messages/stream/ SSE long-poll.
WSGIDaemonProcess swf-monitor \
    python-path=/opt/swf-monitor/current/src:/opt/swf-monitor/current/.venv/lib/python3.11/site-packages \
    python-home=/opt/swf-monitor/current/.venv \
    processes=1 threads=30 \
    listen-backlog=500 queue-timeout=30 \
    inactivity-timeout=300 graceful-timeout=15 \
    display-name=%{GROUP} lang='en_US.UTF-8' locale='en_US.UTF-8'

SetEnv SWF_HOME /opt/swf-monitor

# MCP on ASGI worker: streaming-safe proxy settings.
# Must appear BEFORE WSGIScriptAlias so the proxy takes precedence for /mcp/.
<Location /swf-monitor/mcp/>
    ProxyPass http://127.0.0.1:8001/swf-monitor/mcp/ timeout=3600 keepalive=On disablereuse=On
    ProxyPassReverse http://127.0.0.1:8001/swf-monitor/mcp/
    SetEnv proxy-sendchunked 1
    SetEnv no-gzip 1
    RequestHeader set X-Forwarded-Proto "https"
    CacheDisable on
</Location>

WSGIScriptAlias /swf-monitor /opt/swf-monitor/current/src/swf_monitor_project/wsgi.py process-group=swf-monitor
WSGIPassAuthorization On

Alias /swf-monitor/static /opt/swf-monitor/shared/static

<Location /swf-monitor>
    Header always set X-Content-Type-Options nosniff
    Header always set X-Frame-Options DENY
    Header always set X-XSS-Protection "1; mode=block"
</Location>
```

**LoadModule** is in a separate file (`/etc/httpd/conf.modules.d/20-swf-monitor-wsgi.conf`), generated by `setup-apache-deployment.sh` at bootstrap and re-installed from the repo's `config/apache/` on every deploy. The directory matters: Apache loads `conf.modules.d/` before `conf.d/`, so the module is available when `WSGIDaemonProcess` is parsed.

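To reason about which backend handles a given URL path under this layout, the routing can be written as a small classifier. This is a simplified model of the directives shown in this section, not a substitute for Apache's own matching rules; `backend_for_path` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: model which backend serves a URL path in the two-backend layout.
# backend_for_path is an illustrative helper; Apache's real matching has more rules.
backend_for_path() {
    case "$1" in
        # <Location /swf-monitor/mcp/> proxies to uvicorn on 127.0.0.1:8001.
        /swf-monitor/mcp/*) echo "asgi" ;;
        # Alias /swf-monitor/static is served from disk by Apache.
        /swf-monitor/static|/swf-monitor/static/*) echo "static" ;;
        # Everything else under /swf-monitor goes to the mod_wsgi daemon.
        /swf-monitor|/swf-monitor/*) echo "wsgi" ;;
        *) echo "none" ;;
    esac
}

# Example:
#   backend_for_path /swf-monitor/api/
```

The order of the `case` arms mirrors the precedence the conf relies on: the more specific `/mcp/` and static prefixes are tested before the general `/swf-monitor` mount.
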
## Service Management

### Apache Control

```bash
# Reload configuration (for deployments)
sudo systemctl reload httpd

# Restart Apache (for configuration changes)
sudo systemctl restart httpd

# Check Apache status
sudo systemctl status httpd

# View Apache logs
sudo tail -f /var/log/httpd/error_log
sudo tail -f /var/log/httpd/access_log
```

### ASGI Worker (MCP endpoint)

`/swf-monitor/mcp/` is served by `swf-monitor-mcp-asgi.service`, a uvicorn ASGI worker bound to `127.0.0.1:8001`. Apache proxies to it via `ProxyPass`.

```bash
# Restart (picks up new Python code; uvicorn does not re-read files)
sudo systemctl restart swf-monitor-mcp-asgi.service

# Status
sudo systemctl status swf-monitor-mcp-asgi.service

# Logs
sudo journalctl -u swf-monitor-mcp-asgi.service -f
```

The deploy script restarts this unit on every deploy; a manual restart is only needed for targeted code changes or recovery from a crash-loop.

### Mattermost Bots

```bash
sudo systemctl restart swf-panda-bot.service
sudo systemctl restart swf-testbed-bot.service
sudo journalctl -u swf-panda-bot.service -f
```

### Application Logs

```bash
# SWF Monitor application logs
sudo tail -f /opt/swf-monitor/shared/logs/swf-monitor.log

# Django debug logs (if DEBUG=True)
sudo tail -f /opt/swf-monitor/current/src/debug.log
```

## Troubleshooting

### Common Issues

**1. Apache won't start:**
```bash
# Check Apache configuration
sudo httpd -t

# Check mod_wsgi module load
sudo httpd -M | grep wsgi
```

**2. Python module errors:**
```bash
# Verify virtual environment
ls -la /opt/swf-monitor/current/.venv/

# Check Python path in Apache error log
sudo tail -f /var/log/httpd/error_log
```

**3. Database connection errors:**
```bash
# Check the database settings in production.env (the output contains secrets)
sudo cat /opt/swf-monitor/config/env/production.env
# Then test connectivity with those values using your PostgreSQL client
```

**4. Static files not loading:**
```bash
# Check static files location
ls -la /opt/swf-monitor/shared/static/

# Recollect static files with the release's virtualenv Python
# (plain "sudo python" would use the system interpreter, not the .venv)
cd /opt/swf-monitor/current/src
sudo /opt/swf-monitor/current/.venv/bin/python manage.py collectstatic --clear --noinput
```

**5. Permission errors:**
```bash
# Fix ownership (adjust user:group as needed)
sudo chown -R [user]:[group] /opt/swf-monitor/

# Fix Apache static file permissions
sudo chmod -R 755 /opt/swf-monitor/shared/static/
```

### Diagnostic Commands

```bash
# Check deployment status
ls -la /opt/swf-monitor/current
readlink /opt/swf-monitor/current

# Check Apache mod_wsgi status
sudo httpd -M | grep wsgi

# Test application directly
cd /opt/swf-monitor/current/src
source /opt/swf-monitor/current/.venv/bin/activate
python manage.py check --deploy

# Check all services
sudo systemctl status httpd swf-monitor-mcp-asgi swf-panda-bot swf-testbed-bot postgresql-16 artemis redis
```

## Security Considerations

### Production Checklist

- [ ] `DEBUG=False` in production.env
- [ ] `SWF_ALLOWED_HOSTS` properly configured
- [ ] Strong `SECRET_KEY` set
- [ ] Database credentials secured
- [ ] SSL certificates properly configured
- [ ] File permissions properly set (755 for directories, 644 for files)
- [ ] `production.env` file permissions restrictive (600)

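Two of the checklist items lend themselves to an automated check. The following is a minimal sketch, assuming GNU `stat` and a literal `DEBUG=False` line in the file; `check_env_file` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: check two checklist items for an env file.
# check_env_file is an illustrative helper; stat -c is the GNU coreutils form.
check_env_file() {
    local file=$1 mode status=0
    # %a prints the permission bits in octal, e.g. 600.
    mode=$(stat -c '%a' "$file")
    if [ "$mode" != "600" ]; then
        echo "mode is $mode, expected 600"
        status=1
    fi
    # Assumption: the setting appears as a literal DEBUG=False line.
    if ! grep -q '^DEBUG=False$' "$file"; then
        echo "DEBUG=False not found"
        status=1
    fi
    return "$status"
}

# Example (as a user who can read the file):
#   check_env_file /opt/swf-monitor/config/env/production.env
```
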
### SSL/TLS Configuration

SWF Monitor works with the existing Apache SSL setup. SSL configuration is handled by the system's ssl.conf file, not by the swf-monitor specific configuration.

### File Permissions

```bash
# Correct ownership (adjust user:group as needed)
sudo chown -R [user]:[group] /opt/swf-monitor/

# Secure environment file
sudo chmod 600 /opt/swf-monitor/config/env/production.env

# Apache-accessible static files
sudo chmod -R 755 /opt/swf-monitor/shared/static/
```

## Monitoring and Maintenance

### Regular Maintenance

1. **Monitor disk space** in `/opt/swf-monitor/releases/` (automatic cleanup keeps 5 releases)
2. **Check Apache error logs** regularly
3. **Monitor database growth** and performance
4. **Update SSL certificates** when needed
5. **Keep development environment updated** (deployment copies .venv from dev)

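To see which releases a keep-the-last-5 policy would remove, the candidates can be listed without deleting anything. This sketch orders releases by modification time, which is an assumption; the guide does not say how the deploy script orders them. `old_releases` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: list release directories beyond the newest <keep>, by modification time.
# old_releases is an illustrative helper; ordering by mtime is an assumption.
old_releases() {
    local dir=$1 keep=${2:-5}
    # ls -t sorts newest first; tail skips the first <keep> entries.
    # Assumes release names contain no whitespace or newlines.
    ls -1dt "$dir"/*/ 2>/dev/null | tail -n +"$((keep + 1))"
}

# Example (lists only, deletes nothing):
#   old_releases /opt/swf-monitor/releases 5
```

Before removing anything by hand, compare the list against `readlink /opt/swf-monitor/current` so the active release is never deleted.
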
### Performance Monitoring

```bash
# Check Apache processes
ps aux | grep httpd

# Monitor database connections
# Use appropriate database monitoring commands for your setup

# Check system resources
htop
df -h /opt/
```

## Development Environment Impact

**Important:** The production deployment system copies the virtual environment from your development setup.

This means:
- Keep your development environment updated with production requirements
- Test thoroughly in development before deploying
- Development and production use the same Python packages and versions
- Your development environment remains unchanged and accessible

## Support and Documentation

- **Main Documentation**: [swf-monitor README](../README.md)
- **Development Guide**: [SETUP_GUIDE.md](SETUP_GUIDE.md)
- **API Documentation**: [API_REFERENCE.md](API_REFERENCE.md)
- **Parent Project**: [swf-testbed documentation](../../swf-testbed/README.md)

---

*For urgent production issues, check Apache error logs first: `sudo tail -f /var/log/httpd/error_log`*