# Production Deployment Guide

**Complete guide for deploying SWF Monitor to the production Apache environment on pandaserver02.**

## Overview

This guide covers the complete production deployment process for the SWF Monitor Django application, including initial infrastructure setup and ongoing deployment updates. The production environment uses Apache with Python 3.11 mod_wsgi to serve the Django application.

Note on usage modes:
- Standalone development mode runs local services under supervisord for convenience.
- Production platform mode relies on central, system-managed services (PostgreSQL, ActiveMQ, Redis, Apache).

Use the system status reporter to see which mode your host supports and which services are available:

```bash
python /eic/u/wenauseic/github/swf-testbed/report_system_status.py
```

## Architecture

**Production Structure:**
```
/opt/swf-monitor/
├── releases/           # Versioned deployments
│   ├── branch-main/
│   └── branch-infra-baseline-vX/
├── current/           # Symlink to active release
├── shared/
│   ├── logs/          # Application logs
│   ├── static/        # Django static files
│   └── uploads/       # File uploads
├── config/
│   ├── apache/        # Apache configuration
│   └── env/
│       └── production.env  # Production environment variables
└── bin/
    └── deploy-swf-monitor.sh  # Deployment automation
```

**Key Components:**
- **Apache HTTP Server**: Serves static files, hands most paths to Django via mod_wsgi, and ProxyPasses `/swf-monitor/mcp/` to the ASGI worker
- **Python 3.11 mod_wsgi**: WSGI interface for the Django application (all paths except `/mcp/`)
- **ASGI worker (uvicorn)**: `swf-monitor-mcp-asgi.service` on `127.0.0.1:8001` serves `/swf-monitor/mcp/`. Streaming MCP (StreamableHTTPSessionManager) holds a thread per session; under WSGI that saturates the thread pool. The ASGI worker isolates that failure mode from the rest of the app.
- **PostgreSQL**: Production database (system-managed)
- **ActiveMQ**: Message broker (system-managed via artemis.service)
- **Redis (Channels layer)**: Required inter-process relay used by the SSE forwarder. Redis/Channels-backed SSE is an integral part of the system whenever remote ActiveMQ client recipients are supported.
- **Mattermost bots**: `swf-panda-bot.service` and `swf-testbed-bot.service`, Claude-backed chatbots for the `#pandabot` and `#testbed-bot` channels
- **Release Management**: Automated deployment with Apache-conf sync and ASGI-worker recycle

## Prerequisites

Before starting production deployment, ensure:

1. **System Services Running:**
   - PostgreSQL (postgresql-16.service or equivalent)
   - ActiveMQ/Artemis (artemis.service)
   - Apache HTTP Server (httpd.service)
   - Redis (redis.service): required for SSE relay via Django Channels. This is integral to production operation to support remote recipients of ActiveMQ events over HTTPS (SSE).

2. **Development Environment Ready:**
   - SWF testbed development environment set up (see [Development Environment Setup](#development-environment-setup) below)
   - Virtual environment with all dependencies installed
   - All repositories updated to desired branch/tag

3. **System Access:**
   - Root access (sudo) for Apache configuration and deployment
   - Database credentials for production PostgreSQL instance

### Development Environment Setup

To set up an equivalent development environment in any user account:

1. **Clone all repositories as siblings:**
   ```bash
   cd /path/to/your/workspace
   git clone https://github.com/BNLNPPS/swf-testbed.git
   git clone https://github.com/BNLNPPS/swf-monitor.git
   git clone https://github.com/BNLNPPS/swf-common-lib.git
   # Clone other swf-* agent repositories as needed
   ```

2. **Set up the testbed environment:**
   ```bash
   cd swf-testbed
   source install.sh  # Creates .venv and installs all dependencies
   ```

3. **Configure environment variables:**
   ```bash
   # Copy and customize environment template
   cp ../swf-monitor/.env.example ~/.env
   # Edit ~/.env with your specific configuration
   ```

4. **Set up database (if using local PostgreSQL):**
   ```bash
   # Still in swf-testbed from step 2, so step over to the sibling repo
   cd ../swf-monitor/src
   source /path/to/swf-testbed/.venv/bin/activate
   python manage.py migrate
   python manage.py createsuperuser
   ```

The production deployment will copy the virtual environment from your development setup, so ensure all required packages are installed in your development `.venv`.

## Initial Production Setup

**⚠️ This setup is performed ONCE when initially installing the production environment.**

### Step 1: Run Apache Deployment Setup

```bash
# From the swf-monitor repository root
sudo ./setup-apache-deployment.sh
```

This automated setup script:

1. **Creates deployment structure** at `/opt/swf-monitor/`
2. **Installs Apache development headers** (httpd-devel)
3. **Compiles Python 3.11 mod_wsgi** in the project virtual environment
4. **Disables system mod_wsgi** to avoid Python version conflicts
5. **Generates LoadModule config** into `/etc/httpd/conf.modules.d/20-swf-monitor-wsgi.conf` (LoadModule + WSGIPythonHome only; this directory loads before `conf.d/`, so the module is available when WSGIDaemonProcess is parsed)
6. **Installs the Apache vhost config** by copying the repo canonical `apache-swf-monitor.conf` to `/etc/httpd/conf.d/swf-monitor.conf`. The repo file is the source of truth; `deploy-swf-monitor.sh` keeps the live file in sync with it on every deploy.
7. **Copies deployment automation** script to `/opt/swf-monitor/bin/`
8. **Creates production environment** template
9. **Tests Apache configuration** (`httpd -t`) and restarts Apache

Note: you must also install the `swf-monitor-mcp-asgi.service` systemd unit from the repo (not automated by this script; one-time bootstrap per host):

```bash
sudo install -o root -g root -m 644 swf-monitor-mcp-asgi.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now swf-monitor-mcp-asgi.service
```

### Step 2: Configure Production Environment

Review and update the production environment configuration:

```bash
sudo nano /opt/swf-monitor/config/env/production.env
```

**Key Configuration Categories:**

- **Security Settings**: `DEBUG=False`, `SECRET_KEY`, `SWF_ALLOWED_HOSTS`
- **Host Configuration**: URLs and hostnames for the production server
- **Database Configuration**: PostgreSQL connection details
- **ActiveMQ Configuration**: Message broker settings and SSL certificates
- **Redis/Channels Configuration**: `REDIS_URL` for the Channels layer that powers the SSE relay. Required for remote ActiveMQ recipients; without it, only single-process dev streaming works, which is not suitable for production.
- **API Authentication**: Tokens for agent authentication
- **Proxy Settings**: Network proxy configuration

**Note**: The production.env file contains sensitive configuration values that must be customized for your environment.

**⚠️ IMPORTANT: .env is NOT deployed from git.** The `.env` file is in `.gitignore` for security reasons. The deploy script symlinks the release `.env` to `/opt/swf-monitor/config/env/production.env`. When you need to change environment settings (e.g., `ACTIVEMQ_HEARTBEAT_TOPIC`), you must edit the production.env file directly:

```bash
sudo nano /opt/swf-monitor/config/env/production.env
# Then restart Apache to pick up changes:
sudo systemctl restart httpd
```

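
Because `production.env` is edited by hand, a quick check for missing or empty keys before restarting Apache can catch typos early. The following is a minimal sketch, not part of the repository: the function name `check_env_keys` is hypothetical, and the keys in the usage comment are only the ones this guide names explicitly. It is a coarse check (a value of `""` still counts as set).

```bash
#!/usr/bin/env bash
# Sketch: report keys that are missing or empty in an env file.
# check_env_keys is a hypothetical helper, not shipped with swf-monitor.

# check_env_keys FILE KEY...
# Prints each missing/empty key; returns 1 if any were found.
check_env_keys() {
    local file="$1"; shift
    local key missing=0
    for key in "$@"; do
        # Match KEY=<at least one character>; commented-out lines do not count.
        if ! grep -Eq "^[[:space:]]*(export[[:space:]]+)?${key}=.+" "$file"; then
            echo "missing or empty: ${key}"
            missing=1
        fi
    done
    return "$missing"
}

# Example (run as a user who can read the file):
#   check_env_keys /opt/swf-monitor/config/env/production.env \
#       DEBUG SECRET_KEY SWF_ALLOWED_HOSTS REDIS_URL
```

The check only reads the file, so it is safe to run at any time.
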
### Step 3: Deploy First Release

Deploy your first production release:

```bash
# Deploy main branch
sudo /opt/swf-monitor/bin/deploy-swf-monitor.sh branch main

# OR deploy specific infrastructure branch
sudo /opt/swf-monitor/bin/deploy-swf-monitor.sh branch infra/baseline-v18
```

### Step 4: Verify Deployment

Test the production deployment:

```bash
# Test HTTP access
curl https://pandaserver02.sdcc.bnl.gov/swf-monitor/

# Check Apache status
systemctl status httpd

# Check deployment status
ls -la /opt/swf-monitor/current
```

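
The `curl` check above prints the page body, which is awkward to use in a script. A minimal sketch of a status-code based check follows. The helper names `http_status` and `is_healthy` are hypothetical, and treating any 2xx or 3xx response as healthy is an assumption; the deploy script's own criteria may differ.

```bash
#!/usr/bin/env bash
# Sketch: scriptable health check based on the HTTP status code.
# http_status and is_healthy are hypothetical helpers.

# http_status URL: print the HTTP status code for URL ("000" if unreachable).
http_status() {
    curl --silent --output /dev/null --max-time 10 --write-out '%{http_code}' "$1"
}

# is_healthy URL: succeed only on a 2xx or 3xx response (assumed criteria).
is_healthy() {
    local code
    code="$(http_status "$1")" || true
    case "$code" in
        2??|3??) return 0 ;;
        *) echo "unhealthy: HTTP ${code:-000} from $1" >&2; return 1 ;;
    esac
}

# Example:
#   is_healthy https://pandaserver02.sdcc.bnl.gov/swf-monitor/ && echo "up"
```
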
## Ongoing Deployment Updates

**For regular updates to the production environment.**

### Standard Update Process

When your repositories are ready for production update:

```bash
# Deploy main branch (most common)
sudo /opt/swf-monitor/bin/deploy-swf-monitor.sh branch main

# Deploy specific branch
sudo /opt/swf-monitor/bin/deploy-swf-monitor.sh branch infra/baseline-v19

# Deploy specific tag
sudo /opt/swf-monitor/bin/deploy-swf-monitor.sh tag v1.2.3
```

### What Happens During Deployment

The deployment script automatically:

1. **Validates** that the branch/tag exists in the GitHub repository
2. **Creates** a new release directory, e.g. `/opt/swf-monitor/releases/branch-main/`
3. **Clones** the specified Git reference to the release directory
4. **Copies** the development virtual environment from the configured development path
5. **Links** shared resources (logs, production.env, SSL certificates) and ensures the shared HuggingFace cache exists with open perms
6. **Installs** the WSGI LoadModule config from the release's `config/apache/20-swf-monitor-wsgi.conf` into `/etc/httpd/conf.modules.d/`
7. **Collects** Django static files with `python manage.py collectstatic`
8. **Syncs** static files to the shared Apache location
9. **Runs** database migrations with `python manage.py migrate`
10. **Updates** the current symlink to point to the new release and sets proper ownership
11. **Syncs Apache vhost conf**: compares the release's `apache-swf-monitor.conf` with the live `/etc/httpd/conf.d/swf-monitor.conf`; if they differ, it takes a timestamped backup, installs the new file, validates with `httpd -t`, and rolls back on failure
12. **Reloads** Apache (`systemctl reload httpd`). This is required on every deploy to recycle mod_wsgi daemon processes so they pick up new Python code; any conf change from step 11 rides along on the same reload
13. **Restarts** the ASGI worker (`systemctl restart swf-monitor-mcp-asgi.service`) so uvicorn picks up new code (uvicorn loads code once at startup and does not re-read on file change)
14. **Conditionally restarts bots** (`swf-panda-bot`, `swf-testbed-bot`), only if bot-specific code changed relative to the previous release
15. **Health-checks** the deployment by hitting `/swf-monitor/api/`
16. **Cleans up** old releases (keeps last 5)

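
Steps 10 and 16 can be sketched as follows. This is an illustration only, not the contents of `deploy-swf-monitor.sh`: the function names are hypothetical, and ordering releases by modification time is an assumption about how "last 5" is determined.

```bash
#!/usr/bin/env bash
# Sketch of the symlink switch (step 10) and release cleanup (step 16).
# activate_release and prune_releases are hypothetical helpers.

# activate_release BASE NAME: repoint BASE/current at BASE/releases/NAME.
activate_release() {
    local base="$1" name="$2"
    [ -d "$base/releases/$name" ] || { echo "no such release: $name" >&2; return 1; }
    # -n replaces the link itself instead of descending into the old target.
    ln -sfn "$base/releases/$name" "$base/current"
}

# prune_releases BASE KEEP: delete all but the KEEP most recently modified
# releases, never touching the one that BASE/current points to.
prune_releases() {
    local base="$1" keep="$2" active dir
    active="$(readlink "$base/current" 2>/dev/null || true)"
    # ls -1dt lists newest first; tail skips the first KEEP entries.
    ls -1dt "$base"/releases/*/ 2>/dev/null | tail -n +"$((keep + 1))" |
    while IFS= read -r dir; do
        dir="${dir%/}"
        if [ "$dir" = "$active" ]; then continue; fi
        rm -rf -- "$dir"
    done
}

# Example (as root):
#   activate_release /opt/swf-monitor branch-main
#   prune_releases /opt/swf-monitor 5
```

Skipping the active release guards against deleting the live code when an older release has been re-activated.
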
### Deployment Output

Successful deployment shows:

```
[2025-01-13 14:30:15] Deployment completed successfully!
[2025-01-13 14:30:15] Active release: branch-main
[2025-01-13 14:30:15] Git commit: a1b2c3d

Current deployment status:
  Release: branch-main
  Path: /opt/swf-monitor/releases/branch-main
  Current: /opt/swf-monitor/releases/branch-main
```

## Apache Configuration

**Source of truth:** `apache-swf-monitor.conf` in the repo root. On every release, the deploy script copies it to `/etc/httpd/conf.d/swf-monitor.conf` whenever it differs from the live file, validating with `httpd -t` and rolling back on failure. Editing the live file directly is safe for emergency triage, but any deploy will re-install the repo canonical, so durable changes belong in the repo file.

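
The compare, back up, install, validate, and roll back sequence can be sketched as below. This is an illustration under stated assumptions rather than the deploy script's actual code: `sync_conf` is a hypothetical name, the backup suffix format is invented, and the validation command is passed in so the function can be exercised without Apache.

```bash
#!/usr/bin/env bash
# Sketch: install a config file only when it differs, with rollback.
# sync_conf is a hypothetical helper; the .bak.<timestamp> suffix is assumed.

# sync_conf SRC DEST VALIDATE_CMD...
# Returns 0 on success or when nothing changed, 1 after a rollback.
sync_conf() {
    local src="$1" dest="$2"; shift 2
    local backup
    if cmp -s "$src" "$dest"; then
        echo "unchanged: $dest"
        return 0
    fi
    backup="${dest}.bak.$(date +%Y%m%d-%H%M%S)"
    if [ -e "$dest" ]; then cp -p "$dest" "$backup"; fi
    cp "$src" "$dest"
    if "$@"; then
        echo "installed: $dest"
        return 0
    fi
    echo "validation failed, rolling back: $dest" >&2
    if [ -e "$backup" ]; then cp -p "$backup" "$dest"; else rm -f "$dest"; fi
    return 1
}

# Example (as root, from the repo root):
#   sync_conf apache-swf-monitor.conf /etc/httpd/conf.d/swf-monitor.conf httpd -t
```
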
**Two-backend layout:**
- mod_wsgi (`WSGIDaemonProcess swf-monitor`) serves `/swf-monitor/*` **except** `/mcp/`
- mod_proxy → ASGI (uvicorn on `127.0.0.1:8001`) serves `/swf-monitor/mcp/` only

Key directives (abridged; see `apache-swf-monitor.conf` for the full file):

```apache
# WSGI tuning: threads absorb bursty concurrency; listen-backlog absorbs retry
# bursts; queue/inactivity/graceful timeouts bound failure modes. No
# request-timeout because it would truncate /api/messages/stream/ SSE long-poll.
WSGIDaemonProcess swf-monitor \
    python-path=/opt/swf-monitor/current/src:/opt/swf-monitor/current/.venv/lib/python3.11/site-packages \
    python-home=/opt/swf-monitor/current/.venv \
    processes=1 threads=30 \
    listen-backlog=500 queue-timeout=30 \
    inactivity-timeout=300 graceful-timeout=15 \
    display-name=%{GROUP} lang='en_US.UTF-8' locale='en_US.UTF-8'

SetEnv SWF_HOME /opt/swf-monitor

# MCP on ASGI worker: streaming-safe proxy settings.
# Must appear BEFORE WSGIScriptAlias so the proxy takes precedence for /mcp/.
<Location /swf-monitor/mcp/>
    ProxyPass         http://127.0.0.1:8001/swf-monitor/mcp/ timeout=3600 keepalive=On disablereuse=On
    ProxyPassReverse  http://127.0.0.1:8001/swf-monitor/mcp/
    SetEnv proxy-sendchunked 1
    SetEnv no-gzip 1
    RequestHeader set X-Forwarded-Proto "https"
    CacheDisable on
</Location>

WSGIScriptAlias /swf-monitor /opt/swf-monitor/current/src/swf_monitor_project/wsgi.py process-group=swf-monitor
WSGIPassAuthorization On

Alias /swf-monitor/static /opt/swf-monitor/shared/static

<Location /swf-monitor>
    Header always set X-Content-Type-Options nosniff
    Header always set X-Frame-Options DENY
    Header always set X-XSS-Protection "1; mode=block"
</Location>
```

**LoadModule** is in a separate file (`/etc/httpd/conf.modules.d/20-swf-monitor-wsgi.conf`) generated by `setup-apache-deployment.sh` at bootstrap and re-installed from the repo's `config/apache/` on every deploy. The directory matters: Apache loads `conf.modules.d/` before `conf.d/`, so the module is available when `WSGIDaemonProcess` is parsed.

## Service Management

### Apache Control

```bash
# Reload configuration (for deployments)
sudo systemctl reload httpd

# Restart Apache (for configuration changes)
sudo systemctl restart httpd

# Check Apache status
sudo systemctl status httpd

# View Apache logs
sudo tail -f /var/log/httpd/error_log
sudo tail -f /var/log/httpd/access_log
```

### ASGI Worker (MCP endpoint)

`/swf-monitor/mcp/` is served by `swf-monitor-mcp-asgi.service`, a uvicorn ASGI worker bound to `127.0.0.1:8001`. Apache ProxyPasses to it.

```bash
# Restart (picks up new Python code; uvicorn does not re-read files)
sudo systemctl restart swf-monitor-mcp-asgi.service

# Status
sudo systemctl status swf-monitor-mcp-asgi.service

# Logs
sudo journalctl -u swf-monitor-mcp-asgi.service -f
```

The deploy script restarts this unit on every deploy; manual restart is only needed for targeted code changes or recovery from a crash-loop.

### Mattermost Bots

```bash
sudo systemctl restart swf-panda-bot.service
sudo systemctl restart swf-testbed-bot.service
sudo journalctl -u swf-panda-bot.service -f
```

### Application Logs

```bash
# SWF Monitor application logs
sudo tail -f /opt/swf-monitor/shared/logs/swf-monitor.log

# Django debug logs (if DEBUG=True)
sudo tail -f /opt/swf-monitor/current/src/debug.log
```

## Troubleshooting

### Common Issues

**1. Apache won't start:**
```bash
# Check Apache configuration
sudo httpd -t

# Check mod_wsgi module load
sudo httpd -M | grep wsgi
```

**2. Python module errors:**
```bash
# Verify virtual environment
ls -la /opt/swf-monitor/current/.venv/

# Check Python path in Apache error log
sudo tail -f /var/log/httpd/error_log
```

**3. Database connection errors:**
```bash
# Test database connectivity from production.env values
# Check production.env configuration
sudo cat /opt/swf-monitor/config/env/production.env
```

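
Printing `production.env` with `cat` puts secrets on the terminal and into scrollback. When the goal is only to confirm which settings are present, a masked listing is a safer alternative. A minimal sketch, where `mask_env` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Sketch: list the keys set in an env file without revealing their values.
# mask_env is a hypothetical helper, not shipped with swf-monitor.

# mask_env FILE: print KEY=**** for every assignment; skip comments and blanks.
mask_env() {
    sed -E -n 's/^[[:space:]]*(export[[:space:]]+)?([A-Za-z_][A-Za-z0-9_]*)=.*/\2=****/p' "$1"
}

# Example (root is needed only because the file should be mode 600):
#   sudo bash -c "$(declare -f mask_env); mask_env /opt/swf-monitor/config/env/production.env"
```
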
**4. Static files not loading:**
```bash
# Check static files location
ls -la /opt/swf-monitor/shared/static/

# Recollect static files with the release's venv interpreter
# (plain `sudo python` would pick up the system Python, not the venv)
cd /opt/swf-monitor/current/src
sudo /opt/swf-monitor/current/.venv/bin/python manage.py collectstatic --clear --noinput
```

**5. Permission errors:**
```bash
# Fix ownership (adjust user:group as needed)
sudo chown -R [user]:[group] /opt/swf-monitor/

# Fix Apache static file permissions
sudo chmod -R 755 /opt/swf-monitor/shared/static/
```

### Diagnostic Commands

```bash
# Check deployment status
ls -la /opt/swf-monitor/current
readlink /opt/swf-monitor/current

# Check Apache mod_wsgi status
sudo httpd -M | grep wsgi

# Test application directly
cd /opt/swf-monitor/current/src
source /opt/swf-monitor/current/.venv/bin/activate
python manage.py check --deploy

# Check all services
sudo systemctl status httpd swf-monitor-mcp-asgi swf-panda-bot swf-testbed-bot postgresql-16 artemis redis
```

## Security Considerations

### Production Checklist

- [ ] `DEBUG=False` in production.env
- [ ] `SWF_ALLOWED_HOSTS` properly configured
- [ ] Strong `SECRET_KEY` set
- [ ] Database credentials secured
- [ ] SSL certificates properly configured
- [ ] File permissions properly set (755 for directories, 644 for files)
- [ ] `production.env` file permissions restrictive (600)

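
The permission items on the checklist can be verified mechanically. The sketch below uses hypothetical helper names (`file_mode`, `check_mode`) and tries GNU `stat` first, falling back to the BSD form.

```bash
#!/usr/bin/env bash
# Sketch: verify that a path has exactly the expected permission bits.
# file_mode and check_mode are hypothetical helpers.

# file_mode PATH: print the octal permission bits (GNU stat, then BSD stat).
file_mode() {
    stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1"
}

# check_mode PATH EXPECTED: succeed when PATH has exactly the EXPECTED mode.
check_mode() {
    local actual
    actual="$(file_mode "$1")" || return 2
    if [ "$actual" = "$2" ]; then
        echo "ok: $1 is $actual"
    else
        echo "FAIL: $1 is $actual, expected $2"
        return 1
    fi
}

# Example:
#   check_mode /opt/swf-monitor/config/env/production.env 600
#   check_mode /opt/swf-monitor/shared/static 755
```
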
0430 
0431 The SWF Monitor works with the existing Apache SSL setup. SSL configuration is handled by the system's ssl.conf file, not the swf-monitor specific configuration.
0432 
0433 ### File Permissions
0434 
0435 ```bash
0436 # Correct ownership (adjust user:group as needed)
0437 sudo chown -R [user]:[group] /opt/swf-monitor/
0438 
0439 # Secure environment file
0440 sudo chmod 600 /opt/swf-monitor/config/env/production.env
0441 
0442 # Apache-accessible static files
0443 sudo chmod -R 755 /opt/swf-monitor/shared/static/
0444 ```
0445 
## Monitoring and Maintenance

### Regular Maintenance

1. **Monitor disk space** in `/opt/swf-monitor/releases/` (automatic cleanup keeps 5 releases)
2. **Check Apache error logs** regularly
3. **Monitor database growth** and performance
4. **Update SSL certificates** when needed
5. **Keep development environment updated** (deployment copies .venv from dev)

### Performance Monitoring

```bash
# Check Apache processes
ps aux | grep httpd

# Monitor database connections
# Use appropriate database monitoring commands for your setup

# Check system resources
htop
df -h /opt/
```

## Development Environment Impact

**Important:** The production deployment system copies the virtual environment from your development setup.

This means:
- Keep your development environment updated with production requirements
- Test thoroughly in development before deploying
- Development and production use the same Python packages and versions
- Your development environment remains unchanged and accessible

## Support and Documentation

- **Main Documentation**: [swf-monitor README](../README.md)
- **Development Guide**: [SETUP_GUIDE.md](SETUP_GUIDE.md)
- **API Documentation**: [API_REFERENCE.md](API_REFERENCE.md)
- **Parent Project**: [swf-testbed documentation](../../swf-testbed/README.md)

---

*For urgent production issues, check Apache error logs first: `sudo tail -f /var/log/httpd/error_log`*