# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

`sphenixprod` is the production toolchain for the sPHENIX experiment at BNL. It manages large-scale physics data processing by submitting HTCondor jobs, querying PostgreSQL databases for input file matching, and tracking job state across two databases (Production DB and FileCatalog).

## Environment Setup

```bash
source sphenixprod/this_sphenixprod.sh
```

This sources `/opt/sphenix/core/bin/sphenix_setup.sh`, adds scripts to `PATH`, sets `PYTHONPATH`, and sets `ODBCINI=./.odbc.ini`. The `.odbc.ini` file must exist in the working directory for database connections to work.

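A missing `.odbc.ini` tends to show up only later, as a failed database connection, so a small pre-flight check can be useful. The following is a minimal sketch and not part of `sphenixprod`: the helper name `odbc_ini_status` is hypothetical, and it only tests whether the file is present in the given working directory.

```python
import os
from pathlib import Path


def odbc_ini_status(workdir="."):
    """Return (path, exists) for the .odbc.ini expected in workdir.

    Hypothetical helper for illustration; it does not validate the
    file's contents, only its presence.
    """
    ini_path = Path(workdir) / ".odbc.ini"
    return ini_path, ini_path.is_file()


if __name__ == "__main__":
    path, exists = odbc_ini_status()
    # ODBCINI is set by this_sphenixprod.sh; it may be unset in a fresh shell.
    print(f"ODBCINI={os.environ.get('ODBCINI', '<unset>')}")
    state = "found" if exists else "MISSING, database connections will fail"
    print(f"{path}: {state}")
```

Running it from the directory where the production scripts will be launched gives an early indication of whether the ODBC configuration is in place.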
## Key Commands

**Dry-run submission (no jobs submitted, no DB writes):**
```bash
create_submission.py --config ProdFlow/short/run3auau/v001_combining_run3_new_nocdbtag.yaml \
  --rule DST_TRIGGERED_EVENT_run3physics --runs 69600 72000 -n -vv
```

**Real submission:**
```bash
create_submission.py --config <config.yaml> --rule <RULENAME> --runs <START> <END> --andgo -vv
```

**Chunked submission (faster feedback for large run lists):**
```bash
create_submission.py --config <config.yaml> --rule <RULENAME> --runs <START> <END> \
  --chunk-size 100 --andgo -vv
```

**Autopilot (cron-driven, per-host dispatch):**
```bash
production_control.py --steerfile <steer.yaml>
```

**Install dependencies:**
```bash
pip install -r requirements.txt
```

## Architecture

### Pipeline flow

`create_submission.py` is the main entry point. It:
1. Parses a YAML config file into a `RuleConfig` (via `sphenixprodrules.py`)
2. Builds a `MatchConfig` (via `sphenixmatching.py`) which queries the databases to find input files for the given run range
3. Creates `CondorJob` objects (via `sphenixcondorjobs.py`) and writes HTCondor submit files
4. Optionally submits via `execute_condorsubmission.py` (with `--andgo`)

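When scripting around the entry point, the documented flags can be assembled programmatically. This is only a sketch: `build_submission_command` is a hypothetical helper that the repository does not provide, and it uses just the flags listed under Key Commands (`--config`, `--rule`, `--runs`, `--chunk-size`, `--andgo`, `-n`, `-vv`). Falling back to `-n` unless `submit=True` is a design choice of this sketch, made so that a forgotten argument results in a dry run rather than a real submission.

```python
import shlex


def build_submission_command(config, rule, run_start, run_end,
                             submit=False, chunk_size=None, verbosity=2):
    """Assemble an argv list for create_submission.py (hypothetical helper)."""
    cmd = ["create_submission.py", "--config", config, "--rule", rule,
           "--runs", str(run_start), str(run_end)]
    if chunk_size is not None:
        cmd += ["--chunk-size", str(chunk_size)]
    # --andgo submits for real; -n keeps it a dry run (no jobs, no DB writes).
    cmd.append("--andgo" if submit else "-n")
    if verbosity > 0:
        cmd.append("-" + "v" * verbosity)  # e.g. verbosity=2 gives "-vv"
    return cmd


if __name__ == "__main__":
    argv = build_submission_command(
        "ProdFlow/short/run3auau/v001_combining_run3_new_nocdbtag.yaml",
        "DST_TRIGGERED_EVENT_run3physics", 69600, 72000)
    # Print a shell-quoted command line; pass argv to subprocess.run to execute.
    print(shlex.join(argv))
```

Returning a list rather than a string keeps the result safe to hand to `subprocess.run` without shell quoting concerns.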
### Autopilot flow

`production_control.py` is intended for cron jobs. It reads a steer YAML keyed by hostname, and for each rule calls `create_submission.py`, `dstspider.py`, `histspider.py`, and `monitor_finish.py` as configured. `dispatch_productions.py` runs multiple `production_control.py` instances from a steer-list file with configurable stagger.

### Core library modules

| Module | Role |
|---|---|
| `sphenixprodrules.py` | `RuleConfig` dataclass: parses YAML rule/job/input config; filesystem path templates |
| `sphenixmatching.py` | `MatchConfig`: DB queries to match input files for a run range |
| `sphenixcondorjobs.py` | `CondorJobConfig` / `CondorJob`: maps to HTCondor submit parameters |
| `sphenixdbutils.py` | Interface to Production DB and FileCatalog via pyodbc; determines test vs. prod mode |
| `sphenixjobdicts.py` | Dictionaries mapping DST output types to their required input types |
| `sphenixmisc.py` | Utilities: `shell_command`, lock/unlock files, rotating log setup |
| `simpleLogger.py` | Custom logger with levels: CHATTY < DEBUG < INFO < WARN < ERROR < CRITICAL |
| `argparsing.py` | Shared argument parsing (`--runs`, `--runlist`, `--dryrun`, verbosity flags, etc.) |

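The level ordering in the `simpleLogger.py` row can be illustrated with the standard `logging` module. This is a sketch of the general technique, not the contents of `simpleLogger.py`: the numeric value 5 for CHATTY and the helper `make_logger` are assumptions chosen only so that CHATTY sorts below DEBUG.

```python
import logging

# Assumption: any value below logging.DEBUG (10) would do; 5 is illustrative.
CHATTY = 5
logging.addLevelName(CHATTY, "CHATTY")


def make_logger(name, level=logging.INFO):
    """Return a stream logger at the given level (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = make_logger("demo", CHATTY)
    log.log(CHATTY, "very fine-grained detail")  # emitted only at CHATTY
    log.debug("debug detail")
    log.info("normal progress")
```

Raising the level passed to `make_logger` to `logging.DEBUG` suppresses the CHATTY message while keeping the other two.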
### Configuration (YAML)

Production rules live in `ProdFlow/` subdirectories. Each YAML file defines one or more named rules. A rule specifies input DST types, output DST type, build/dbtag/version triplet, resource requests, and filesystem overrides. The steer files for `production_control.py` are also YAML, keyed by hostname, with `submit`/`dstspider`/`histspider`/`finishmon` flags per rule.

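To make the steer-file idea concrete, the sketch below models a parsed steer file as a nested dictionary and selects the actions for one host. The exact schema is an assumption: the nesting (hostname, then rule, then flags), the host name `hostA.example`, and the mapping from flag to script (in particular `finishmon` to `monitor_finish.py`) are inferred from the description above and may differ from what `production_control.py` actually expects.

```python
import socket

# Hypothetical steer content after YAML parsing: hostname -> rule -> flags.
STEER = {
    "hostA.example": {
        "DST_TRIGGERED_EVENT_run3physics": {
            "submit": True,
            "dstspider": True,
            "histspider": False,
            "finishmon": True,
        },
    },
}

# Assumed flag -> script mapping, based on the scripts named in this document.
FLAG_TO_SCRIPT = {
    "submit": "create_submission.py",
    "dstspider": "dstspider.py",
    "histspider": "histspider.py",
    "finishmon": "monitor_finish.py",
}


def actions_for_host(steer, hostname):
    """Return {rule: [scripts to run]} for the rules configured on hostname."""
    plan = {}
    for rule, flags in steer.get(hostname, {}).items():
        plan[rule] = [script for flag, script in FLAG_TO_SCRIPT.items()
                      if flags.get(flag)]
    return plan


if __name__ == "__main__":
    # A host that is not listed in the steer data simply gets an empty plan.
    print(actions_for_host(STEER, socket.gethostname()))
    print(actions_for_host(STEER, "hostA.example"))
```

Keying by hostname means the same steer file can be deployed to every submit node, with each node picking out only its own rules.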
### Test vs. production mode

`sphenixdbutils.py` activates **test mode** if:
- The current directory path contains `testbed`
- A `.testbed` file exists
- A `SPHNX_TESTBED_MODE` file exists

**Production mode** is activated by a `SPHNX_PRODUCTION_MODE` file (changes DB DSN).

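The detection rules above can be sketched as two small predicates. This is an illustration, not the code in `sphenixdbutils.py`. It makes several assumptions: the marker files are looked up in the working directory, the three test-mode conditions are alternatives (any one suffices), the `testbed` match is case-sensitive, and nothing is implied about precedence when both modes' markers are present. The function names are hypothetical.

```python
from pathlib import Path


def is_test_mode(workdir="."):
    """True if any of the documented test-mode conditions holds (assumed OR)."""
    path = Path(workdir).resolve()
    return (
        "testbed" in str(path)                       # directory path contains 'testbed'
        or (path / ".testbed").exists()              # marker file, assumed in workdir
        or (path / "SPHNX_TESTBED_MODE").exists()    # marker file, assumed in workdir
    )


def is_production_mode(workdir="."):
    """True if the SPHNX_PRODUCTION_MODE marker exists (assumed in workdir)."""
    return (Path(workdir).resolve() / "SPHNX_PRODUCTION_MODE").exists()


if __name__ == "__main__":
    print(f"test mode:       {is_test_mode()}")
    print(f"production mode: {is_production_mode()}")
```

Running this in a working directory before a real submission is a quick way to confirm which mode the marker files would select.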
### Databases

- **Production DB**: `sphnxproddbmaster.sdcc.bnl.gov`, database `Production`, table `production_status`
- **FileCatalog**: `sphnxdbmaster.sdcc.bnl.gov`, database `FileCatalog`, tables `files` and `datasets`

Access requires pyodbc and a valid `.odbc.ini` in the working directory.

### Filesystem layout

Output files follow the template defined in `sphenixprodrules.py`:
```
/sphenix/lustre01/sphnxpro/{prodmode}/{period}/{physicsmode}/{outtriplet}/{leafdir}/{rungroup}/
```
Logs go to `/sphenix/data/data02/sphnxpro/...` and submission logs to `/tmp/sphenixprod/sphenixprod/`.

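The output template is an ordinary brace-style format string, so it can be expanded with `str.format`. In the sketch below only the template itself comes from this document. Every field value is made up for illustration (the real values are derived by `sphenixprodrules.py` from the rule config), and the helper names are hypothetical.

```python
from string import Formatter

OUTPUT_TEMPLATE = (
    "/sphenix/lustre01/sphnxpro/{prodmode}/{period}/{physicsmode}"
    "/{outtriplet}/{leafdir}/{rungroup}/"
)


def template_fields(template):
    """List the placeholder names appearing in a format-style template."""
    return [name for _, name, _, _ in Formatter().parse(template) if name]


def output_dir(**fields):
    """Expand the template; raises KeyError if a placeholder is missing."""
    return OUTPUT_TEMPLATE.format(**fields)


if __name__ == "__main__":
    # Illustrative values only; none of these are taken from a real rule.
    example = {
        "prodmode": "production",
        "period": "run3auau",
        "physicsmode": "physics",
        "outtriplet": "BUILD_DBTAG_VERSION",
        "leafdir": "DST_EXAMPLE",
        "rungroup": "run_00069600_00069700",
    }
    print(template_fields(OUTPUT_TEMPLATE))
    print(output_dir(**example))
```

Listing the placeholders first makes it easy to check that a rule supplies every field before any directories are created.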
### Post-processing scripts

- `dstspider.py` — crawls output dirs, registers finished DST files in FileCatalog
- `histspider.py` — same for histogram files
- `monitor_finish.py` — monitors job completion status
- `eradicate_runs.py` — removes runs from DBs and optionally filesystem
- `resubmit_to_condor.py` — resubmits failed jobs

### Verbosity flags

All scripts share `-v` (INFO), `-vv` / `-d` (DEBUG), `-vvv` / `-c` (CHATTY), or `--loglevel LEVEL`. Use `-n` / `--dryrun` to suppress all DB writes and job submissions.
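One way such a flag scheme can be wired up with `argparse` is sketched below. It is not the code in `argparsing.py`. The flag spellings follow the paragraph above, but the rest is assumed: the numeric value for CHATTY, the WARN default when no flag is given, and the precedence among flags (`--loglevel` first, then the most verbose request).

```python
import argparse

# Assumed numeric levels; CHATTY=5 is illustrative, the rest match stdlib logging.
LEVELS = {"CHATTY": 5, "DEBUG": 10, "INFO": 20,
          "WARN": 30, "ERROR": 40, "CRITICAL": 50}


def build_parser():
    """Parser exposing only the flags named in this document."""
    parser = argparse.ArgumentParser(description="verbosity flag sketch")
    parser.add_argument("-v", dest="verbose", action="count", default=0)
    parser.add_argument("-d", dest="debug", action="store_true")
    parser.add_argument("-c", dest="chatty", action="store_true")
    parser.add_argument("--loglevel", type=str.upper, choices=sorted(LEVELS))
    parser.add_argument("-n", "--dryrun", action="store_true")
    return parser


def resolve_level(args):
    """Map parsed flags to a numeric level (precedence is an assumption)."""
    if args.loglevel:
        return LEVELS[args.loglevel]
    if args.chatty or args.verbose >= 3:
        return LEVELS["CHATTY"]
    if args.debug or args.verbose == 2:
        return LEVELS["DEBUG"]
    if args.verbose == 1:
        return LEVELS["INFO"]
    return LEVELS["WARN"]  # assumed default when no verbosity flag is given


if __name__ == "__main__":
    args = build_parser().parse_args(["-n", "-vv"])
    print(f"dryrun={args.dryrun} level={resolve_level(args)}")
```

Using `action="count"` lets `-v`, `-vv`, and `-vvv` share one option, while `-d` and `-c` remain available as explicit shortcuts.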