# JEDI Integration: Direct Task Submission from PCS

## Overview

PCS (Physics Configuration System) currently composes physics, event generation, simulation, and reconstruction tags into fully specified production tasks, then generates `prun` CLI commands and Condor submit scripts as text. The next step is to **submit tasks directly to JEDI via the PanDA Python API**, bypassing script generation entirely.

This document describes the integration design: how PCS task parameters map to JEDI's `taskParamMap`, the submission flow, and what infrastructure support is needed from PanDA.

**Approach:** Direct API submission. PCS owns the full task specification. JEDI's existing `GenTaskRefiner` handles the task, so no custom server-side plugin is required.

## Architecture

```
┌─────────────────────────────────────────────────┐
│                PCS (swf-monitor)                │
│                                                 │
│  PhysicsTag ─┐                                  │
│  EvgenTag   ─┼─► Dataset ─┐                     │
│  SimuTag    ─┘            ├─► ProdTask          │
│  RecoTag    ─┘  ProdConfig┘      │              │
│                                  │              │
│                       build_task_params(task)   │
│                                  │              │
│                                  ▼              │
│                         taskParamMap (dict)     │
│                                  │              │
│                        submit_to_jedi(task)     │
│                                  │              │
└──────────────────────────────────┼──────────────┘
                                   │ Client.insertTaskParams()
                                   ▼
┌─────────────────────────────────────────────────┐
│                  PanDA Server                   │
│                                                 │
│  POST /api/v1/task/submit                       │
│       │                                         │
│       ▼                                         │
│  TaskBuffer.insertTaskParamsPanda()             │
│       │  stores task in DB, state = "defined"   │
│       ▼                                         │
│  JEDI TaskRefiner daemon                        │
│       │  selects GenTaskRefiner via VO config   │
│       ▼                                         │
│  GenTaskRefiner.extractCommon()                 │
│  GenTaskRefiner.doRefine()                      │
│       │  creates JediTaskSpec + dataset specs   │
│       ▼                                         │
│  ContentsFeeder → JobGenerator → JobBroker      │
│       │  breaks task into jobs, assigns sites   │
│       ▼                                         │
│  Jobs dispatched to Pilot                       │
└─────────────────────────────────────────────────┘
```

## PCS-to-JEDI Field Mapping

### Task Identity

| JEDI Parameter | PCS Source | Notes |
|---------------|-----------|-------|
| `taskName` | `dataset.task_name` | Dataset name without `.bN` block suffix |
| `userName` | `task.created_by` | PCS user who created the task |
| `vo` | `'eic'` | Virtual organization |
| `workingGroup` | `config.panda_working_group` | e.g. `'EIC'` |
| `campaign` | Derived from detector version | e.g. `'26.02.0'` |

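The `taskName` rule above can be sketched as follows. This is a minimal illustration, assuming the block suffix is always a trailing `.b` followed by digits; the helper name `task_name_from_dataset` is hypothetical and not part of PCS.

```python
import re

# Assumption: a dataset block suffix is a trailing ".b<digits>" component.
_BLOCK_SUFFIX = re.compile(r"\.b\d+$")


def task_name_from_dataset(dataset_name: str) -> str:
    """Return the dataset name with any trailing .bN block suffix removed."""
    return _BLOCK_SUFFIX.sub("", dataset_name)


# A name without a block suffix is returned unchanged.
print(task_name_from_dataset("group.EIC.26.02.0.epic_craterlake.p3001.e1.s1.r1.b2"))
```

Anchoring the pattern at the end of the string keeps components such as `.b2` from being stripped if they ever appear in the middle of a name.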
### Processing Definition

| JEDI Parameter | PCS Source | Notes |
|---------------|-----------|-------|
| `prodSourceLabel` | `config.data['prod_source_label']` | `'managed'` for production, `'test'` for testing |
| `taskType` | `'production'` | Fixed for PCS production tasks |
| `processingType` | `config.data['processing_type']` | e.g. `'epicproduction'` |
| `taskPriority` | `config.data` or default | Range 0 to 1000; production typically uses 900 |
| `transPath` | `config.data['transformation']` | Payload executable or TRF URL |
| `transUses` | `''` | Not used for containerized jobs |
| `transHome` | `''` | Not used for containerized jobs |
| `architecture` | `''` | Empty string; the container handles the platform |
| `container_name` | `config.container_image` | Singularity/Docker image reference |

### Job Splitting

| JEDI Parameter | PCS Source | Notes |
|---------------|-----------|-------|
| `nEventsPerJob` | `config.data['events_per_job']` | Events per individual job |
| `nEvents` | `config.events_per_task` | Total events for the task |
| `nFiles` | `config.data['n_jobs']` | With `noInput`, this controls the job count |
| `nFilesPerJob` | `config.data['files_per_job']` | Input files per job (default 1) |
| `noInput` | `True` | MC generation has no input dataset |
| `coreCount` | `config.data['corecount']` | CPU cores per job (default 1) |
| `walltime` | Derived from `config.target_hours_per_job` | Converted to seconds for JEDI |
| `ramCount` | `config.data` or GenTaskRefiner default | MB per core (default 2000) |

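The derived splitting fields can be illustrated with a short sketch. It assumes that, when `n_jobs` is not set explicitly, the job count is the total event count divided by events per job (rounded up), and that `walltime` is simply the target hours converted to seconds. The function name `splitting_params` and its arguments are hypothetical.

```python
import math


def splitting_params(events_per_task: int, events_per_job: int,
                     target_hours_per_job: float, core_count: int = 1) -> dict:
    """Derive JEDI splitting fields for a no-input (MC generation) task."""
    if events_per_job <= 0:
        raise ValueError("events_per_job must be positive")
    # With noInput, nFiles acts as the job count; round up so no events are dropped.
    n_jobs = math.ceil(events_per_task / events_per_job)
    return {
        "noInput": True,
        "nFiles": n_jobs,
        "nFilesPerJob": 1,
        "nEventsPerJob": events_per_job,
        "nEvents": events_per_task,
        "coreCount": core_count,
        # The mapping table specifies walltime in seconds.
        "walltime": int(target_hours_per_job * 3600),
    }


params = splitting_params(events_per_task=1000, events_per_job=100,
                          target_hours_per_job=2)
print(params["nFiles"], params["walltime"])
```

For 1000 events at 100 events per job this gives 10 jobs, which matches the `nFiles` value used in the example `taskParamMap` in this document.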
### Site Selection

| JEDI Parameter | PCS Source | Notes |
|---------------|-----------|-------|
| `site` | `config.panda_site` | PanDA queue name, e.g. `'BNL_EPIC_PROD_1'` |
| `cloud` | `config.panda_working_group` or `'US'` | If `cloud` is absent, GenTaskRefiner copies `workingGroup` into it |

### Output Datasets

| JEDI Parameter | PCS Source | Notes |
|---------------|-----------|-------|
| `log` | Built from `dataset.did` | Log dataset template |
| `jobParameters` | Built from config | Execution command + output file templates |

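As a sketch of how the `log` entry and the output-file entry of `jobParameters` might be built from a dataset identifier, the helper below reproduces the template shapes used in the example `taskParamMap` in this document. The function name `output_specs`, its arguments, and the split of the DID into scope and name are assumptions made for illustration.

```python
from typing import Tuple


def output_specs(scope: str, dataset_name: str, offset: int = 1000) -> Tuple[dict, dict]:
    """Build the log spec and the output-file jobParameters entry for one dataset."""
    did = f"{scope}:{dataset_name}"
    # Fields shared by both templates; 'local' keeps outputs on local staging.
    common = {"type": "template", "token": "local", "destination": "local"}
    log = {
        **common,
        "param_type": "log",
        "dataset": f"{did}.log",
        # "${{SN}}" renders as the literal JEDI placeholder ${SN}.
        "value": f"{dataset_name}.log.${{SN}}.log.tgz",
    }
    output = {
        **common,
        "param_type": "output",
        "dataset": did,
        "value": f"{dataset_name}.${{SN}}.root",
        "offset": offset,
    }
    return log, output


log_spec, out_spec = output_specs(
    "group.EIC", "group.EIC.26.02.0.epic_craterlake.p3001.e1.s1.r1")
print(log_spec["value"])
print(out_spec["value"])
```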
### Flags

| JEDI Parameter | PCS Source | Notes |
|---------------|-----------|-------|
| `skipScout` | `config.data['skip_scout']` | Skip scout jobs if True |
| `disableAutoRetry` | `config.data` | Optional |
| `useRucio` | `config.use_rucio` | Whether to register outputs in Rucio |

## Example: taskParamMap Built from PCS

Given a ProdTask with:
- Dataset: `group.EIC.26.02.0.epic_craterlake.p3001.e1.s1.r1` (DIS NC 10x100)
- ProdConfig: container image, 100 events/job, 1000 total events, 1 core

The `build_task_params(task)` function would produce:

```python
taskParamMap = {
    # Identity
    'taskName': 'group.EIC.26.02.0.epic_craterlake.p3001.e1.s1.r1',
    'userName': 'wenaus',
    'vo': 'eic',
    'workingGroup': 'EIC',
    'campaign': '26.02.0',

    # Processing
    'prodSourceLabel': 'managed',
    'taskType': 'production',
    'processingType': 'epicproduction',
    'taskPriority': 900,

    # Executable (containerized)
    'transPath': 'https://pandaserver-doma.cern.ch/trf/user/runGen-00-00-02',
    'transUses': '',
    'transHome': '',
    'architecture': '',
    'container_name': 'docker://eicweb/jug_xl:26.02.0-stable',

    # Splitting
    'noInput': True,
    'nFiles': 10,  # number of jobs
    'nFilesPerJob': 1,
    'nEventsPerJob': 100,
    'coreCount': 1,
    'ramCount': 4000,
    'ramUnit': 'MBPerCore',

    # Site
    'site': 'BNL_EPIC_PROD_1',
    'cloud': 'EIC',

    # Log output
    'log': {
        'dataset': 'group.EIC:group.EIC.26.02.0.epic_craterlake.p3001.e1.s1.r1.log',
        'type': 'template',
        'param_type': 'log',
        'token': 'local',
        'destination': 'local',
        'value': 'group.EIC.26.02.0.epic_craterlake.p3001.e1.s1.r1.log.${SN}.log.tgz',
    },

    # Job parameters: execution command + output file spec
    'jobParameters': [
        {
            'type': 'constant',
            'value': (
                'EBEAM=10 PBEAM=100 '
                'DETECTOR_VERSION=26.02.0 DETECTOR_CONFIG=epic_craterlake '
                'JUG_XL_TAG=26.02.0-stable '
                'COPYRECO=true COPYFULL=false COPYLOG=true '
                './run.sh'
            ),
        },
        {
            'type': 'template',
            'param_type': 'output',
            'token': 'local',
            'destination': 'local',
            'dataset': 'group.EIC:group.EIC.26.02.0.epic_craterlake.p3001.e1.s1.r1',
            'value': 'group.EIC.26.02.0.epic_craterlake.p3001.e1.s1.r1.${SN}.root',
            'offset': 1000,
        },
    ],
}
```

## GenTaskRefiner Behavior

When JEDI processes this task, `GenTaskRefiner` (61 lines, `panda-server/pandajedi/jedirefine/GenTaskRefiner.py`) applies these defaults:

1. **`cloud`**: if absent, copied from `workingGroup` (so `workingGroup='EIC'` yields `cloud='EIC'`)
2. **`transPath`**: defaults to the `runGen-00-00-02` TRF if not set (we set it explicitly)
3. **`ramCount`**: defaults to 2000 MB if not set
4. **`pushStatusChanges`**: defaults to True (status updates via message queue)
5. **`messageDriven`**: defaults to True
6. **`cloudAsVO`**: always set to True (the cloud field is used as the VO for brokerage)
7. **Dataset templates**: instantiated per site if a DDM interface is available

The `GenJobBroker` then handles site selection using the simplified non-ATLAS brokerage logic: it filters queues by status, disk space, and walltime constraints, then selects a site.

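For reviewing a task before submission, it may help to preview locally what the refiner is expected to fill in. The sketch below mirrors the defaults listed above; it is a local approximation for illustration, not the `GenTaskRefiner` code itself, and the function name `preview_refiner_defaults` is hypothetical. The `transPath` default is left out because this document does not give the full default URL.

```python
def preview_refiner_defaults(task_params: dict) -> dict:
    """Return a copy of task_params with the listed refiner defaults filled in.

    Local approximation for review only; the authoritative logic runs
    server-side in GenTaskRefiner.
    """
    preview = dict(task_params)  # shallow copy, so the caller's dict is untouched
    if "cloud" not in preview and "workingGroup" in preview:
        preview["cloud"] = preview["workingGroup"]
    preview.setdefault("ramCount", 2000)
    preview.setdefault("pushStatusChanges", True)
    preview.setdefault("messageDriven", True)
    preview["cloudAsVO"] = True  # always set, per the list above
    return preview


print(preview_refiner_defaults({"workingGroup": "EIC", "ramCount": 4000}))
```

Values that PCS sets explicitly, such as `ramCount: 4000` in the example task, are preserved; only missing keys receive defaults.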
## Implementation Plan

### Phase 1: build_task_params() (commands.py)

Add a new function alongside the existing `build_condor_command()` and `build_panda_command()`:

```python
def build_task_params(task):
    """
    Build a JEDI taskParamMap dict from a ProdTask.

    Returns the dict that can be passed directly to
    pandaclient.Client.insertTaskParams() for JEDI submission.
    """
```

This function reads the same ProdTask → ProdConfig → Dataset → Tags chain but produces a dict instead of a CLI string. The `ProdTask.generate_commands()` method should also call this and store the result (JSON) for review before submission.

### Phase 2: submit_to_jedi() (new module: pcs/submission.py)

```python
def submit_to_jedi(task):
    """
    Submit a ProdTask to JEDI via PanDA API.

    Returns (status, jedi_task_id) on success.
    Updates task.panda_task_id and task.status.
    """
```

This calls `Client.insertTaskParams(task_params)` and handles the response. Authentication uses OIDC (`PANDA_AUTH=oidc`, `PANDA_AUTH_VO=eic`).

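The response handling could look roughly like the sketch below. It takes the client call as an injected function so it runs without a PanDA connection. Two things are assumptions to verify against the installed panda-client: that `Client.insertTaskParams` returns `(comm_status, (return_code, message))` with `0` meaning success at both levels, and that the success message embeds the new ID as `jediTaskID=<number>`. Unlike the stub above, this variant takes the parameter dict rather than a ProdTask, also returns the server message, and does not update the task record.

```python
import re
from typing import Callable, Optional, Tuple

# Assumption: on success the server message contains "jediTaskID=<number>".
_TASK_ID = re.compile(r"jediTaskID=(\d+)")


def submit_to_jedi(task_params: dict,
                   insert_fn: Callable) -> Tuple[bool, Optional[int], str]:
    """Submit task_params through insert_fn and extract the JEDI task ID.

    insert_fn is expected to behave like pandaclient.Client.insertTaskParams,
    assumed here to return (comm_status, (return_code, message)).
    """
    comm_status, result = insert_fn(task_params)
    if comm_status != 0:
        return False, None, f"communication failure (status {comm_status})"
    return_code, message = result[0], str(result[1])
    match = _TASK_ID.search(message)
    if return_code != 0 or match is None:
        return False, None, message
    return True, int(match.group(1)), message


def fake_insert(params):
    """Stand-in for the real client call so the sketch runs offline."""
    return 0, (0, "succeeded. new jediTaskID=12345")


print(submit_to_jedi({"taskName": "demo"}, fake_insert))
```

In PCS, `insert_fn` would be `pandaclient.Client.insertTaskParams`, and the caller would store the returned ID in `task.panda_task_id`.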
### Phase 3: UI Integration

- Add a "Submit to JEDI" button on the ProdTask detail page (alongside the existing command display)
- Show the taskParamMap as formatted JSON for review before submission
- After submission, display the JEDI task ID with a link to ePIC production monitoring
- Track status via `Client.getTaskStatus(jedi_task_id)`

### Phase 4: Task Monitoring

- Poll JEDI task status and update ProdTask.status accordingly
- Expose task and job information through the ePIC production monitoring views and the MCP tools
- Surface errors via the existing PanDA MCP tools

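One polling step might be sketched as follows. The PCS status names (`submitted`, `running`, `done`, `partial`, `failed`) are hypothetical placeholders for whatever `ProdTask.status` actually allows, the JEDI-to-PCS mapping is an illustrative subset, and the status call is assumed to return `(comm_status, status_string)` in the manner of `Client.getTaskStatus`.

```python
from typing import Callable

# Hypothetical PCS status names; adjust to the real ProdTask.status choices.
_JEDI_TO_PCS = {
    "registered": "submitted",
    "defined": "submitted",
    "ready": "submitted",
    "pending": "submitted",
    "scouting": "running",
    "running": "running",
    "done": "done",
    "finished": "partial",  # assumed: task ended with some failed jobs
    "failed": "failed",
    "broken": "failed",
    "aborted": "failed",
}


def poll_once(jedi_task_id: int, current_status: str,
              get_status_fn: Callable) -> str:
    """Return the PCS status after one poll, keeping the old one on any doubt."""
    comm_status, jedi_status = get_status_fn(jedi_task_id)
    if comm_status != 0:
        # Communication problem: do not change the recorded status.
        return current_status
    # Unknown JEDI states also leave the recorded status unchanged.
    return _JEDI_TO_PCS.get(jedi_status, current_status)


print(poll_once(12345, "submitted", lambda task_id: (0, "running")))
```

Falling back to the current status on communication errors or unmapped states avoids overwriting a known state with a guess.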
## Submitting from the CLI

Today, `pcs-task-cmd` (documented in [PCS.md](PCS.md)) can emit the `taskParamMap` JSON for any task. Operators with a valid PanDA auth context (x509 proxy or OIDC token) can pipe it straight into `Client.insertTaskParams()`:

```bash
pcs-task-cmd <task_name> --format jedi | python -c '
import json, sys
from pandaclient import Client
print(Client.insertTaskParams(json.load(sys.stdin)))
'
```

This is the intended test-phase submission path. Server-side submission from swf-monitor is blocked on the OIDC service account listed below.

## Infrastructure: What We Know

- **VO**: `eic`
- **Queues**: 13 EIC queues online (BNL_EPIC_PROD_1, BNL_OSG_EPIC_PROD_1, NERSC_Perlmutter_epic, E1_BNL, E1_JLAB, etc.). All support Apptainer containers.
- **Auth**: OIDC with `PANDA_AUTH=oidc`, `PANDA_AUTH_VO=eic`
- **Output**: Rucio integration available; `token='local'` / `destination='local'` for local staging

## What PanDA Team Needs to Confirm

1. **GenTaskRefiner registration** for `eic:managed` in `panda_jedi.cfg`
2. **OIDC service account** setup for non-interactive programmatic submission from our production server
3. **`transPath`**: is the GenTaskRefiner default TRF appropriate for containerized EIC jobs, or should we specify our own?

## Key References

### PanDA Documentation (panda-docs repo)
- [Task Parameters](../../../panda-docs/docs/source/advanced/task_params.rst): splitRule codes and parameter priority
- [JEDI Architecture](../../../panda-docs/docs/source/architecture/jedi.rst): task flow, agents, state machines
- [Client API](../../../panda-docs/docs/source/client/panda-client.rst): Python API setup and usage
- [Admin Guide](../../../panda-docs/docs/source/admin_guide/admin_guide.rst): GenTaskRefiner config examples

### PanDA Source Code (cloned in github/)
- `panda-server/pandajedi/jedirefine/GenTaskRefiner.py`: the refiner our tasks will use
- `panda-server/pandajedi/jedirefine/TaskRefinerBase.py`: extractCommon() parameter processing
- `panda-server/pandajedi/jeditest/addNonAtlasTask.py`: non-ATLAS submission example
- `panda-client/pandaclient/example_task.py`: client-side task dict example
- `panda-client/pandaclient/panda_api.py`: `submit_task()` high-level API
- `panda-client/pandaclient/Client.py:1304`: `insertTaskParams()` implementation

### PCS Source Code (swf-monitor)
- `src/pcs/models.py`: ProdTask, ProdConfig, Dataset, tag models
- `src/pcs/commands.py`: current command generation (to be extended)
- `docs/PCS.md`: PCS documentation