# ePIC Production EVGEN Inputs

A production task takes a generator-level event sample as its input and
reconstructs it. That sample, the **EVGEN** (event-generation) input, is
produced by a physics working group and registered in Rucio. This document
describes where EVGEN inputs live, how PCS (Physics Configuration System)
assimilates the Rucio inventory, how it resolves each catalog request to the
Rucio dataset(s) that realize it, and how to read the matched and unmatched
results. Matching is the part that requires judgment, so most of this
document is about it.

## Where EVGEN inputs live

EVGEN datasets are registered in **JLab Rucio**, scope `epic`, under
`/EVGEN/...`. They are detector-independent (one EVGEN sample feeds any
detector configuration), so the tree carries no detector or campaign-version
segment. The files are HepMC3 (`*.hepmc3.tree.root`) and commonly reside on
tape (`JLAB-TAPE-SE`), so staging them incurs a tape recall.

Read access uses the public `eicread` userpass account; no production
credential is needed to list or inspect EVGEN. PanDA does not resolve EVGEN
through Rucio at all: a PanDA server is bound to a single Rucio instance (BNL
Rucio for the BNL server), and the production payload stages its input from
JLab Rucio itself. See [JEDI_INTEGRATION.md](JEDI_INTEGRATION.md) § "Data
handling and the single-Rucio constraint".

## The two namespaces: request and Rucio

A PCS catalog request names an EVGEN path under
`/volatile/eic/EPIC/EVGEN/<tail>`, taken from the production team's
`default_datasets` catalogue (`eic/epic-prod`, the basis of the PCS task
catalog). The produced sample is registered in Rucio as `epic:/EVGEN/<tail>`.
Both sides share the same `<tail>` namespace, but a request tail ranges from
abstract to fully specific depending on the physics class:

| Class | Example request tail | Example Rucio DID tail |
|-------|----------------------|------------------------|
| DIS (pythia8) | `DIS/NC/10x100/minQ2=1` | `DIS/pythia8.316-1.0/NC/noRad/ep/10x100/q2_1to10` |
| SIDIS (pythia6) | `SIDIS/pythia6-eic/1.2.0/ep_noradcor/18x275/q2_1to10` | `SIDIS/pythia6-eic/1.1.0/en_noradcor/18x275/q2_1to10` |
| EXCLUSIVE (DEMP) | (none) | `EXCLUSIVE/DEMP/DEMPgen-1.2.3/10x130/q2_10_20/pi+` |
| DIS (BeAGLE, nuclear) | `DIS/BeAGLE1.03.02-1.0/eH2/10x130` | `DIS/BeAGLE1.03.02-1.0/eAu/5x41/q2_1to10` |

The rows are side-by-side examples, not necessarily matching pairs; under the
rules below, only the DIS pythia8 pair matches.

A DIS pythia8 request states only the current type (`NC`/`CC`), the beam
energies, and a Q² floor; the Rucio DID additionally carries generator,
radiation, and charge, and a Q² range rather than a floor. A SIDIS request, by
contrast, already carries generator, version, charge, radiation, and an
explicit Q² range. The match must respect whichever axes a request actually
states.
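The shared-tail relationship can be sketched in a few lines of Python. This is a minimal illustration, not PCS code: the helper names `tail_of` and `axes`, and the assumption that a tail's tokens are its `/`-separated segments, are hypothetical.

```python
# Hypothetical helpers illustrating the shared <tail> namespace; not PCS code.
REQUEST_PREFIX = "/volatile/eic/EPIC/EVGEN/"  # catalog request paths
DID_PREFIX = "epic:/EVGEN/"                   # Rucio scope:name for EVGEN


def tail_of(name: str) -> str:
    """Strip either namespace prefix, leaving the shared <tail>."""
    for prefix in (REQUEST_PREFIX, DID_PREFIX):
        if name.startswith(prefix):
            return name[len(prefix):]
    raise ValueError(f"not an EVGEN request path or DID: {name!r}")


def axes(name: str) -> list[str]:
    """Split a tail into its path tokens (assumed to be '/'-separated)."""
    return tail_of(name).strip("/").split("/")


request = axes("/volatile/eic/EPIC/EVGEN/DIS/NC/10x100/minQ2=1")
did = axes("epic:/EVGEN/DIS/pythia8.316-1.0/NC/noRad/ep/10x100/q2_1to10")

# Axes the DID carries that the abstract request does not state.
print([tok for tok in did if tok not in request])
# ['pythia8.316-1.0', 'noRad', 'ep', 'q2_1to10']
```

The Q² token differs in form on the two sides (a floor in the request, a range in the DID), which is why it cannot be compared as a plain string.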

## Assimilation

`refresh_evgen_rucio` (`src/pcs/services.py`) fetches `epic:/EVGEN/*` once into
a snapshot, resolves each PCS EVGEN `Dataset` to the Rucio dataset(s) it
matches, and writes the resolved references to `Dataset.metadata['rucio']`.
Re-running it picks up a grown Rucio listing the same way: assimilation is
idempotent and can be re-swept at any time.

- Each `metadata['rucio']` entry records the resolved Rucio `did`, its
`file_count` and `bytes`, per-RSE availability, and completeness.
- The standalone runner is `scripts/import_evgen_rucio.py`. It is a dry run by
default (fetch, match, and report, with no database writes); pass `--apply` to
persist. The same service backs the catalog's update button, run under the
production operations agent (the same pattern as the produced-output sweep).
- The snapshot is written as a single JSON file in the snapshot directory.
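A minimal sketch of the record shape and of the replace-not-append write that makes re-sweeping safe. The field names `rse_availability` and `complete`, the `assimilate` function, and the numbers in the example are assumptions for illustration; the actual keys written by `refresh_evgen_rucio` may differ.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class RucioRef:
    """One resolved entry under Dataset.metadata['rucio'] (hypothetical keys)."""
    did: str
    file_count: int
    bytes: int
    rse_availability: dict[str, int] = field(default_factory=dict)  # RSE -> files available (assumed)
    complete: bool = False


def assimilate(metadata: dict, resolved: list[RucioRef], apply: bool = False) -> dict:
    """Return the metadata a sweep would write; persist it only when apply=True.

    The 'rucio' key is rebuilt from the snapshot on every run (replace, never
    append), so re-running against an unchanged snapshot changes nothing, and
    re-running against a grown snapshot simply picks up the new DIDs.
    """
    proposed = dict(metadata)
    proposed["rucio"] = [asdict(ref) for ref in sorted(resolved, key=lambda r: r.did)]
    if apply:  # stands in for the database write behind --apply
        metadata.clear()
        metadata.update(proposed)
    return proposed


ref = RucioRef(
    did="epic:/EVGEN/DIS/pythia8.316-1.0/NC/noRad/ep/10x100/q2_1to10",
    file_count=3, bytes=3_000,            # placeholder numbers, not real data
    rse_availability={"JLAB-TAPE-SE": 3},
    complete=True,
)
meta: dict = {}
assimilate(meta, [ref])                   # dry run: meta is left untouched
assert meta == {}
first = assimilate(meta, [ref], apply=True)
second = assimilate(meta, [ref], apply=True)
assert first == second == meta            # a second sweep is a no-op
```

Keeping the dry run as a pure "compute the proposed metadata" step means the report and the `--apply` write come from the same code path.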

## Matching

A request resolves to a Rucio dataset when the request's path tokens (the
`/`-separated segments of the tail) appear, in order, as a subsequence of the
Rucio DID's tokens, compared **exactly except for the Q² token**. Two
consequences follow, and they are the whole point:

- **Exact comparison on every axis the request states.** A request that names a
charge, generator, or version matches only a DID carrying the same value:
`ep` never matches `en`, and `pythia6-eic/1.2.0` never matches `1.1.0`. This is
what keeps a specific request off the wrong beam species or generator version.
- **Fan-out on every axis the request omits.** An abstract DIS request states
no generator, radiation, or charge, so it matches every Rucio dataset that
agrees on the axes it does state. One request can therefore resolve to several
datasets.
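The rule above can be sketched as an ordered subsequence test with a per-token comparison. This is an illustration under assumptions, not the PCS matcher: `token_matches` and `request_matches` are hypothetical names, and only the Q² spellings `q2_<lo>to<hi>` and `minQ2=<n>` are parsed (anything else, such as `q2_10_20`, falls back to exact comparison).

```python
import re

# Assumed Q² spellings: ranges "q2_<lo>to<hi>" and floors "minQ2=<n>".
# Other Q² spellings (e.g. "q2_10_20") fall through to exact comparison.
_RANGE = re.compile(r"q2_(\d+(?:\.\d+)?)to(\d+(?:\.\d+)?)")
_FLOOR = re.compile(r"minQ2=(\d+(?:\.\d+)?)")


def token_matches(req_tok: str, did_tok: str) -> bool:
    """Exact string comparison, except that a Q² request token compares by value."""
    did_range = _RANGE.fullmatch(did_tok)
    floor = _FLOOR.fullmatch(req_tok)
    if floor:
        # Floor: the DID range must lie entirely at or above it.
        return did_range is not None and float(did_range[1]) >= float(floor[1])
    req_range = _RANGE.fullmatch(req_tok)
    if req_range and did_range:
        # Explicit range: only the numerically identical range matches.
        return (float(req_range[1]), float(req_range[2])) == (
            float(did_range[1]), float(did_range[2]))
    return req_tok == did_tok


def request_matches(request_tail: str, did_tail: str) -> bool:
    """True if the request's tokens occur, in order, as a subsequence of the DID's."""
    req = request_tail.strip("/").split("/")
    i = 0
    for tok in did_tail.strip("/").split("/"):
        # Taking the earliest matching DID token is safe for subsequence search.
        if i < len(req) and token_matches(req[i], tok):
            i += 1
    return i == len(req)


print(request_matches("DIS/NC/10x100/minQ2=1",
                      "DIS/pythia8.316-1.0/NC/noRad/ep/10x100/q2_1to10"))  # True
print(request_matches("SIDIS/pythia6-eic/1.2.0/ep_noradcor/18x275/q2_1to10",
                      "SIDIS/pythia6-eic/1.1.0/en_noradcor/18x275/q2_1to10"))  # False

# Illustrative DID tails (hypothetical Q² slices of one DIS sample): fan-out.
dids = [f"DIS/pythia8.316-1.0/NC/noRad/ep/10x100/{q2}"
        for q2 in ("q2_1to10", "q2_10to100", "q2_100to1000")]
print([d.rsplit("/", 1)[1] for d in dids
       if request_matches("DIS/NC/10x100/minQ2=10", d)])
# ['q2_10to100', 'q2_100to1000']
```

The SIDIS request fails because `1.2.0` never appears after `pythia6-eic` in the DID; no other token is allowed to stand in for it.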

### Q² semantics

The Q² token is the one axis compared by numeric value rather than as a string:

- An explicit request range (`q2_1to10`) matches only the identical Rucio range.
- A Q² floor request (`minQ2=N`) matches every Rucio range lying entirely at or
above the floor. `minQ2=10` resolves to `q2_10to100` and `q2_100to1000`, never
to `q2_1to10` (which would include events below the floor). `minQ2=1` resolves
to all three ranges.
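Stated as conditions on the numeric bounds, with a Rucio range token read as lower and upper bounds $(a, b)$ and an explicit request range as $(a', b')$ (this bound notation is introduced here for illustration):

$$
\text{floor } N \text{ matches } (a, b) \iff a \ge N,
\qquad
\text{range } (a', b') \text{ matches } (a, b) \iff a' = a \,\wedge\, b' = b.
$$

Worked through for `minQ2=10` against `q2_1to10`, `q2_10to100`, and `q2_100to1000`: the lower bounds are $1$, $10$, and $100$; $1 \ge 10$ fails while $10 \ge 10$ and $100 \ge 10$ hold, so the request fans out to the upper two ranges. For `minQ2=1`, all three lower bounds satisfy $a \ge 1$. The upper bound never enters the floor condition.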

### Version policy

A requested generator version that is absent from Rucio is left unmatched and
surfaced as a gap. It is never substituted with a different version.

### Separate from produced-output matching

Input matching is implemented independently of produced-output matching
([EPICPROD_DATA_LINEAGE.md](EPICPROD_DATA_LINEAGE.md),
`_filter_match`/`_q2_overlap`). Output matching deliberately tolerates the gap
between an abstract request and a specific output, and treats Q² ranges as
matching when they overlap; input matching requires exact axes and an
exact-or-floor Q² match. The two policies share no code, so a change to one
cannot alter the other.

## Reading the result

Every assimilation yields three populations, and an operator should be able to
see all three:

- **Matched**: a request resolved to one or more Rucio datasets, recorded on
`Dataset.metadata['rucio']`. These are the runnable inputs.
- **Unmatched request**: a catalog request with no Rucio dataset. Either the
requested sample is not yet produced or registered, or it differs from what is
registered (for example, a different version or charge). This is expected
during commissioning; it is the completeness signal, not an error.
- **Unmatched Rucio**: a registered EVGEN dataset that no request claims.
Either it was produced outside the catalogue, or the catalogue spells the
request differently.

Both unmatched populations are discoverability targets for the catalog UI: an
operator reconciles them by adding or correcting a request, or by registering
the missing data.
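One way to compute the three populations, sketched in Python under stated assumptions: `classify` is a hypothetical helper, the matcher is passed in as a predicate, and the example substitutes a lookup of the one matching pair from the namespace table for the real matcher so that the block stays self-contained.

```python
from collections.abc import Callable, Iterable


def classify(
    requests: Iterable[str],
    dids: Iterable[str],
    matches: Callable[[str, str], bool],
) -> tuple[dict[str, list[str]], list[str], list[str]]:
    """Split one assimilation into matched, unmatched-request, and unmatched-Rucio."""
    did_list = list(dids)
    matched: dict[str, list[str]] = {}
    unmatched_requests: list[str] = []
    claimed: set[str] = set()
    for req in requests:
        hits = [did for did in did_list if matches(req, did)]  # may fan out
        if hits:
            matched[req] = hits
            claimed.update(hits)
        else:
            unmatched_requests.append(req)
    # A DID that no request claimed was produced outside the catalogue or is
    # spelled differently there.
    unmatched_rucio = [did for did in did_list if did not in claimed]
    return matched, unmatched_requests, unmatched_rucio


# Stand-in predicate for this example only (hypothetical): the single pair
# from the namespace table that matches under the input-matching rules.
KNOWN = {("DIS/NC/10x100/minQ2=1",
          "DIS/pythia8.316-1.0/NC/noRad/ep/10x100/q2_1to10")}

requests = [
    "DIS/NC/10x100/minQ2=1",
    "SIDIS/pythia6-eic/1.2.0/ep_noradcor/18x275/q2_1to10",
    "DIS/BeAGLE1.03.02-1.0/eH2/10x130",
]
dids = [
    "DIS/pythia8.316-1.0/NC/noRad/ep/10x100/q2_1to10",
    "SIDIS/pythia6-eic/1.1.0/en_noradcor/18x275/q2_1to10",
    "EXCLUSIVE/DEMP/DEMPgen-1.2.3/10x130/q2_10_20/pi+",
    "DIS/BeAGLE1.03.02-1.0/eAu/5x41/q2_1to10",
]
matched, unmatched_req, unmatched_rucio = classify(
    requests, dids, lambda r, d: (r, d) in KNOWN)
print(len(matched), len(unmatched_req), len(unmatched_rucio))  # 1 2 3
```

Because one DID may be claimed by several requests (for example, two different Q² floors), "unmatched Rucio" is computed from the union of all claims rather than per request.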

## Current state

Implemented: the assimilation sweep, the input matcher, and the catalog
"Update EVGEN from Rucio" button (the production operations agent runs the
sweep in apply mode and the page refreshes on completion). On the assimilated
inventory, the matcher resolves the DIS NC pythia8 samples (with the Q²
fan-out described above) and one beam-gas background sample; SIDIS and other
classes fall to unmatched where the registered version, charge, or class
differs from the request, as designed.

Not yet implemented: catalog UI surfacing of the matched and unmatched
populations, and consuming a matched EVGEN dataset as a payload-staged
submission input.

## Related

- [JEDI_INTEGRATION.md](JEDI_INTEGRATION.md): submission design; the single-Rucio constraint and the payload-staged input mode.
- [EPICPROD_DATA_LINEAGE.md](EPICPROD_DATA_LINEAGE.md): the produced-output sibling, gathering RECO/FULL Rucio references onto the catalog.
- [EPICPROD_TASK_CATALOG.md](EPICPROD_TASK_CATALOG.md): the production task catalog and its filters.
- [PCS.md](PCS.md): the configuration and dataset-identity model.