# ePIC Production Request Questionnaire

Physics working groups and detector groups (PWG/DSC) request production datasets
through a Google Form. The Physics Configuration System (PCS) mirrors each response into a
Questionnaire record, giving the collaboration a browsable view of all requests
and giving downstream production records a single upstream entry to reference for
request provenance. This document describes the entity, its ingestion from the
form, its relationship to the existing request and task records, and its access
model.

This document should be read as a proposal, not an established design.

The request and task records and the shared intake layer are described in
[PCS_DATASET_REQUEST_WORKFLOW.md](PCS_DATASET_REQUEST_WORKFLOW.md). The external
read/write contract referenced below is in [EXTERNAL_ACCESS.md](EXTERNAL_ACCESS.md).

## Not a Google Form replacement (at least for now)

Requesters continue to use the Google Form. PCS ingests the form's responses; it
does not replace the form or require any new submission tool. Responsibility for
the integrity of a submission stays with the form: a Questionnaire record reflects
what the form holds and asserts no independent authority over the submitted content.

## The Questionnaire entity

A Questionnaire record is a read-only mirror of one form response. The mirrored
fields are not edited in PCS. Two fields are added alongside the mirror: a `data`
JSON metadata field, following the convention used by the other PCS models, which
holds additional material such as annotation notes on the request, and a `status`
string field for ops use.

The Questionnaire is distinct from the request record (`ProdRequest`). The
Questionnaire is the request as submitted. `ProdRequest` is the triaged production
record composed from PCS physics, generator, simulation, and reconstruction tags
and a production configuration. The Questionnaire is not an extension of
`ProdRequest`; the two are separate entities with separate authorities and
audiences.

## Mirrored fields

The form has seven response columns. Each maps to a Questionnaire field:

| Field | Source question |
|---|---|
| `submitted_at` | Timestamp |
| `description` | Name the dataset, generator, and purpose |
| `repository` | Location of the repository with version control enforced per the input-processing guidelines (e.g. a tagged generator or steering-file repository) |
| `contact` | Who is the contact person for the dataset generation |
| `nevents` | How many events are requested |
| `benchmark` | Time to simulate the first 100 events and disk space the output file occupies |
| `estimate` | Total compute or storage required for the dataset, with justification for requests above 1% of the campaign budget (~120 core-years and ~35 TB per month) |

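The table above can be sketched as a record type. This is a minimal illustration, not the PCS model: the class name `QuestionnaireRecord`, the helper `record_from_row`, and the assumption that the sheet's columns arrive in the table's order are all hypothetical. Every mirrored value is kept as text because nothing is validated at ingestion.

```python
from dataclasses import dataclass, field
from typing import Any

# Assumed column order, copied from the table above; the real sheet may differ.
FIELD_ORDER = (
    "submitted_at", "description", "repository", "contact",
    "nevents", "benchmark", "estimate",
)


@dataclass
class QuestionnaireRecord:
    # Mirrored fields: stored as submitted and never edited in PCS.
    submitted_at: str
    description: str
    repository: str
    contact: str
    nevents: str
    benchmark: str
    estimate: str
    # PCS-side additions: free-form metadata and an ops status string.
    data: dict[str, Any] = field(default_factory=dict)
    status: str = ""


def record_from_row(row: list[str]) -> QuestionnaireRecord:
    """Map one seven-column response row onto the mirrored fields."""
    if len(row) != len(FIELD_ORDER):
        raise ValueError(f"expected {len(FIELD_ORDER)} columns, got {len(row)}")
    cells = (cell.strip() for cell in row)
    return QuestionnaireRecord(**dict(zip(FIELD_ORDER, cells)))
```
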
The `description`, `repository`, and `contact` values are free text from the
submitter. `nevents`, `benchmark`, and `estimate` are submitter-supplied and are
not validated against production at ingestion.

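To make the relation between `nevents`, `benchmark`, and `estimate` concrete, the arithmetic a submitter would typically do is sketched here. It assumes cost scales linearly with event count and uses decimal storage units; the function name `scale_benchmark` and the example numbers are illustrative only and not part of PCS.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def scale_benchmark(
    nevents: int, seconds_per_100: float, mb_per_100: float
) -> tuple[float, float]:
    """Return (core_years, terabytes), assuming linear scaling from a 100-event benchmark."""
    batches = nevents / 100
    core_years = batches * seconds_per_100 / SECONDS_PER_YEAR
    terabytes = batches * mb_per_100 / 1_000_000  # decimal units: 1 TB = 10^6 MB
    return core_years, terabytes


# Hypothetical request: 10 million events, 600 s and 50 MB per 100 events.
core_years, terabytes = scale_benchmark(10_000_000, 600.0, 50.0)
```

With these made-up inputs the result is roughly 1.9 core-years and 5 TB. A requester would compare such totals against the campaign-budget threshold named in the form.
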
## Ingestion

The source is a CSV export of the form's responses sheet. The export URL is
provided to the fetcher, either as configuration or through the logged-in ingest
control, and for now must be an accessible (link-readable) export; authenticated
retrieval is deferred. A scheduled cron job fetches the export, compares it against the
records already ingested, and creates a Questionnaire record for each new
response. A logged-in operator can also trigger the fetch on demand rather than
wait for the next scheduled run.

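The compare step can be sketched with the standard `csv` module. This assumes the export has a header row with a `Timestamp` column, as in the field table; the function names and the in-memory set of known timestamps are hypothetical stand-ins for the fetcher and the database lookup.

```python
import csv
import io


def parse_export(text: str) -> list[dict[str, str]]:
    """Parse CSV export text into one dict per response, keyed by column header."""
    # The csv module handles quoted cells containing commas or line breaks,
    # which free-text answers are likely to include.
    return list(csv.DictReader(io.StringIO(text, newline="")))


def new_responses(
    rows: list[dict[str, str]], known_timestamps: set[str]
) -> list[dict[str, str]]:
    """Keep only the responses whose timestamp has not been ingested yet."""
    return [row for row in rows if row["Timestamp"] not in known_timestamps]
```
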
Ingestion is idempotent on the submission timestamp: re-fetching a response that
is already ingested creates no duplicate. A content hash of the
response detects an in-form edit, so a corrected response re-syncs the existing
record in place. Records become active on ingestion; there is no separate review
state, because submission integrity remains with the form.

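A minimal sketch of that upsert rule follows. The name `questionnaire_intake` comes from this document, but the signature, the dict-backed `store`, the return values, and the choice of SHA-256 over a canonical JSON dump are assumptions made for illustration.

```python
import hashlib
import json


def content_hash(response: dict[str, str]) -> str:
    """Stable SHA-256 over the response fields, independent of key order."""
    canonical = json.dumps(response, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def questionnaire_intake(store: dict[str, dict], response: dict[str, str]) -> str:
    """Upsert one response into `store`, keyed by submission timestamp.

    Returns "created", "updated", or "unchanged".
    """
    key = response["submitted_at"]
    digest = content_hash(response)
    existing = store.get(key)
    if existing is None:
        store[key] = {"fields": dict(response), "hash": digest, "data": {}, "status": ""}
        return "created"
    if existing["hash"] == digest:
        return "unchanged"  # same timestamp, same content: nothing to do
    # In-form edit: re-sync the mirrored fields in place and keep the
    # PCS-side `data` and `status` untouched.
    existing["fields"] = dict(response)
    existing["hash"] = digest
    return "updated"
```
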
Ingestion is performed through a `questionnaire_intake` service function in
`pcs.services`, exposed as peer REST and MCP (Model Context Protocol) operations,
matching the existing PCS intake surface in
[PCS_DATASET_REQUEST_WORKFLOW.md](PCS_DATASET_REQUEST_WORKFLOW.md). The CSV fetcher
is one client of that service.

## Submission via PCS (deferred)

A native no-login submission path in PCS is deferred. The form already accepts
requests without a login, which is its purpose; reproducing that in PCS would mean
building protection for an unauthenticated write path, server-side validation, and
abuse control for a capability that already exists. Because ingestion goes through the
`questionnaire_intake` service, a native form added later would call the same
service and leave the ingestion model unchanged.

## Relationship to requests and tasks

`ProdRequest` gains a nullable foreign key to the Questionnaire. One response can
map to several requests, because a single submission frequently spans multiple
beam energies or Q² ranges; a response that has not been triaged maps to none.

A logged-in triage action links a response to a request
(`questionnaire_link_request`, exposed as peer REST and MCP operations).
Composing the request from PCS tags and configuration is a separate triage step on
top of the link.

Downstream provenance is by reference rather than by copy. `ProdTask` references
`ProdRequest`, which references the Questionnaire, so a task resolves to its
originating contact, benchmark, and estimate without duplicating those values. A
specific field is denormalized onto `ProdRequest` when it becomes a query filter
or dispatch key, not before.

The same by-reference boundary applies to questionnaire-to-production-task
matches. When an
operator establishes a match, PCS records the link and exposes navigation from
the task UI back to the request, and code can follow that link to retrieve the
request contact/email if needed for notifications or other workflows. The contact
data itself remains on the request/questionnaire record and is not copied onto
`ProdTask`, at least until matches are validated enough to treat them as
authoritative task metadata.

## Access and contact handling

The Questionnaire browser is readable by the collaboration without a login, on
both the internal face and the external face served through the swf-remote proxy.
On the internal face this uses the anonymous-allowed configuration already applied
to the open PCS API paths; on the external face the browser page requires a
swf-remote route entry, per the enumeration contract in
[EXTERNAL_ACCESS.md](EXTERNAL_ACCESS.md). This design adds no public write path.

Contact-person redaction is computed server-side, in REST serialization, keyed on
the resolved reader identity (the `X-Remote-User` forwarded by swf-remote on the
external face):

- An authenticated reader receives the full contact value.
- An unauthenticated reader receives initials.
- When the contact value is an email address, an unauthenticated reader receives
  only the local part before `@`; the domain is removed and a full email address
  is never returned to an unauthenticated reader.

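The three rules above can be sketched as a single function. The name `redact_contact`, the `J.D.` initials format, and the simple email pattern are assumptions; the document fixes only what each class of reader may see. Free-text values that mix a name with an email address are treated as names here, and tokens that do not start with a letter are skipped so that no part of an address leaks.

```python
import re

# Deliberately simple: one "@", no whitespace. Not a full address validator.
_EMAIL = re.compile(r"^[^@\s]+@[^@\s]+$")


def redact_contact(value: str, authenticated: bool) -> str:
    """Return the contact value appropriate for the reader."""
    value = value.strip()
    if authenticated:
        return value
    if _EMAIL.match(value):
        # Email address: keep only the local part before "@".
        return value.split("@", 1)[0]
    # Anything else is treated as a name: one initial per word that
    # starts with a letter.
    return "".join(part[0].upper() + "." for part in value.split() if part[0].isalpha())
```
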
Redaction is server-side rather than display-side so that neither the REST surface
nor the external proxy emits a full name or email address to an unauthenticated
caller. Response text originates from form submitters, is treated as untrusted
input, and is sanitized when rendered.

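The document does not specify the sanitization mechanism. One common choice, shown here purely as an assumption, is to HTML-escape submitter text at render time so that it is displayed literally; `render_text` is a hypothetical helper name.

```python
import html


def render_text(value: str) -> str:
    """Escape submitter text so it is shown literally and never parsed as markup."""
    return html.escape(value, quote=True)


# A script tag in a response is rendered as inert text.
safe = render_text('<script>alert("x")</script>')
```

Escaping at render time, rather than at ingestion, keeps the stored record an exact mirror of what the form holds.
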
## Implementation outline

1. Questionnaire model: the mirrored fields above, a `data` JSON field holding
   annotation notes and other ops material, a `status` string field for ops use,
   the submission timestamp and content hash for idempotent ingestion, and
   creation and update timestamps.
2. `questionnaire_intake` service function, idempotent on the timestamp and
   updating on a content-hash change, with peer REST and MCP operations.
3. CSV fetcher: takes the responses-sheet export URL as provided input (config or
   the operator ingest control); runs on a cron schedule and from a logged-in
   operator trigger. For now the provided URL must be an accessible (link-readable)
   export; authenticated retrieval is deferred.
4. Questionnaire browser and compose page in the two-pane PCS pattern, with
   server-side contact redaction and render-time sanitization, and the external
   route added to swf-remote.
5. `ProdRequest.questionnaire` foreign key and the `questionnaire_link_request`
   triage action.
6. Denormalization of selected fields onto `ProdRequest` as query needs arise.
7. Data-flow and entity diagrams once the implementation exists.

## Related

- [PCS_DATASET_REQUEST_WORKFLOW.md](PCS_DATASET_REQUEST_WORKFLOW.md): the request and task records and the shared intake surface.
- [PCS.md](PCS.md): the configuration and campaign record.
- [EXTERNAL_ACCESS.md](EXTERNAL_ACCESS.md): the external proxy and its read/write contract.
- [EPICPROD_TASK_CATALOG.md](EPICPROD_TASK_CATALOG.md), [EPICPROD_DATA_LINEAGE.md](EPICPROD_DATA_LINEAGE.md): the downstream catalog and produced-data references.