# JANA2 Concepts


## Core Architecture

![JANA diagram](_media/jana-flow.svg)


At its core, JANA2 views data processing as a chain of transformations,
where algorithms are applied to data to produce more refined data.
This process is organized into two main layers:

1. **Queue-Arrow Mechanism:** JANA2 utilizes the [arrow model](https://en.wikipedia.org/wiki/Arrow_\(computer_science\)),
   where data starts in a queue. An "arrow" pulls data from the queue, processes it with algorithms,
   and places the processed data into another queue. The simplest setup involves input and output queues
   with a single arrow handling all necessary algorithms. But JANA2 supports more complex configurations
   with multiple queues and arrows chained together, operating sequentially or in parallel as needed.

   ![Queue-Arrow mechanism](_media/arrows-queue.svg)

2. **Algorithm Management within Arrows:** Within each arrow, JANA2 organizes and manages algorithms along with their
   inputs and outputs, allowing flexibility in data processing. Arrows can be configured to distribute the processing
   load across various algorithms. By assigning threads to arrows, JANA2 leverages modern hardware to process data
   concurrently across multiple cores and processors, enhancing scalability and efficiency.

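The queue-arrow idea can be illustrated with a small standalone sketch. It does not use the JANA2 API: `ToyArrow` and `run_chain` are hypothetical names, the calibration factor and threshold are made up, and everything runs on a single thread for simplicity.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Toy stand-in for an arrow (hypothetical, not a JANA2 class): it pulls items
// from an input queue, transforms them, and pushes results to an output queue.
template <typename In, typename Out>
struct ToyArrow {
    std::function<Out(const In&)> transform;

    void drain(std::queue<In>& input, std::queue<Out>& output) const {
        while (!input.empty()) {
            output.push(transform(input.front()));
            input.pop();
        }
    }
};

// Chains two arrows: raw ADC values -> calibrated energies -> accept flags.
inline std::vector<bool> run_chain(const std::vector<int>& raw_adc) {
    std::queue<int> raw;
    for (int adc : raw_adc) raw.push(adc);

    std::queue<double> calibrated;
    std::queue<bool> accepted;

    // Made-up algorithms, for illustration only.
    ToyArrow<int, double> calibrate{[](const int& adc) { return adc * 0.5; }};
    ToyArrow<double, bool> select{[](const double& e) { return e > 10.0; }};

    calibrate.drain(raw, calibrated);
    assert(raw.empty());               // the first arrow consumed its whole input
    select.drain(calibrated, accepted);

    std::vector<bool> result;
    while (!accepted.empty()) {
        result.push_back(accepted.front());
        accepted.pop();
    }
    return result;
}
```

Calling `run_chain({10, 30, 21})` passes each value through both arrows in order. In a real multithreaded setup the queues would need to be thread-safe, which this sketch deliberately leaves out.
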
In organizing, managing, and building the codebase, JANA2 provides:

- **Algorithm Building Blocks:** Essential components such as Factories, Processors, Services, and others
  help write, organize, and manage algorithms. These modular units can be configured and combined to construct
  the desired data processing pipelines, promoting flexibility and scalability.

- **Plugin Mechanism:** Orthogonal to the above, JANA2 offers a plugin mechanism to enhance modularity and flexibility.
  Plugins are dynamic libraries with a specialized interface, enabling them to register components with the main application.
  This allows for dynamic runtime configuration, selecting or replacing algorithms and components without recompilation,
  and better code organization and reuse. Large applications are typically built from multiple plugins,
  each responsible for specific processing aspects. Alternatively, simpler and smaller applications
  can be built as monolithic applications without plugins.


## Building blocks

The data analysis application flow can be viewed as a chain of algorithms that transform input data into the
desired output. A simplified example of such a chain is shown in the diagram below:

![Simple Algorithms Flow](_media/algo_flow_01.svg)

In this example, for each event, raw ADC values of hits are processed:
first combined into clusters, then passed into track-finding and fitting algorithms,
with the resulting tracks as the chain's output. In real-world scenarios,
the actual graph is significantly more complex and requires additional components such as geometry,
magnetic field maps, calibrations, alignments, etc.
Additionally, some algorithms are responsible not only for processing objects in memory
but also for tasks such as reading data from disk or DAQ streams
and writing reconstructed data to a destination.
A more realistic and complex flow can be represented as follows:


![Simple Algorithms Flow](_media/algo_flow_02.svg)

Here is a very brief overview of the algorithm building blocks and of how this flow is organized in JANA2:

- **JFactory** - This is the primary component for implementing algorithms (depicted as orange boxes).
  JFactories compute specific results on an event-by-event basis.
  Their inputs may come from an EventSource or other JFactories.
  Algorithms in JFactories can be implemented using either Declarative or Imperative approaches
  (described later in the documentation).

- **JEventSource** - A special type of algorithm responsible for acquiring raw event data
  and exposing it to JANA for subsequent processing, for example by reading events from a file or
  listening to a DAQ messaging producer that provides raw event data.

- **JEventProcessor** - Positioned at the top of the calculation chain, JEventProcessor is designed
  to collect data from JFactories and handle end-point processing tasks, such as writing results to
  an output file or messaging consumer. However, JEventProcessor is not limited to I/O operations;
  it can also perform tasks like histogram plotting, data quality monitoring, and other forms of analysis.

  To clarify the distinction: JFactories form a lazy directed acyclic graph (DAG),
  where each factory defines a specific step in the data processing chain.
  In contrast, the JEventProcessor algorithm is executed for each event.
  When the JEventProcessor collects data, it triggers the lazy evaluation of the required factories,
  initiating the corresponding steps in the data processing chain.

- **JService** - Used to store resources that remain constant across events, such as geometry descriptions,
  magnetic field maps, and other shared data. Services are accessible by both algorithms and other services.


We can now redraw the above diagram in terms of JANA2 building blocks:

![Simple Algorithms Flow](_media/algo_flow_03.svg)


## Data model

JANA2 allows users to define and select their own event models,
providing the flexibility to tailor data structures to specific experimental needs. Taking the above
diagram as an example, classes such as `RawHits`, `HitClusters`, ... `Tracks` might simply be user-defined classes.
The data structures can be as simple as:

```cpp
struct GenericHit {
    double x, y, z, edep;
};
```

A key feature of JANA2 is that it doesn't require data being passed around
to inherit from any specific base class, such as JObject (used in JANA1) or ROOT's TObject.
While your data classes can inherit from other classes if your data model requires it,
JANA2 remains agnostic about this.

While JANA2 offers extended support for PODIO (Plain Old Data Input/Output) to facilitate standardized data handling,
it does not mandate the use of PODIO or even ROOT. This ensures that users can choose the most suitable data management
tools for their projects without being constrained by the framework.

### Data Identification in JANA2

![Simple Algorithms Flow](_media/data-identification.svg)

An important aspect is how data is identified within JANA2. JANA2 supports two identifiers:

1. **Data Type**: The C++ type of the data, e.g., `GenericHit` from the above example.
2. **Tags**: A string identifier in addition to type.

The concept of tags is useful in several scenarios. For instance:
- When multiple factories can produce the same type of data, e.g. by using different underlying algorithms.
  By specifying the tag name, you can select which algorithm's output you want.
- To reuse the same type. For example, you might have `GenericHit` data with tags
  `"VertexTracker"` and `"BarrelTracker"` to distinguish between hits from different detectors, or
  type `Particle` with tags `"TrueMcParticles"` and `"ReconstructedParticles"`.

Depending on your data model and the types of factories used (described below),
you can choose different strategies for data identification:

- **Type-Based Identification**: Fully identify data only by its type name, keeping the tag empty most of the time.
  Use tags only to identify alternative algorithms. This approach is used by GlueX.
- **Tag-Based Identification**: Use tags as the main data identifier and deduce types automatically whenever possible.
  This approach is used in the PODIO data model and EIC reconstruction software.

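To make the (type, tag) identification more concrete, here is a minimal standalone sketch of a store keyed by both identifiers. It is not how JANA2 implements data lookup; `ToyEvent`, `Insert`, and `Get` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>
#include <vector>

struct GenericHit {
    double x, y, z, edep;
};

// Toy event store (hypothetical): every collection is identified by the
// pair (C++ type, tag), so one type can be stored under several tags.
class ToyEvent {
public:
    template <typename T>
    void Insert(std::vector<T> data, const std::string& tag = "") {
        m_data[Key(std::type_index(typeid(T)), tag)] =
            std::make_shared<std::vector<T>>(std::move(data));
    }

    template <typename T>
    const std::vector<T>& Get(const std::string& tag = "") const {
        auto it = m_data.find(Key(std::type_index(typeid(T)), tag));
        assert(it != m_data.end() && "no data with this type and tag");
        // The cast is safe because the key already encodes the element type.
        return *std::static_pointer_cast<std::vector<T>>(it->second);
    }

private:
    using Key = std::pair<std::type_index, std::string>;
    std::map<Key, std::shared_ptr<void>> m_data;
};
```

With this sketch, `event.Get<GenericHit>("VertexTracker")` and `event.Get<GenericHit>("BarrelTracker")` return different collections of the same type, while `event.Get<GenericHit>()` corresponds to the type-based strategy with an empty tag.
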
## JApplication

The [JApplication](https://jeffersonlab.github.io/JANA2/refcpp/class_j_application.html)
class is the central hub of the JANA2 framework, orchestrating all aspects of a JANA2-based
application. It manages the initialization, configuration, and execution of the data processing workflow,
serving as the entry point for interacting with the core components of the system.
By providing access to key managers, services, and runtime controls,
JApplication ensures that the application operates smoothly from setup to shutdown.
To illustrate this, here is the code of a typical standalone JANA2 application:

```cpp
int main(int argc, char* argv[]) {

    auto params = new JParameterManager();
    // ... usually argv is processed here and its values are added to JParameterManager

    // Instantiate the JApplication with the parameter manager
    JApplication app(params);

    // Add predefined plugins
    app.AddPlugin("my_plugin");

    // Register services:
    app.ProvideService(std::make_shared<LogService>());
    app.ProvideService(std::make_shared<GeometryService>());

    // Register components
    app.Add(new JFactoryGeneratorT<MyFactoryA>);
    app.Add(new JFactoryGeneratorT<MyFactoryB>);
    app.Add(new JEventSourceGeneratorT<MyEventSource>);
    app.Add(new MyEventProcessor());

    // Initialize and run the application
    app.Initialize();
    app.Run();

    // Print the final performance report
    app.PrintFinalReport();

    // Retrieve and return the exit code
    return app.GetExitCode();
}
```

## Factories

We start with how algorithms are implemented in JANA2, what data flows between them,
and how they may be wired together.

JANA implements a **factory model**, where data objects are the products, and the algorithms that generate them are the
factories. While there are various types of factories in JANA2 (covered later in this documentation),
they all follow the same fundamental concept:

![JANA2 Factory diagram](_media/concepts-factory-diagram.png)

This diagram illustrates the analogy to industry. When a specific data object is requested for the current event in JANA,
the framework identifies the corresponding algorithm (factory) capable of producing it.
The framework then checks if the factory has already produced this data for the current event
(i.e., if the product is "in stock").

- If the data **is already available**, it is retrieved and returned to the user.
- **If not**, the factory is invoked to produce the required data, and the newly generated data is returned to the user.

To create the requested data, factories may need lower-level objects,
triggering requests to the corresponding factories. This continues until all required factories have been
invoked and the entire chain of dependent objects has been produced.

In other words, JANA2 factories form a lazily evaluated directed acyclic graph
\([DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph)\) of data creation,
where all the produced data is cached until the entire event is finished processing.
Thus a factory produces its objects only once for a given event, which is efficient when the
same data is required by multiple algorithms.


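The "in stock" check and the lazy chain of requests can be sketched without any JANA2 code. In the toy below, `ToyEventChain` and its members are hypothetical, and the "clustering" and "tracking" are stand-in arithmetic. The call counters exist only to show that each step runs at most once per event.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of lazily evaluated, per-event cached factories (hypothetical).
struct ToyEventChain {
    std::vector<int> raw_hits;   // would come from the event source
    int cluster_calls = 0;       // how often the "cluster factory" actually ran
    int track_calls = 0;         // how often the "track factory" actually ran

    const std::vector<int>& Clusters() {
        if (!m_clusters_ready) {             // not "in stock": produce it now
            ++cluster_calls;
            for (std::size_t i = 0; i + 1 < raw_hits.size(); i += 2)
                m_clusters.push_back(raw_hits[i] + raw_hits[i + 1]);
            m_clusters_ready = true;
        }
        return m_clusters;                   // cached for the rest of the event
    }

    const std::vector<int>& Tracks() {
        if (!m_tracks_ready) {
            ++track_calls;
            int sum = 0;
            for (int c : Clusters()) sum += c;   // triggers the lower-level factory
            assert(m_clusters_ready);
            m_tracks.push_back(sum);
            m_tracks_ready = true;
        }
        return m_tracks;
    }

    // Called when the event is finished: drop all cached products.
    void Clear() {
        m_clusters.clear();
        m_tracks.clear();
        m_clusters_ready = false;
        m_tracks_ready = false;
    }

private:
    std::vector<int> m_clusters;
    std::vector<int> m_tracks;
    bool m_clusters_ready = false;
    bool m_tracks_ready = false;
};
```

Requesting `Tracks()` first pulls in `Clusters()` on demand. Any later request for either product within the same event is served from the cache until `Clear()` is called.
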
### Multithreading and factories

In the context of factories, it is worth briefly mentioning how they work
with multithreading (more details follow later in the documentation).

In JANA2, each thread has its own complete and independent set of factories capable of
fully reconstructing an event within that thread. This minimizes the use of locks, which would be required
to coordinate between threads and would degrade performance. Factory sets are maintained in a pool and
are (optionally) assigned affinity to a specific NUMA group.

![JANA2 Factory diagram](_media/threading-schema.png)

With some simplification, this diagram shows how sets of factories are created for each thread in the
worker pool. Because of I/O limitations, events usually must be read in from the source sequentially (orange)
and similarly written sequentially to the output (violet).

### Imperative vs Declarative factories

What does the simplest factory look like in code? Probably the simplest is `JFactoryT<T>`:

```cpp
// MyCluster is what this factory outputs
class ExampleFactory : public JFactoryT<MyCluster> {
public:
    void Init() override { /* ... initialize what is needed */ }

    void Process(const std::shared_ptr<const JEvent>& event) override
    {
        auto hits = event->Get<MyHit>();   // Request data of type MyHit from JANA
        std::vector<MyCluster*> clusters;
        for (auto hit : hits) {
            // ... produce clusters from hits
        }
        Set(clusters);                     // Set the output data
    }
};
```

The above code gives a glimpse into how such an algorithm or factory might look.
In later sections, we will explore the methods, their details, and other components that can be utilized.

What’s important to note in this example is that `JFactoryT<T>` follows the ***Imperative Approach***.
In this approach, the factory is provided with the `JEvent` interface, which it uses to dynamically request
the data required by the algorithm as needed.

JANA2 supports two distinct approaches for defining algorithms:

- **Imperative Approach**: The algorithm determines dynamically what data it needs and requests
  it through the JEvent interface.

- **Declarative Approach**: The algorithm explicitly declares its required inputs and outputs upfront
  in the class definition.

For instance, the declarative approach can be implemented using `JOmniFactory<T>`.
Here's how the same factory might look when following the declarative approach:

```cpp
class ExampleFactory : public JOmniFactory<ExampleFactory> {
public:

    Input<MyHit> hits {this};              // Declare inputs
    Output<MyCluster> clusters {this};     // Declare what the factory produces

    void Configure() { /* ... same as Init() in JFactory */ }

    void Execute(int32_t run_number, uint64_t event_number)
    {
        // All inputs are guaranteed to be ready when Execute is called.
        std::vector<MyCluster*> result;
        for (auto hit : hits()) {
            // ... produce clusters from hits
        }
        clusters() = std::move(result);    // Set the output data
    }
};
```

Declarative factories excel in terms of code management and clarity.
The declarative approach makes it immediately clear what an algorithm's inputs are and what it produces.
While this advantage may not be obvious in the above simple example, it becomes particularly evident when dealing
with complex algorithms that have numerous inputs, outputs, and configuration parameters.
For instance, consider a generic clustering algorithm that could later be adapted for various calorimeter detectors.

In general, it is recommended to follow the declarative approach unless the dynamic flexibility
of imperative factories is explicitly required.

A good example of a scenario where the imperative approach is preferred is a software Level-3 (L3) trigger.
The imperative approach allows for highly efficient implementations of L3 (i.e., high-level) triggers.
A decision-making algorithm could be designed to request low-level objects first
to quickly determine whether to accept or reject an event. If the decision cannot be made using the low-level objects,
the algorithm can request higher-level objects for further evaluation.
This ability to dynamically activate factories on an event-by-event basis optimizes the L3 system’s throughput,
reducing the computational resources required to implement it.

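The L3 trigger logic described above can be sketched in plain C++. This is only an illustration of the control flow, not JANA2 code: `ToyTriggerEvent`, `TrackCount`, `l3_accept`, and all thresholds are hypothetical.

```cpp
#include <cassert>

// Toy event (hypothetical): one cheap low-level quantity and one expensive
// higher-level product that is computed only when somebody asks for it.
struct ToyTriggerEvent {
    double total_energy;      // cheap, low-level quantity
    int n_hits;               // input to the expensive step
    int tracking_calls;       // counts invocations of the expensive "factory"

    ToyTriggerEvent(double energy, int hits)
        : total_energy(energy), n_hits(hits), tracking_calls(0) {
        assert(hits >= 0);
    }

    int TrackCount() {        // stands in for an expensive tracking factory
        ++tracking_calls;
        return n_hits / 3;
    }
};

// Decide using the cheap quantity first; request tracks only when undecided.
// The thresholds are made up for illustration.
inline bool l3_accept(ToyTriggerEvent& ev) {
    if (ev.total_energy > 100.0) return true;   // clearly interesting
    if (ev.total_energy < 1.0) return false;    // clearly empty
    return ev.TrackCount() >= 2;                // undecided: run tracking
}
```

For events that can be decided from `total_energy` alone, `tracking_calls` stays at zero. That on-demand behavior is what a fixed, upfront declaration of inputs cannot express.
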
### Factory types

Main factory types in JANA2 are:

- `JFactory` - imperative factory with a single output type
- `JMultifactory` - imperative factory that can produce several types at once
- `JOmniFactory` - declarative factory with multiple outputs

<table>
<tr>
<th></th>
<th>Declarative</th>
<th colspan="2">Imperative</th>
</tr>
<tr>
<th></th>
<th>JOmniFactory</th>
<th>JFactory</th>
<th>JMultifactory</th>
</tr>

<tr>
<td>Inputs</td>
<td>Fixed number of input types</td>
<td colspan="2">Any number of input types</td>
</tr>

<tr>
<td>Input requests</td>
<td>Declared upfront in class definition</td>
<td colspan="2">Requested dynamically through JEvent interface</td>
</tr>

<tr>
<td>Outputs</td>
<td>Multiple types/outputs</td>
<td>Single type</td>
<td>Multiple types</td>
</tr>

<tr>
<td>Outputs declaration</td>
<td>Declared upfront in class definition</td>
<td>Declared in class definition</td>
<td>Must be declared in constructor</td>
</tr>

</table>


### Declarative Factories

```cpp
/// A factory should inherit from JOmniFactory<T>,
/// where T is the factory class itself (CRTP)
struct HitRecoFactory : public JOmniFactory<HitRecoFactory> {

   /// Outputs are the data that the factory produces
   Output<HitCluster> m_clusters{this};

   /// Inputs are the data that the factory uses to produce its result
   Input<McHit> m_mcHits{this};

   /// Additional service needed to produce the data
   Service<CalibrationService> m_calibration{this};

   /// Parameters are values that can be changed from the command line
   Parameter<bool> m_cfg_use_true_pos{this, "hits:use_true_pos", true, "Flag description"};

   /// Configure is called once to configure the algorithm
   void Configure() {  /* ... */ }

   /// Called when the run number changes
   void ChangeRun(int32_t run_number) { /* ... get calibrations for run ... */ }

   /// Called for each event
   void Execute(int32_t /*run_nr*/, uint64_t event_index)
   {
      auto result = std::vector<HitCluster*>();
      for (auto hit : m_mcHits()) {   // get input data from event source or other factories
         // ... produce clusters from hits
      }

      // Set the output data
      m_clusters() = std::move(result);
   }
};
```

### Factory generators

Since every worker thread creates its own set of factories, one has to provide, besides the factory code itself,
a way to create a factory, i.e. a factory generator class. Fortunately, JANA2 provides generic templated
generator classes (such as `JFactoryGeneratorT`) that work for the majority of cases:

```cpp
// For JFactories

// For JOmniFactories
```



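Why a generator is needed at all can be shown with a toy version of the pattern. The sketch below is not the JANA2 implementation: `ToyFactory`, `ToyGenerator`, `ToyGeneratorT`, `CountingFactory`, and `make_factory_set` are hypothetical names. The point is that the application registers one generator per factory type and then asks each generator for a fresh instance whenever a new factory set is built.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Minimal factory interface (hypothetical).
struct ToyFactory {
    virtual ~ToyFactory() = default;
    virtual int Produce() = 0;
};

// A generator knows how to create a new factory instance on request.
struct ToyGenerator {
    virtual ~ToyGenerator() = default;
    virtual std::unique_ptr<ToyFactory> Create() const = 0;
};

// Generic generator that works for any default-constructible factory type.
template <typename FactoryT>
struct ToyGeneratorT : ToyGenerator {
    std::unique_ptr<ToyFactory> Create() const override {
        return std::unique_ptr<ToyFactory>(new FactoryT());
    }
};

// Example factory with per-instance state.
struct CountingFactory : ToyFactory {
    int calls = 0;
    int Produce() override { return ++calls; }
};

// Builds one independent factory set from the registered generators.
inline std::vector<std::unique_ptr<ToyFactory>>
make_factory_set(const std::vector<std::unique_ptr<ToyGenerator>>& generators) {
    std::vector<std::unique_ptr<ToyFactory>> set;
    for (const auto& gen : generators) {
        set.push_back(gen->Create());
        assert(set.back() != nullptr);
    }
    return set;
}
```

Two sets built from the same generators share no state, so each worker can use its own set without synchronizing with the others.
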
## Plugins

In JANA2, plugins are dynamic libraries that extend the functionality of the main application by registering
additional components such as event sources, factories, event processors, and services.
Plugins are a powerful mechanism that allows developers to modularize their code, promote code reuse,
and configure applications dynamically at runtime without the need for recompilation.

For a library to be recognized as a plugin, it must implement a specific initialization function called
`InitPlugin()` with C linkage. The function is called by JANA when plugins are loaded and should be used
for registering the plugin's components with the JApplication instance.

```cpp
extern "C" {
    void InitPlugin(JApplication* app) {
        InitJANAPlugin(app);
        // Register components:
        app->Add(/** ... */);    // add components from this plugin
        app->Add(/** ... */);
        // ...
    }
}
```

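The role of the C-linkage entry point can be illustrated with a toy that stays inside a single translation unit. Everything here is hypothetical (`ToyApplication`, `InitToyPlugin`, `load_plugin`). A real loader would first obtain the function pointer from a shared library, for example with POSIX `dlopen` and `dlsym`, a step this sketch skips.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy application that only records the names of registered components.
struct ToyApplication {
    std::vector<std::string> components;
    void Add(const std::string& name) { components.push_back(name); }
};

// C linkage keeps the symbol name unmangled, so a loader can find the
// entry point by a fixed, well-known name.
extern "C" void InitToyPlugin(ToyApplication* app) {
    assert(app != nullptr);
    app->Add("MyEventSource");
    app->Add("MyFactoryA");
    app->Add("MyEventProcessor");
}

// Signature shared by all toy plugin entry points.
using PluginInitFn = void (*)(ToyApplication*);

// Calls the entry point and reports how many components are now registered.
inline std::size_t load_plugin(ToyApplication& app, PluginInitFn init) {
    init(&app);
    return app.components.size();
}
```

Because the application only depends on the entry point's name and signature, plugins can be added or swapped without recompiling the application itself.
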
### How Plugins Are Found and Loaded

When a JANA2 application starts, it searches for plugins in specific directories.
The framework maintains a list of plugin search paths where it looks for plugin libraries.
By default, this includes directories such as:

- The current working directory.
- Directories specified by the `JANA_PLUGIN_PATH` environment variable.
- Directories added programmatically via the `AddPluginPath()` method of `JApplication`.

Plugins are loaded in two main ways:

- **Automatic Loading**: The application can be configured to load plugins specified by
  command-line arguments or configuration parameters via the `-Pplugins` flag.

  ```bash
  ./my_jana_application -Pplugins=MyPlugin1,AnotherPlugin
  ```

- **Programmatic Loading**: Plugins can be loaded explicitly in the application code
  by calling the `AddPlugin()` method of `JApplication`.

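As a rough illustration of how such a search list could be assembled, the sketch below combines the three sources named above. It is not JANA2's actual lookup code. The helper names, the colon as separator in `JANA_PLUGIN_PATH`, and the ordering of the entries are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Splits a colon-separated list of directories, skipping empty entries.
// (The colon separator is an assumption for this sketch.)
inline std::vector<std::string> split_paths(const std::string& list) {
    std::vector<std::string> paths;
    std::string::size_type start = 0;
    while (start <= list.size()) {
        std::string::size_type end = list.find(':', start);
        if (end == std::string::npos) end = list.size();
        if (end > start) paths.push_back(list.substr(start, end - start));
        start = end + 1;
    }
    return paths;
}

// Assembles a search list: current directory, then the environment variable,
// then directories added in code. The ordering is assumed, not documented.
inline std::vector<std::string>
plugin_search_paths(const std::vector<std::string>& added_in_code) {
    std::vector<std::string> paths{"."};
    if (const char* env = std::getenv("JANA_PLUGIN_PATH")) {
        for (const auto& p : split_paths(env)) paths.push_back(p);
    }
    paths.insert(paths.end(), added_in_code.begin(), added_in_code.end());
    assert(!paths.empty());
    return paths;
}
```

A loader would then try each directory in turn until it finds a library matching the requested plugin name.
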
### Plugins debugging

JANA2 provides a handy parameter, `jana:debug_plugin_loading=1`, which prints
detailed information about the plugin loading process.


## Object lifecycles

It is important to understand who owns each JObject and when it is destroyed.

By default, a JFactory owns all of the JObjects that it created during `Process()`. Once all event processors have
finished processing a `JEvent`, all `JFactories` associated with that `JEvent` will clear and delete their `JObjects`.
However, you can change this behavior by setting one of the factory flags:

* `PERSISTENT`: Objects are neither cleared nor deleted. This is usually used for calibrations and translation tables.
  Note that if an object is persistent, `JFactory::Process` will _not_ be re-run on the next `JEvent`. The user
  may still update the objects manually, via `JFactory::BeginRun`, and must delete the objects manually via
  `JFactory::EndRun` or `JFactory::Finish`.

* `NOT_OBJECT_OWNER`: Objects are cleared from the `JFactory` but _not_ deleted. This is useful for "proxy" factories
  (which reorganize objects that are owned by a different factory) and for `JEventGroups`. `JFactory::Process` _will_ be
  re-run for each `JEvent`. As long as the objects are owned by a different `JFactory`, the user doesn't have to do any
  cleanup.

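The effect of these flags on end-of-event cleanup can be mimicked with a toy model. The names below (`ToyFlag`, `ToyObject`, `ToyOwningFactory`, `ClearData`) are hypothetical and only mirror the behavior described in the text, not the JANA2 implementation.

```cpp
#include <cassert>
#include <vector>

enum class ToyFlag { DEFAULT, PERSISTENT, NOT_OBJECT_OWNER };

// Counts live instances so that deletions can be observed.
struct ToyObject {
    static int& live_count() { static int n = 0; return n; }
    ToyObject() { ++live_count(); }
    ~ToyObject() { --live_count(); }
};

struct ToyOwningFactory {
    ToyFlag flag;
    std::vector<ToyObject*> objects;

    explicit ToyOwningFactory(ToyFlag f) : flag(f) {}

    // Called once the event has been fully processed.
    void ClearData() {
        if (flag == ToyFlag::PERSISTENT) return;        // keep objects across events
        if (flag == ToyFlag::DEFAULT) {
            for (ToyObject* obj : objects) delete obj;  // the factory owns its objects
        }
        objects.clear();                                // NOT_OBJECT_OWNER: forget only
        assert(objects.empty());
    }
};
```

A proxy factory with `NOT_OBJECT_OWNER` can therefore hold pointers to objects of another factory and drop them safely, while a `PERSISTENT` factory keeps its objects until the user deletes them.
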
The lifetime of a `JFactory` spans the time that a `JEvent` is in-flight. No other guarantees are made: `JFactories` might
be re-used for multiple `JEvents` for the sake of efficiency, but the implementation is free to _not_ do so. In particular,
the user must never assume that one `JFactory` will see the entire `JEvent` stream.

The lifetime of a `JEventSource` spans the time that all of its emitted `JEvents` are in-flight.

The lifetime of a `JEventProcessor` spans the time that any `JEventSources` are active.

The lifetime of a `JService` not only spans the time that any `JEventProcessors` are active, but also the lifetime of
`JApplication` itself. Furthermore, because JServices use `shared_ptr`, they are allowed to live even longer than
`JApplication`, which is helpful for things like writing test cases.