# JANA2 Concepts


## Core Architecture

![JANA diagram](_media/jana-flow.svg)


At its core, JANA2 views data processing as a chain of transformations,
where algorithms are applied to data to produce more refined data.
This process is organized into two main layers:

1. **Queue-Arrow Mechanism:** JANA2 utilizes the [arrow model](https://en.wikipedia.org/wiki/Arrow_\(computer_science\)),
   where data starts in a queue. An "arrow" pulls data from the queue, processes it with algorithms,
   and places the processed data into another queue. The simplest setup involves input and output queues
   with a single arrow handling all necessary algorithms. But JANA2 supports more complex configurations
   with multiple queues and arrows chained together, operating sequentially or in parallel as needed.

   ![Queue-Arrow mechanism](_media/arrows-queue.svg)

2. **Algorithm Management within Arrows:** Within each arrow, JANA2 organizes and manages algorithms along with their
   inputs and outputs, allowing flexibility in data processing. Arrows can be configured to distribute the processing
   load across various algorithms. By assigning threads to arrows, JANA2 leverages modern hardware to process data
   concurrently across multiple cores and processors, enhancing scalability and efficiency.
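
The queue-arrow idea can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions: `ToyArrow` and `run_toy_chain` are hypothetical names invented for this illustration and are not JANA2 classes, and the real framework adds threading, back-pressure, and event pooling that are omitted here.

```cpp
#include <functional>
#include <queue>
#include <utility>

// Toy model only: hypothetical names, not JANA2 classes.
template <typename In, typename Out>
struct ToyArrow {
    std::queue<In>& input;                 // queue the arrow pulls from
    std::queue<Out>& output;               // queue the arrow pushes to
    std::function<Out(const In&)> algo;    // algorithm applied to each item

    // Process one item if available; returns false when the input queue is empty.
    bool fire() {
        if (input.empty()) return false;
        In item = std::move(input.front());
        input.pop();
        output.push(algo(item));
        return true;
    }
};

// Chain two arrows sequentially: raw values -> doubled -> incremented.
inline std::queue<int> run_toy_chain(std::queue<int> raw) {
    std::queue<int> mid, done;
    ToyArrow<int, int> first{raw, mid, [](const int& x) { return x * 2; }};
    ToyArrow<int, int> second{mid, done, [](const int& x) { return x + 1; }};
    while (first.fire()) {}
    while (second.fire()) {}
    return done;
}
```

In this sketch the two arrows run one after another on a single thread; running several instances of `fire()` concurrently would additionally require thread-safe queues.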

In organizing, managing, and building the codebase, JANA2 provides:

- **Algorithm Building Blocks:** Essential components such as Factories, Processors, Services, and others
  help write, organize, and manage algorithms. These modular units can be configured and combined to construct
  the desired data processing pipelines, promoting flexibility and scalability.

- **Plugin Mechanism:** Orthogonal to the above, JANA2 offers a plugin mechanism to enhance modularity and flexibility.
  Plugins are dynamic libraries with a specialized interface, enabling them to register components with the main application.
  This allows for dynamic runtime configuration, selecting or replacing algorithms and components without recompilation,
  and better code organization and reuse. Large applications are typically built from multiple plugins,
  each responsible for specific processing aspects. Alternatively, monolithic applications without plugins
  can be created for simpler, smaller applications.


## Building blocks

The data analysis application flow can be viewed as a chain of algorithms that transform input data into the
desired output. A simplified example of such a chain is shown in the diagram below:

![Simple Algorithms Flow](_media/algo_flow_01.svg)

In this example, for each event, raw ADC values of hits are processed:
first combined into clusters, then passed into track-finding and fitting algorithms,
with the resulting tracks as the chain's output. In real-world scenarios,
the actual graph is significantly more complex and requires additional components such as geometry,
magnetic field maps, calibrations, alignments, etc.
Additionally, some algorithms are responsible not only for processing objects in memory
but also for tasks such as reading data from disk or DAQ streams
and writing reconstructed data to a destination.
A more realistic and complex flow can be represented as follows:


![Simple Algorithms Flow](_media/algo_flow_02.svg)

Here is a very brief overview of the algorithm building blocks and how this flow is organized in JANA2:

- **JFactory** - This is the primary component for implementing algorithms (depicted as orange boxes).
  JFactories compute specific results on an event-by-event basis.
  Their inputs may come from a JEventSource or from other JFactories.
  Algorithms in JFactories can be implemented using either the Declarative or the Imperative approach
  (described later in the documentation).

- **JEventSource** - A special type of algorithm responsible for acquiring raw event data
  and exposing it to JANA for subsequent processing, for example by reading events from a file or by listening
  to a DAQ messaging producer that provides raw event data.

- **JEventProcessor** - Positioned at the top of the calculation chain, JEventProcessor is designed
  to collect data from JFactories and handle end-point processing tasks, such as writing results to
  an output file or messaging consumer. However, JEventProcessor is not limited to I/O operations;
  it can also perform tasks like histogram plotting, data quality monitoring, and other forms of analysis.

  To clarify the distinction: JFactories form a lazy directed acyclic graph (DAG),
  where each factory defines a specific step in the data processing chain.
  In contrast, the JEventProcessor algorithm is executed for each event.
  When the JEventProcessor collects data, it triggers the lazy evaluation of the required factories,
  initiating the corresponding steps in the data processing chain.

- **JService** - Used to store resources that remain constant across events, such as geometry descriptions,
  magnetic field maps, and other shared data. Services are accessible by both algorithms and other services.


We can now redraw the above diagram in terms of JANA2 building blocks:

![Simple Algorithms Flow](_media/algo_flow_03.svg)


## Data model

JANA2 allows users to define and select their own event models,
providing the flexibility to tailor data structures to specific experimental needs. Taking the above
diagram as an example, classes such as `RawHits`, `HitClusters`, ... `Tracks` might simply be user-defined classes.
The data structures can be as simple as:

```cpp
struct GenericHit {
    double x, y, z, edep;
};
```

A key feature of JANA2 is that it doesn't require the data being passed around
to inherit from any specific base class, such as JObject (used in JANA1) or ROOT's TObject.
While your data classes can inherit from other classes if your data model requires it,
JANA2 remains agnostic about this.

Although JANA2 offers extended support for PODIO (Plain Old Data Input/Output) to facilitate standardized data handling,
it does not mandate the use of PODIO or even ROOT. This ensures that users can choose the most suitable data management
tools for their projects without being constrained by the framework.

### Data Identification in JANA2

![Simple Algorithms Flow](_media/data-identification.svg)

An important aspect is how data is identified within JANA2. JANA2 supports two identifiers:

1. **Data Type**: The C++ type of the data, e.g., `GenericHit` from the above example.
2. **Tags**: A string identifier in addition to the type.

The concept of tags is useful in several scenarios. For instance:
- When multiple factories can produce the same type of data, e.g. by using different underlying algorithms.
  By specifying the tag name, you can select which algorithm's output you want.
- To reuse the same type. For example, you might have `GenericHit` data with tags
  `"VertexTracker"` and `"BarrelTracker"` to distinguish between hits from different detectors, or
  the type `Particle` with tags `"TrueMcParticles"` and `"ReconstructedParticles"`.

Depending on your data model and the types of factories used (described below),
you can choose different strategies for data identification:

- **Type-Based Identification**: Fully identify data only by its type name, keeping the tag empty most of the time.
  Use tags only to identify alternative algorithms. This approach is used by GlueX.
- **Tag-Based Identification**: Use tags as the main data identifier and deduce types automatically whenever possible.
  This approach is used in the PODIO data model and the EIC reconstruction software.
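
To make the (type, tag) identification concrete, here is a minimal standalone sketch of a container keyed by both the C++ type and a string tag. It is a toy illustration only: `ToyEventStore`, `insert`, and `get` are hypothetical names chosen for this example and do not correspond to the JANA2 API.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>
#include <vector>

struct GenericHit {
    double x, y, z, edep;
};

// Toy store only (hypothetical, not the JANA2 API):
// each collection is identified by the pair (C++ type, tag).
class ToyEventStore {
    using Key = std::pair<std::type_index, std::string>;
    std::map<Key, std::shared_ptr<void>> m_data;

public:
    template <typename T>
    void insert(std::vector<T> objects, const std::string& tag = "") {
        m_data[Key(std::type_index(typeid(T)), tag)] =
            std::make_shared<std::vector<T>>(std::move(objects));
    }

    template <typename T>
    const std::vector<T>& get(const std::string& tag = "") const {
        auto it = m_data.find(Key(std::type_index(typeid(T)), tag));
        if (it == m_data.end()) {
            throw std::out_of_range("no data for this type and tag");
        }
        // Safe cast: the key guarantees the stored vector has element type T.
        return *static_cast<const std::vector<T>*>(it->second.get());
    }
};
```

With such a store, the same `GenericHit` type can be kept under the tags `"VertexTracker"` and `"BarrelTracker"` without the two collections colliding, while a lookup with an empty tag corresponds to purely type-based identification.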

## JApplication

The [JApplication](https://jeffersonlab.github.io/JANA2/refcpp/class_j_application.html)
class is the central hub of the JANA2 framework, orchestrating all aspects of a JANA2-based
application. It manages the initialization, configuration, and execution of the data processing workflow,
serving as the entry point for interacting with the core components of the system.
By providing access to key managers, services, and runtime controls,
JApplication ensures that the application operates smoothly from setup to shutdown.
To illustrate this, here is the code of a typical standalone JANA2 application:

```cpp
int main(int argc, char* argv[]) {

    auto params = new JParameterManager();
    // ... usually some processing of argv here, adding the values to the JParameterManager

    // Instantiate the JApplication with the parameter manager
    JApplication app(params);

    // Add predefined plugins
    app.AddPlugin("my_plugin");

    // Register services
    app.ProvideService(std::make_shared<LogService>());
    app.ProvideService(std::make_shared<GeometryService>());

    // Register components
    app.Add(new JFactoryGeneratorT<MyFactoryA>);
    app.Add(new JFactoryGeneratorT<MyFactoryB>);
    app.Add(new JEventSourceGeneratorT<MyEventSource>);
    app.Add(new MyEventProcessor());

    // Initialize and run the application
    app.Initialize();
    app.Run();

    // Print the final performance report
    app.PrintFinalReport();

    // Retrieve and return the exit code
    return app.GetExitCode();
}
```

## Factories

We start with how algorithms are implemented in JANA2, what data
flows between them, and how they can be wired together.

JANA implements a **factory model**, where data objects are the products, and the algorithms that generate them are the
factories. While there are various types of factories in JANA2 (covered later in this documentation),
they all follow the same fundamental concept:

![JANA2 Factory diagram](_media/concepts-factory-diagram.png)

This diagram illustrates the analogy to industry. When a specific data object is requested for the current event in JANA,
the framework identifies the corresponding algorithm (factory) capable of producing it.
The framework then checks if the factory has already produced this data for the current event
(i.e., if the product is "in stock").

- If the data **is already available**, it is retrieved and returned to the user.
- **If not**, the factory is invoked to produce the required data, and the newly generated data is returned to the user.

To create the requested data, factories may need lower-level objects,
triggering requests to the corresponding factories. This continues until all required factories have been
invoked and the entire chain of dependent objects has been produced.

In other words, JANA2 factories form a lazily evaluated directed acyclic graph
\([DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph)\) of data creation,
where all the produced data is cached until the entire event has finished processing.
Thus each factory produces its objects only once for a given event, which is efficient when the
same data is required by multiple algorithms.
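
The "in stock" check and the lazy, cached evaluation can be sketched with a small standalone model. Everything here is an assumption made for illustration: `ToyEvent`, `add_factory`, `get`, and `make_toy_event` are hypothetical names, products are reduced to `std::vector<double>`, and the real JANA2 machinery is considerably richer.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model only (hypothetical names, not the JANA2 API).
class ToyEvent {
public:
    using Algo = std::function<std::vector<double>(ToyEvent&)>;

    void add_factory(const std::string& name, Algo algo) {
        m_factories[name] = std::move(algo);
    }

    // Return the cached product if it is "in stock";
    // otherwise run the factory once and cache its result.
    const std::vector<double>& get(const std::string& name) {
        auto cached = m_cache.find(name);
        if (cached != m_cache.end()) return cached->second;
        ++m_calls[name];
        // The factory may itself call get(), pulling in lower-level products.
        std::vector<double> product = m_factories.at(name)(*this);
        return m_cache.emplace(name, std::move(product)).first->second;
    }

    // How many times the named factory actually ran.
    int calls(const std::string& name) const {
        auto it = m_calls.find(name);
        return it == m_calls.end() ? 0 : it->second;
    }

private:
    std::map<std::string, Algo> m_factories;
    std::map<std::string, std::vector<double>> m_cache;
    std::map<std::string, int> m_calls;
};

// hits -> clusters -> tracks, where tracks also needs hits directly.
inline ToyEvent make_toy_event() {
    ToyEvent event;
    event.add_factory("hits", [](ToyEvent&) {
        return std::vector<double>{1.0, 2.0, 3.0};
    });
    event.add_factory("clusters", [](ToyEvent& e) {
        double sum = 0.0;
        for (double h : e.get("hits")) sum += h;
        return std::vector<double>{sum};
    });
    event.add_factory("tracks", [](ToyEvent& e) {
        const double cluster = e.get("clusters").front();
        const double n_hits = static_cast<double>(e.get("hits").size());
        return std::vector<double>{cluster + n_hits};
    });
    return event;
}
```

Requesting `"tracks"` triggers `"clusters"` and `"hits"` on demand, and `"hits"` runs only once even though two factories ask for it.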


### Multithreading and factories

In the context of factories, it is important to at least briefly mention how they work
in terms of multithreading (much more detail follows later).

In JANA2, each thread has its own complete and independent set of factories capable of
fully reconstructing an event within that thread. This minimizes the use of locks, which would be required
to coordinate between threads and would degrade performance. Factory sets are maintained in a pool and
are (optionally) assigned affinity to a specific NUMA group.

![JANA2 Factory diagram](_media/threading-schema.png)

With some level of simplification, this diagram shows how sets of factories are created for each thread in the
worker pool. Limited by I/O operations, events usually must be read in from the source sequentially (orange)
and similarly written sequentially to the output (violet).

### Imperative vs Declarative factories

What does the simplest factory look like in code? Probably the simplest is `JFactoryT<T>`:

```cpp
// MyCluster is the type this factory outputs
class ExampleFactory : public JFactoryT<MyCluster> {
public:
    void Init() override { /* ... initialize what is needed */ }

    void Process(const std::shared_ptr<const JEvent>& event) override
    {
        auto hits = event->Get<MyHit>();   // Request data of type MyHit from JANA
        std::vector<MyCluster*> clusters;
        for (auto hit : hits) {
            // ... produce clusters from hits
        }
        Set(clusters);                     // Set the output data
    }
};
```

The above code gives a glimpse into how such an algorithm or factory might look.
In later sections, we will explore the methods, their details, and other components that can be utilized.

What’s important to note in this example is that `JFactoryT<T>` follows the ***Imperative Approach***.
In this approach, the factory is provided with the `JEvent` interface, which it uses to dynamically request
the data required by the algorithm as needed.

JANA2 supports two distinct approaches for defining algorithms:

- **Imperative Approach**: The algorithm determines dynamically what data it needs and requests
  it through the JEvent interface.

- **Declarative Approach**: The algorithm explicitly declares its required inputs and outputs upfront
  in the class definition.

For instance, the declarative approach can be implemented using `JOmniFactory<T>`.
Here's how the same factory might look when following the declarative approach:

```cpp
class ExampleFactory : public JOmniFactory<ExampleFactory> {
public:

    Input<MyHit> hits {this};              // Declare inputs
    Output<MyCluster> clusters {this};     // Declare what the factory produces

    void Configure() { /* ... same as Init() in JFactory */ }

    void Execute(int32_t run_number, uint64_t event_number)
    {
        // All inputs are guaranteed to be ready when Execute is called.
        std::vector<MyCluster*> result;
        for (auto hit : hits()) {
            // ... produce clusters from hits
        }
        clusters() = std::move(result);    // Set the output data
    }
};
```

Declarative factories excel in terms of code management and clarity.
The declarative approach makes it immediately clear what an algorithm's inputs are and what it produces.
While this advantage may not be obvious in the above simple example, it becomes particularly evident when dealing
with complex algorithms that have numerous inputs, outputs, and configuration parameters.
For instance, consider a generic clustering algorithm that could later be adapted for various calorimeter detectors.

In general, it is recommended to follow the declarative approach unless the dynamic flexibility
of imperative factories is explicitly required.

A good example of a scenario where the imperative approach is preferred is a software Level-3 (L3) trigger.
The imperative approach allows for highly efficient implementations of L3 (i.e., high-level) triggers.
A decision-making algorithm could be designed to request low-level objects first
to quickly determine whether to accept or reject an event. If the decision cannot be made using the low-level objects,
the algorithm can request higher-level objects for further evaluation.
This ability to dynamically activate factories on an event-by-event basis optimizes the L3 system’s throughput,
reducing the computational resources required to implement it.
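
The staged decision described above can be sketched without any framework code. This is a hypothetical illustration: `ToyTriggerInputs`, `toy_l3_decision`, and the thresholds 3 and 100 are invented for the example, and the two callbacks stand in for what would be lazy data requests through the `JEvent` interface.

```cpp
#include <functional>

// Toy sketch only: hypothetical names and thresholds, not the JANA2 API.
struct ToyTriggerInputs {
    std::function<int()> count_raw_hits;        // cheap, low-level quantity
    std::function<int()> count_fitted_tracks;   // expensive, high-level quantity
};

enum class Decision { Reject, Accept };

inline Decision toy_l3_decision(const ToyTriggerInputs& in) {
    const int hits = in.count_raw_hits();
    if (hits < 3) return Decision::Reject;     // too few hits: decided cheaply
    if (hits > 100) return Decision::Accept;   // very busy event: decided cheaply
    // Undecided: only now request the expensive high-level objects.
    return in.count_fitted_tracks() >= 1 ? Decision::Accept : Decision::Reject;
}
```

The expensive callback is invoked only for events that the cheap criteria cannot classify, which is the source of the throughput gain.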

### Factory types

The main factory types in JANA2 are:

- `JFactory` - imperative factory with a single output type
- `JMultifactory` - imperative factory that can produce several types at once
- `JOmniFactory` - declarative factory with multiple outputs

<table>
<tr>
<th></th>
<th>Declarative</th>
<th colspan="2">Imperative</th>
</tr>
<tr>
<th></th>
<th>JOmniFactory</th>
<th>JFactory</th>
<th>JMultifactory</th>
</tr>

<tr>
<td>Inputs</td>
<td>Fixed number of input types</td>
<td colspan="2">Any number of input types</td>
</tr>

<tr>
<td>Input requests</td>
<td>Declared upfront in class definition</td>
<td colspan="2">Requested dynamically through JEvent interface</td>
</tr>

<tr>
<td>Outputs</td>
<td>Multiple types/outputs</td>
<td>Single type</td>
<td>Multiple types</td>
</tr>

<tr>
<td>Outputs declaration</td>
<td>Declared upfront in class definition</td>
<td>Declared in class definition</td>
<td>Must be declared in constructor</td>
</tr>

</table>


### Declarative Factories

```cpp
/// A factory should inherit from JOmniFactory<T>,
/// where T is the factory class itself (CRTP)
struct HitRecoFactory : public JOmniFactory<HitRecoFactory> {

   /// Outputs are the data this factory produces
   Output<HitCluster> m_clusters{this};

   /// Inputs are the data this factory uses to produce its result
   Input<McHit> m_mcHits{this};

   /// Additional service needed to produce the data
   Service<CalibrationService> m_calibration{this};

   /// Parameters are values that can be changed from the command line
   Parameter<bool> m_cfg_use_true_pos{this, "hits:use_true_pos", true, "Flag description"};

   /// Configure is called once, to configure the algorithm
   void Configure() {  /* ... */ }

   /// Called when the run number being processed changes
   void ChangeRun(int32_t run_number) { /* ... get calibrations for run ... */ }

   /// Called for each event
   void Execute(int32_t /*run_nr*/, uint64_t event_index)
   {
      auto result = std::vector<HitCluster*>();
      for (auto hit : m_mcHits()) {   // get input data from event source or other factories
         // ... produce clusters from hits
      }

      // Hand the produced clusters over to the declared output
      m_clusters() = std::move(result);
   }
};
```

### Factory generators

Since every worker thread creates its own set of factories, you have to provide not only the factory code
but also a way to create factory instances, i.e. a factory generator class. Fortunately, JANA2 provides a generic
templated `JFactoryGeneratorT` that works for the majority of cases:

```cpp
// For JFactories (same call as in the JApplication example)
app->Add(new JFactoryGeneratorT<MyFactoryA>);

// For JOmniFactories
```
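
Why a generator is needed at all can be seen in a framework-free sketch: the application registers one generator per factory type, and each worker thread asks the generators for its own fresh instances. All names below (`ToyFactory`, `ToyFactoryGenerator`, `ToyFactoryGeneratorT`, `CountingFactory`, `make_factory_set`) are hypothetical and only mimic the pattern; they are not the JANA2 classes.

```cpp
#include <memory>
#include <vector>

// Toy model only: hypothetical names, not the JANA2 classes.
struct ToyFactory {
    virtual ~ToyFactory() = default;
    virtual int process(int input) = 0;
};

struct ToyFactoryGenerator {
    virtual ~ToyFactoryGenerator() = default;
    virtual std::unique_ptr<ToyFactory> create() const = 0;
};

// Generic generator that works for any default-constructible factory type.
template <typename FactoryT>
struct ToyFactoryGeneratorT : ToyFactoryGenerator {
    std::unique_ptr<ToyFactory> create() const override {
        return std::unique_ptr<ToyFactory>(new FactoryT());
    }
};

struct CountingFactory : ToyFactory {
    int calls = 0;   // per-instance state, never shared between factory sets
    int process(int input) override {
        ++calls;
        return input * 2;
    }
};

// Build one independent factory set: one new factory per registered generator.
inline std::vector<std::unique_ptr<ToyFactory>>
make_factory_set(const std::vector<std::unique_ptr<ToyFactoryGenerator>>& generators) {
    std::vector<std::unique_ptr<ToyFactory>> set;
    for (const auto& gen : generators) {
        set.push_back(gen->create());
    }
    return set;
}
```

Calling `make_factory_set` once per worker gives every worker its own `CountingFactory`, so per-factory state needs no locking.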



## Plugins

In JANA2, plugins are dynamic libraries that extend the functionality of the main application by registering
additional components such as event sources, factories, event processors, and services.
Plugins are a powerful mechanism that allows developers to modularize their code, promote code reuse,
and configure applications dynamically at runtime without the need for recompilation.

For a library to be recognized as a plugin, it must implement a specific initialization function called
`InitPlugin()` with C linkage. The function is called by JANA when plugins are loaded and should be used
for registering the plugin's components with the JApplication instance.

```cpp
extern "C" {
    void InitPlugin(JApplication* app) {
        InitJANAPlugin(app);
        // Register components:
        app->Add(/** ... */);    // add components from this plugin
        app->Add(/** ... */);
        // ...
    }
}
```

### How Plugins Are Found and Loaded

When a JANA2 application starts, it searches for plugins in specific directories.
The framework maintains a list of plugin search paths where it looks for plugin libraries.
By default, this includes directories such as:

- The current working directory.
- Directories specified by the `JANA_PLUGIN_PATH` environment variable.
- Directories added programmatically via the `AddPluginPath()` method of `JApplication`.

Plugins are loaded in two main ways:

- **Automatic Loading**: The application can be configured to load plugins specified by
  command-line arguments or configuration parameters via the `-Pplugins` flag.

  ```bash
  ./my_jana_application -Pplugins=MyPlugin1,AnotherPlugin
  ```

- **Programmatic Loading**: Plugins can be loaded explicitly in the application code
  by calling the `AddPlugin()` method of `JApplication`.
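
The value passed to `-Pplugins` in the example above is a comma-separated list of plugin names. As a rough illustration of how such a value maps onto individual names, here is a small standalone helper; `split_plugin_list` is a hypothetical function written for this example and is not part of JANA2, which performs this parsing internally.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not part of JANA2:
// split "A,B,C" into {"A", "B", "C"}, skipping empty entries.
inline std::vector<std::string> split_plugin_list(const std::string& value) {
    std::vector<std::string> names;
    std::istringstream stream(value);
    std::string name;
    while (std::getline(stream, name, ',')) {
        if (!name.empty()) names.push_back(name);
    }
    return names;
}
```

Each resulting name is what one would otherwise pass to `AddPlugin()` when loading plugins programmatically.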

### Plugins debugging

JANA2 provides a handy parameter, `jana:debug_plugin_loading=1`, which prints
detailed information about the plugin loading process.


## Object lifecycles

It is important to understand who owns each JObject and when it is destroyed.

By default, a JFactory owns all of the JObjects that it creates during `Process()`. Once all event processors have
finished processing a `JEvent`, all `JFactories` associated with that `JEvent` will clear and delete their `JObjects`.
However, you can change this behavior by setting one of the factory flags:

* `PERSISTENT`: Objects are neither cleared nor deleted. This is usually used for calibrations and translation tables.
  Note that if an object is persistent, `JFactory::Process` will _not_ be re-run on the next `JEvent`. The user
  may still update the objects manually, via `JFactory::BeginRun`, and must delete the objects manually via
  `JFactory::EndRun` or `JFactory::Finish`.

* `NOT_OBJECT_OWNER`: Objects are cleared from the `JFactory` but _not_ deleted. This is useful for "proxy" factories
  (which reorganize objects that are owned by a different factory) and for `JEventGroups`. `JFactory::Process` _will_ be
  re-run for each `JEvent`. As long as the objects are owned by a different `JFactory`, the user doesn't have to do any
  cleanup.
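
The effect of the two flags on end-of-event cleanup can be mimicked with a small standalone model. This is a sketch under stated assumptions: `ToyFlag`, `ToyObject`, `ToyLifecycleFactory`, and `clear_after_event` are hypothetical names that only imitate the ownership rules described above, not the JANA2 implementation.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Toy model only: hypothetical names that imitate the ownership rules above.
enum class ToyFlag { Default, Persistent, NotObjectOwner };

struct ToyObject {
    // Counter of currently live objects, used to observe deletions.
    static int& alive() {
        static int n = 0;
        return n;
    }
    ToyObject() { ++alive(); }
    ~ToyObject() { --alive(); }
};

class ToyLifecycleFactory {
    ToyFlag m_flag;
    std::vector<ToyObject*> m_objects;

public:
    explicit ToyLifecycleFactory(ToyFlag flag) : m_flag(flag) {}

    void set(std::vector<ToyObject*> objects) { m_objects = std::move(objects); }
    std::size_t size() const { return m_objects.size(); }

    // Called once all processors have finished with the event.
    void clear_after_event() {
        if (m_flag == ToyFlag::Persistent) return;   // keep objects for the next event
        if (m_flag == ToyFlag::Default) {
            for (ToyObject* obj : m_objects) delete obj;   // the owner deletes
        }
        m_objects.clear();   // NotObjectOwner: forget the pointers, do not delete
    }
};
```

In this model a persistent factory keeps its objects across events and leaves their deletion to the user, matching the manual cleanup requirement stated above.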

The lifetime of a `JFactory` spans the time that a `JEvent` is in-flight. No other guarantees are made: `JFactories` might
be re-used for multiple `JEvents` for the sake of efficiency, but the implementation is free to _not_ do so. In particular,
the user must never assume that one `JFactory` will see the entire `JEvent` stream.

The lifetime of a `JEventSource` spans the time that all of its emitted `JEvents` are in-flight.

The lifetime of a `JEventProcessor` spans the time that any `JEventSources` are active.

The lifetime of a `JService` not only spans the time that any `JEventProcessors` are active, but also the lifetime of
`JApplication` itself. Furthermore, because JServices use `shared_ptr`, they are allowed to live even longer than
`JApplication`, which is helpful for things like writing test cases.