Feature Calculation — Introduction

This section explains how features are calculated in the echo-performance server, detailing the Python implementation and the necessary database configuration.

Overview

Features (also called variables, tags, or points) are time series values, typically at 10-minute resolution (other frequencies are possible), computed for each asset in the Performance Server. The main Airflow DAG for this is feature-calculator, which creates a CalculationHandler instance that:

Queries performance_db for the list of features and objects to calculate.
For each (object, feature) pair, looks up the server_calc_type attribute to determine which FeatureCalculator subclass to use.
Instantiates that calculator, fetches its requirements, runs the calculation, and saves results back to performance_db.
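
The steps listed above amount to a dispatch loop. The sketch below is only an illustration under stated assumptions, not the actual CalculationHandler: the in-memory `FEATURES` list stands in for the performance_db query, and `FeatureDef`, `ConstantCalculator`, `CALCULATORS`, and `run_all` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureDef:
    """One (object, feature) pair plus the attribute that selects its calculator."""
    object_name: str
    feature_name: str
    server_calc_type: str


class ConstantCalculator:
    """Hypothetical calculator that returns the same value for every timestamp."""

    def __init__(self, object_name: str, feature_name: str) -> None:
        self.object_name = object_name
        self.feature_name = feature_name

    def calculate(self, timestamps: list[str]) -> dict[str, float]:
        return {ts: 1.0 for ts in timestamps}


# Stand-in for the mapping from server_calc_type to calculator class.
CALCULATORS = {"constant": ConstantCalculator}

# Stand-in for the rows that the first step would read from performance_db.
FEATURES = [
    FeatureDef("WT01", "ConstantOne", "constant"),
    FeatureDef("WT02", "ConstantOne", "constant"),
]


def run_all(timestamps: list[str]) -> dict[tuple[str, str], dict[str, float]]:
    results = {}
    for feat in FEATURES:  # the pairs to calculate
        calc_cls = CALCULATORS[feat.server_calc_type]  # pick the class by calc type
        calculator = calc_cls(feat.object_name, feat.feature_name)
        # Run the calculation; a real handler would save to the database here.
        results[(feat.object_name, feat.feature_name)] = calculator.calculate(timestamps)
    return results


results = run_all(["2024-01-01T00:00", "2024-01-01T00:10"])
```

Keeping the lookup keyed by server_calc_type means the loop itself never needs to know which calculators exist.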

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                         CalculationHandler                          │
│                                                                     │
│  1. Discover features to calculate (from performance_db)            │
│  2. For each (object, feature): instantiate FeatureCalculator       │
│  3. Run calculations (optionally parallel via ThreadPoolExecutor)   │
│  4. Save results to performance_db / Bazefield                      │
└──────────────────────┬──────────────────────────────────────────────┘
                       │  instantiates
                       ▼
┌─────────────────────────────────────────────────────────────────────┐
│                 FeatureCalculator  (abstract base)                  │
│                                                                     │
│  __init__:  _add_requirement(...)  →  _fetch_requirements()         │
│  calculate: calls _compute(period) → save(...)                      │
│                                                                     │
│  _compute(period):  ← implemented in each concrete subclass         │
│    1. _fetch_requirements(period, ...)                              │
│    2. _requirement_data("RequiredFeatures")  → pl.DataFrame         │
│    3. Calculation logic                                             │
│    4. Return pl.DataFrame({"timestamp": ..., "feature": ...})       │
└──────────────────────┬──────────────────────────────────────────────┘
                       │  uses
                       ▼
┌─────────────────────────────────────────────────────────────────────┐
│               CalculationRequirement  (abstract base)               │
│                                                                     │
│  Subclasses (one per data type):                                    │
│  ├── RequiredFeatures           → pl.DataFrame (time-series)        │
│  ├── RequiredObjectAttributes   → dict[object, dict[attr, value]]   │
│  ├── RequiredFeatureAttributes  → dict[feature, dict[attr, value]]  │
│  ├── RequiredCalcModels         → dict[object, dict[model, obj]]    │
│  ├── RequiredAlarms             → pl.DataFrame (alarm events)       │
│  ├── RequiredVibrationData      → pl.DataFrame (raw vibration)      │
│  └── RequiredVibrationFrequencies → pl.DataFrame (freq definitions) │
└─────────────────────────────────────────────────────────────────────┘
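
The relationship between a calculator and its requirements can be illustrated with a toy, standard-library-only sketch. It assumes a simplified model: plain lists and dicts replace pl.DataFrame, the `Toy*` classes and their `fetch` method are hypothetical, and only the method names `_add_requirement`, `_fetch_requirements`, `_requirement_data`, and `_compute` are taken from the diagram (their real signatures may differ).

```python
from abc import ABC, abstractmethod


class Requirement(ABC):
    """Toy stand-in for CalculationRequirement: each subclass fetches one kind of data."""

    def __init__(self) -> None:
        self.data = None

    @abstractmethod
    def fetch(self, period=None) -> None:
        ...


class ToyObjectAttributes(Requirement):
    """Static data: can be fetched without a period."""

    def fetch(self, period=None) -> None:
        self.data = {"WT01": {"nominal_power": 2000.0}}  # made-up attribute value


class ToyFeatures(Requirement):
    """Time-series data: needs a period, so it is skipped until one is given."""

    def fetch(self, period=None) -> None:
        if period is None:
            return
        start, end = period
        self.data = [{"timestamp": t, "ActivePower": 1000.0} for t in range(start, end)]


class ToyCalculator:
    def __init__(self) -> None:
        self._requirements: dict[str, Requirement] = {}
        self._add_requirement(ToyObjectAttributes())
        self._add_requirement(ToyFeatures())
        self._fetch_requirements()  # no period yet: only static data is loaded

    def _add_requirement(self, req: Requirement) -> None:
        # Requirements are looked up later by their class name.
        self._requirements[type(req).__name__] = req

    def _fetch_requirements(self, period=None) -> None:
        for req in self._requirements.values():
            req.fetch(period)

    def _requirement_data(self, name: str):
        return self._requirements[name].data

    def _compute(self, period):
        self._fetch_requirements(period)  # now the time series is loaded too
        nominal = self._requirement_data("ToyObjectAttributes")["WT01"]["nominal_power"]
        rows = self._requirement_data("ToyFeatures")
        return [{"timestamp": r["timestamp"], "LoadFactor": r["ActivePower"] / nominal} for r in rows]


calc = ToyCalculator()
static_only = calc._requirement_data("ToyFeatures")  # still None before _compute
result = calc._compute((0, 3))
```

Splitting the fetch into a static phase and a period-dependent phase lets configuration problems surface at construction time, before any time series is read.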

Registration System

All concrete FeatureCalculator subclasses are registered automatically when their module is imported, through Python's __init_subclass__ hook. The _name class attribute of each subclass becomes its key in FeatureCalculator._registry.
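
A minimal sketch of registration through `__init_subclass__` follows. `BaseCalculator`, `IntermediateCalculator`, and `ActivePowerRatio` are hypothetical names, and two details are assumptions rather than documented behavior of the real FeatureCalculator: classes without their own `_name` are skipped, and duplicate names raise an error.

```python
class BaseCalculator:
    """Toy base class that records every named subclass in a registry."""

    _registry: dict[str, type] = {}
    _name = None  # concrete subclasses set this to their server_calc_type

    def __init_subclass__(cls, **kwargs) -> None:
        super().__init_subclass__(**kwargs)
        # Assumption: only classes that declare their own _name are registered,
        # so abstract intermediates are skipped.
        name = cls.__dict__.get("_name")
        if name is None:
            return
        # Assumption: a duplicate name is treated as an error.
        if name in BaseCalculator._registry:
            raise ValueError(f"duplicate calculator name: {name!r}")
        BaseCalculator._registry[name] = cls


class IntermediateCalculator(BaseCalculator):
    """No _name of its own, so it is not registered."""


class ActivePowerRatio(IntermediateCalculator):
    _name = "active_power_ratio"  # hypothetical server_calc_type


registry_keys = sorted(BaseCalculator._registry)
```

Python calls the hook as soon as the `class` statement finishes executing, which is why importing a module is enough to register every calculator it defines.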

constants.py auto-imports every module whose name begins with feature_calc_, solar_energy_loss_, job_instance_, or alarm_calc_, triggering all __init_subclass__ calls and populating FEATURE_CALC_CLASS_MAPPING.

To add a new calculator, create a module named feature_calc_<your_name>.py and define your class with _name = "<your_server_calc_type>". The class is then discovered automatically; no edits to constants.py are needed.
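
Prefix-based discovery of this kind can be sketched with `pkgutil` and `importlib`. This is an assumption about the general technique, not a copy of constants.py: the package `toy_calcs`, its `REGISTRY` dict, and `import_matching_modules` are hypothetical, and a plain dict assignment stands in for the `__init_subclass__` side effect that an import would trigger in the real code.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path

PREFIXES = ("feature_calc_", "solar_energy_loss_", "job_instance_", "alarm_calc_")


def import_matching_modules(package_name: str) -> list[str]:
    """Import every submodule of package_name whose name starts with one of PREFIXES."""
    package = importlib.import_module(package_name)
    imported = []
    for info in pkgutil.iter_modules(package.__path__):
        if info.name.startswith(PREFIXES):
            importlib.import_module(f"{package_name}.{info.name}")
            imported.append(info.name)
    return sorted(imported)


# Build a throwaway package on disk so the example runs on its own.
root = Path(tempfile.mkdtemp())
pkg = root / "toy_calcs"
pkg.mkdir()
(pkg / "__init__.py").write_text("REGISTRY = {}\n")
(pkg / "feature_calc_example.py").write_text(
    "from toy_calcs import REGISTRY\n"
    "REGISTRY['example'] = 'registered on import'\n"
)
(pkg / "helpers.py").write_text("UNUSED = True\n")  # wrong prefix: never imported

sys.path.insert(0, str(root))
importlib.invalidate_caches()  # recommended after creating modules at runtime

imported = import_matching_modules("toy_calcs")
registry = importlib.import_module("toy_calcs").REGISTRY
```

Because discovery depends only on the file name, a module with a misspelled prefix is silently ignored, which is worth checking first when a new calculator does not show up.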

Data Flow for a Calculation

CalculationHandler.calculate(period)
    │
    ├─ _enumerate_features()
    │     Queries performance_db for features matching the requested
    │     names/models. Filters to those with data_source_type="server_calc"
    │     and a known server_calc_type.
    │
    ├─ For each (object, feature):
    │     calculator = FEATURE_CALC_CLASS_MAPPING[server_calc_type](object, feature)
    │     │
    │     │  During __init__:
    │     │    _add_requirement(RequiredObjectAttributes(...))
    │     │    _add_requirement(RequiredFeatures(...))
    │     │    _fetch_requirements()   ← validates & fetches static data
    │     │
    │     result = calculator.calculate(period, save_into="all")
    │         │
    │         └─ _compute(period)
    │               _fetch_requirements(period)  ← fetches time-series data
    │               ... calculation logic ...
    │               return pl.DataFrame
    │
    └─ Results saved to performance_db (and optionally Bazefield)
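
The filtering step and the optional thread-based parallelism can be sketched as follows. The row layout, the `"scada"` source type, `KNOWN_CALC_TYPES`, and the function names are assumptions made for illustration; `calculate_one` is a placeholder for instantiating and running a real calculator.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the keys of FEATURE_CALC_CLASS_MAPPING.
KNOWN_CALC_TYPES = {"constant", "ratio"}

# Stand-in for rows returned by the database query; all values are made up.
rows = [
    {"object": "WT01", "feature": "A", "data_source_type": "server_calc", "server_calc_type": "constant"},
    {"object": "WT01", "feature": "B", "data_source_type": "scada", "server_calc_type": None},
    {"object": "WT02", "feature": "C", "data_source_type": "server_calc", "server_calc_type": "unknown_type"},
    {"object": "WT02", "feature": "A", "data_source_type": "server_calc", "server_calc_type": "ratio"},
]


def enumerate_features(rows):
    """Keep only server-calculated features whose calc type has a registered class."""
    return [
        r for r in rows
        if r["data_source_type"] == "server_calc" and r["server_calc_type"] in KNOWN_CALC_TYPES
    ]


def calculate_one(row):
    """Placeholder for instantiating and running one calculator."""
    return (row["object"], row["feature"]), f"calculated with {row['server_calc_type']}"


def calculate_all(rows, parallel=False, max_workers=4):
    selected = enumerate_features(rows)
    if parallel:
        # Executor.map returns results in input order, so both paths agree.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return dict(pool.map(calculate_one, selected))
    return dict(map(calculate_one, selected))


sequential = calculate_all(rows)
threaded = calculate_all(rows, parallel=True)
```

Threads suit this workload when most of the time is spent waiting on database reads; CPU-bound calculations would gain little from them.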

DataFrame Conventions

All feature calculators work with Polars DataFrames:

Single-object results: columns are ["timestamp", "<feature_name>"]
Multi-object intermediate data: columns follow the "<object>@<feature>" flat encoding (e.g. "WT01@WindSpeed_10min.AVG")
Timestamps: pl.Datetime with millisecond precision

The helper module _polars_utils.py provides conversion between this flat Polars format and the pandas MultiIndex format that some older code still uses.
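
The flat column encoding can be illustrated without Polars, using only the column names. The helpers below are hypothetical and are not the functions in _polars_utils.py; they assume that object names never contain "@", that the flat frame carries a "timestamp" column, and that "ActivePower_10min.AVG" is a plausible but made-up feature name.

```python
SEPARATOR = "@"


def encode_column(object_name: str, feature_name: str) -> str:
    """Build a flat column name of the form '<object>@<feature>'."""
    if SEPARATOR in object_name:
        raise ValueError(f"object name must not contain {SEPARATOR!r}: {object_name!r}")
    return f"{object_name}{SEPARATOR}{feature_name}"


def decode_column(column: str) -> tuple[str, str]:
    """Split a flat column name at the first separator into (object, feature)."""
    object_name, sep, feature_name = column.partition(SEPARATOR)
    if not sep:
        raise ValueError(f"not an encoded column: {column!r}")
    return object_name, feature_name


def group_by_object(columns: list[str]) -> dict[str, list[str]]:
    """Group encoded columns by object, skipping the timestamp column."""
    grouped: dict[str, list[str]] = {}
    for col in columns:
        if col == "timestamp":
            continue
        obj, feat = decode_column(col)
        grouped.setdefault(obj, []).append(feat)
    return grouped


columns = [
    "timestamp",
    "WT01@WindSpeed_10min.AVG",
    "WT01@ActivePower_10min.AVG",
    "WT02@WindSpeed_10min.AVG",
]
grouped = group_by_object(columns)
```

Grouping by object in this way yields the same two-level structure (object, then feature) that a pandas MultiIndex on the columns would express.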

Next Steps

To understand how each class is implemented and how to configure the database, read the following sections in order:

CalculationRequirement — How to declare and fetch data dependencies
FeatureCalculator — How to implement a new calculator
CalculationHandler — How the orchestrator works
Developer Guide — Step-by-step walkthrough for adding new calculators, requirements, and jobs