
Calculation Requirement

Overview

The CalculationRequirement class is an abstract base class that defines the interface for all calculation requirements. It is responsible for validating that specific data exists in the database and fetching that data on demand.

Each FeatureCalculator subclass declares its requirements in __init__ by calling _add_requirement(...), then triggers validation and fetching via _fetch_requirements(...).


Lifecycle

Text Only
CalculationRequirement.__init__(optional)
  │  Sets up the DB connection (_perfdb)
  └─ Ready

check()
  │  1. Already-checked guard: returns True immediately if already checked
  │  2. Cache lookup: if _check_cache_key() is non-None, reuses results
  │     from earlier instances in the same thread with the same key (eliminates duplicate queries)
  │  3. _do_check(): runs the actual validation (or early-returns if optional)
  │  4. Sets self._checked = True
  └─ Returns True, or raises ValueError if requirement not met

get_data(**kwargs)
  │  Fetches the actual data into self._data
  └─ Returns the data
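
The lifecycle above can be sketched without a database. This is a minimal illustration, not the library's implementation: SketchRequirement, RequiredToyAttributes, the in-memory _STORE and the check_calls counter are all hypothetical stand-ins, and the cache lookup (step 2) is left out.

```python
from typing import Any


class SketchRequirement:
    """Hypothetical stand-in for CalculationRequirement (no DB, no cache)."""

    def __init__(self, optional: bool = False) -> None:
        self.optional = optional
        self._checked = False
        self._data: Any = None
        self.check_calls = 0  # illustration only: counts real validations

    def check(self) -> bool:
        if self._checked:  # step 1: already-checked guard
            return True
        self._do_check()  # step 3: validation, raises ValueError if unmet
        self._checked = True  # step 4
        return True

    def _do_check(self) -> None:
        raise NotImplementedError

    def get_data(self, **kwargs: Any) -> Any:
        raise NotImplementedError


class RequiredToyAttributes(SketchRequirement):
    """Toy requirement backed by an in-memory dict instead of a database."""

    _STORE = {"WTG-01": {"rated_power": 2000}}

    def __init__(self, object_name: str, optional: bool = False) -> None:
        super().__init__(optional=optional)
        self._object_name = object_name

    def _do_check(self) -> None:
        self.check_calls += 1
        if self.optional:
            return
        if self._object_name not in self._STORE:
            raise ValueError(f"No attributes for {self._object_name!r}")

    def get_data(self, **kwargs: Any) -> Any:
        if not self._checked:
            self.check()
        self._data = self._STORE.get(self._object_name)
        return self._data


req = RequiredToyAttributes("WTG-01")
req.check()
req.check()  # second call is short-circuited by the guard
data = req.get_data()
```

Calling check() twice runs the validation only once, which is the point of the already-checked guard.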

Caching Mechanism

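CalculationRequirement gives each subclass its own thread-local cache (_cache_local, a threading.local created in __init_subclass__). Cached entries are reused by instances in the same thread and are never shared across threads, so no lock is needed. This means: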

  • If two FeatureCalculator instances running in the same thread both need RequiredObjectAttributes for the same object, the second lookup is served from the cache instead of hitting the DB again.
  • The cache is keyed by _check_cache_key(). Override this in subclasses that fetch static (period-independent) data.

When to enable caching: Override _check_cache_key() for static data like object attributes, feature attributes, and trained models. Do not cache time-series data (features, alarms) because those depend on the calculation period.


Usage

Calculation requirements are used throughout feature calculations. The general workflow is:

  1. Instantiate the requirement with the necessary arguments.
  2. Register it with FeatureCalculator._add_requirement(req).
  3. Call FeatureCalculator._fetch_requirements(...), which calls check() then get_data() on each requirement.
  4. Access the data via FeatureCalculator._requirement_data("RequirementClassName").
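
As a rough sketch of these four steps, the toy classes below mimic the method names mentioned above. ToyRequirement and ToyCalculator are hypothetical stand-ins, and the real FeatureCalculator signatures and internals may differ. In particular, storing requirements in a dict keyed by class name is an assumption made for illustration.

```python
from typing import Any


class ToyRequirement:
    """Hypothetical requirement: validates and returns a constant payload."""

    def __init__(self, payload: Any) -> None:
        self._payload = payload
        self._checked = False
        self._data: Any = None

    def check(self) -> bool:
        if self._payload is None:
            raise ValueError("required data is missing")
        self._checked = True
        return True

    def get_data(self, **kwargs: Any) -> Any:
        if not self._checked:
            self.check()
        self._data = self._payload
        return self._data


class ToyCalculator:
    """Hypothetical stand-in for FeatureCalculator; only the four steps are modeled."""

    def __init__(self) -> None:
        self._requirements: dict[str, ToyRequirement] = {}
        # Steps 1 and 2: instantiate the requirement and register it.
        self._add_requirement(ToyRequirement({"rated_power": 2000}))

    def _add_requirement(self, req: ToyRequirement) -> None:
        # Assumption: requirements are looked up by class name.
        self._requirements[type(req).__name__] = req

    def _fetch_requirements(self, **kwargs: Any) -> None:
        # Step 3: check() first, then get_data(), for every requirement.
        for req in self._requirements.values():
            req.check()
            req.get_data(**kwargs)

    def _requirement_data(self, name: str) -> Any:
        # Step 4: read back what get_data() stored.
        return self._requirements[name]._data

    def compute(self) -> int:
        self._fetch_requirements()
        attrs = self._requirement_data("ToyRequirement")
        return attrs["rated_power"] * 2


result = ToyCalculator().compute()
```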

Subclass Implementation

Subclasses must implement:

  • __init__: call super().__init__(optional=optional), store instance-specific parameters (object names, feature names, etc.).
  • _do_check: validate that the required data exists (raise ValueError if not). Do not set self._checked = True here; check() does that.
  • get_data: fetch and set self._data, then return it.
  • __repr__: return a descriptive string for debugging.

Optionally override:

  • _check_cache_key: return a hashable key to enable class-level caching for static data. Return None (default) to disable.
  • _get_cache_value: customize what value is stored in the cache (defaults to self._data).
  • _set_from_cache: customize how cached values are restored to self._data (default is copy.deepcopy).
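
To see why the default restore uses copy.deepcopy, the sketch below contrasts it with a restore that hands out the cached object itself. The module-level cache dict and both helper functions are hypothetical stand-ins, not the library's code. An override that skips the copy is probably only safe for data that callers never mutate.

```python
import copy
from typing import Any

# Hypothetical stand-in for the cache that check() maintains.
cache: dict[str, Any] = {"WTG-01": {"rated_power": 2000}}


def set_from_cache_default(cached: Any) -> Any:
    """Mirrors the documented default: hand out an independent deep copy."""
    return copy.deepcopy(cached)


def set_from_cache_shared(cached: Any) -> Any:
    """Possible override for read-only data: reuse the cached object as is."""
    return cached


first = set_from_cache_default(cache["WTG-01"])
first["rated_power"] = 0  # mutating the copy leaves the cache untouched
cache_after_copy = cache["WTG-01"]["rated_power"]

second = set_from_cache_shared(cache["WTG-01"])
second["rated_power"] = 0  # mutating the shared object corrupts the cache
cache_after_share = cache["WTG-01"]["rated_power"]
```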

Optional requirements

When optional=True, the requirement should not raise errors if data is absent. The pattern differs by subclass:

  • Most classes: add if self.optional: return at the top of _do_check().
  • RequiredFeatureAttributes and RequiredObjectAttributes: still fetch data when optional but tolerate missing items (if not found and not self.optional: raise).
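
The two patterns can be compared side by side as plain functions. This is a simplified sketch: do_check_skip and do_check_tolerant are hypothetical helpers, and the dictionaries stand in for database lookups.

```python
from typing import Any


def do_check_skip(optional: bool, exists: bool) -> None:
    """Pattern used by most classes: optional requirements skip validation."""
    if optional:
        return
    if not exists:
        raise ValueError("required data does not exist")


def do_check_tolerant(
    requested: list[str], available: dict[str, Any], optional: bool
) -> dict[str, Any]:
    """Pattern for attribute lookups: always look, but tolerate gaps if optional."""
    found: dict[str, Any] = {}
    for name in requested:
        if name not in available:
            if not optional:
                raise ValueError(f"attribute {name!r} not found")
            continue  # optional: skip the missing item
        found[name] = available[name]
    return found


available = {"rated_power": 2000}
do_check_skip(optional=True, exists=False)  # no error, nothing validated
partial = do_check_tolerant(["rated_power", "hub_height"], available, optional=True)
```

With optional=True the tolerant pattern still returns whatever it found, whereas the skip pattern never looks at the data at all.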

Minimal subclass example

Python
from __future__ import annotations
from typing import Any
from .calculation_requirements_core import CalculationRequirement


class RequiredMyData(CalculationRequirement):
    """Fetches my custom data from performance_db."""

    def __init__(self, object_name: str, optional: bool = False) -> None:
        super().__init__(optional=optional)
        self._object_name = object_name

    def _check_cache_key(self) -> tuple:
        # Enable caching keyed by object name (only if data is static)
        return (type(self).__name__, self._object_name)

    def _do_check(self) -> None:
        if self.optional:
            return
        if not self._perfdb.my_data_exists(self._object_name):
            raise ValueError(f"Data for '{self._object_name}' does not exist.")

    def get_data(self, **kwargs) -> Any:
        if not self._checked:
            self.check()
        self._data = self._perfdb.fetch_my_data(self._object_name)
        return self._data

    def __repr__(self) -> str:
        return f"RequiredMyData(object={self._object_name!r}, optional={self.optional})"

Available Requirement Classes

| Class | Data returned | Cacheable |
|---|---|---|
| RequiredFeatures | pl.DataFrame with "timestamp" + "object@feature" columns | No (period-dependent) |
| RequiredObjectAttributes | dict[object_name, dict[attr, value]] | Yes |
| RequiredFeatureAttributes | dict[feature_name, dict[attr, value]] | Yes |
| RequiredCalcModels | dict[object_name, dict[model_name, {model, ...}]] | Yes |
| RequiredAlarms | pl.DataFrame with alarm event rows | No (period-dependent) |
| RequiredVibrationData | pl.DataFrame with raw vibration records | No (period-dependent) |
| RequiredVibrationFrequencies | pl.DataFrame with frequency definitions | Yes |

See the individual pages for each class for full details.


Class Definition

CalculationRequirement(optional=False)

Abstract base class for all data requirements used by feature calculators.

A CalculationRequirement encapsulates a single source of input data (e.g. object attributes, feature time-series, trained models) and provides two responsibilities:

  1. Validation (check()): confirm the required data exists and is accessible before the calculation period is known.
  2. Fetching (get_data()): retrieve the actual data and store it in data.

Subclass contract

  • Override __init__ to accept source-specific arguments and call super().__init__(optional=optional).
  • Implement _do_check(): raise ValueError if the requirement is unmet (unless self.optional).
  • Implement get_data(): fetch and store data in self._data.
  • Implement __repr__().
  • Optionally override _check_cache_key() to enable class-level caching for static (period-independent) data.

Thread-safe caching

Each concrete subclass automatically gets its own threading.local instance via __init_subclass__. When _check_cache_key() returns a non-None key, the result of _do_check() is stored in a per-thread cache dict and reused by subsequent instances in the same thread that produce the same key. Because the cache is never shared across threads, no lock is required and Polars operations inside _do_check() cannot deadlock regardless of POLARS_MAX_THREADS.
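
The per-subclass, per-thread behaviour can be sketched as follows. CachedSketch and StaticAttributes are hypothetical stand-ins that borrow the _cache_local name from the check() source. The sketch simplifies the real logic: it tests key membership directly and skips the deep copy on restore, and the queries counter exists only to make cache hits observable.

```python
import threading
from typing import Any, Hashable, Optional


class CachedSketch:
    """Hypothetical stand-in showing one threading.local per subclass."""

    _cache_local: threading.local

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        cls._cache_local = threading.local()  # separate storage for each subclass

    def __init__(self, name: str) -> None:
        self._name = name
        self._data: Any = None

    def _check_cache_key(self) -> Optional[Hashable]:
        return None  # caching disabled by default

    def _do_check(self) -> None:
        raise NotImplementedError

    def check(self) -> bool:
        key = self._check_cache_key()
        if key is None:
            self._do_check()
            return True
        local = type(self)._cache_local
        if not hasattr(local, "cache"):
            local.cache = {}  # first use in this thread
        if key not in local.cache:
            self._do_check()
            local.cache[key] = self._data
        self._data = local.cache[key]
        return True


class StaticAttributes(CachedSketch):
    queries = 0  # counts simulated DB queries

    def _check_cache_key(self) -> Hashable:
        return (type(self).__name__, self._name)

    def _do_check(self) -> None:
        type(self).queries += 1
        self._data = {"name": self._name}


StaticAttributes("WTG-01").check()
StaticAttributes("WTG-01").check()  # same thread, same key: served from cache
queries_same_thread = StaticAttributes.queries

worker = threading.Thread(target=lambda: StaticAttributes("WTG-01").check())
worker.start()
worker.join()
queries_after_worker = StaticAttributes.queries  # the new thread has its own cache
```

The second instance in the main thread triggers no query, while the worker thread starts with an empty cache and queries once more.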

Optional requirements

When optional=True, the requirement should not raise errors if data is absent. The typical pattern in _do_check() is:

Text Only
if self.optional:
    return   # skip all validation

Some subclasses (e.g. RequiredFeatureAttributes) still fetch when optional but tolerate missing items; they use:

Text Only
if not found and not self.optional:
    raise ValueError(...)

Subclasses of CalculationRequirement retrieve all the data needed for a calculation, checking that it exists in the database and, in some cases, that it is valid.

In subclasses this constructor should be called with super().__init__(optional=optional).

Parameters:

  • optional

    (bool, default: False ) –

    Defines whether the requirement is optional. If True, the requirement is only validated to check that it could exist, not that it is actually present.

Source code in echo_energycalc/calculation_requirements_core.py
Python
def __init__(self, optional: bool = False) -> None:
    """
    Constructor of the CalculationRequirement class.

    Subclasses of CalculationRequirement retrieve all the data needed for a calculation, checking that it exists in the database and, in some cases, that it is valid.

    In subclasses this constructor should be called with super().__init__(optional=optional).

    Parameters
    ----------
    optional : bool, optional
        Defines if the requirement is optional.
        If optional is True, the requirement is only validated to check if it could exist, not if it is actually present.
        By default False
    """
    self._perfdb: PerfDB = PerfDB(application_name=self.__class__.__name__)
    """Stores the connection to performance database"""

    self._optional: bool = optional
    """Defines if the requirement is optional"""

    self._checked: bool = False
    """Defines if the requirement has been checked"""

    self._fetched: bool = False
    """Defines if get_data() has been called on this requirement"""

    self._data: Any | None = None
    """Stores the data required for the calculation"""

checked property

Attribute that defines whether the requirement has been checked. Its value starts as False and is set to True after check() is called.

Returns:

  • bool

    True if the requirement has been checked.

data property

Attribute used to store the data required for the calculation.

Initially it is None and will be set with the data acquired by the get_data method. The data type will depend on the subclass implementation, but usually it will be a polars DataFrame or a dictionary.

Returns:

  • Any | None

    Returns the data required for the calculation.

fetched property

Attribute that defines if get_data() has been called on this requirement.

True even when the fetch returned no data (e.g. an optional requirement that found nothing). Use this to distinguish "never fetched" from "fetched but empty/None".

Returns:

  • bool

    True if get_data() has been called at least once.
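
A small sketch of the distinction this property is meant to draw. OptionalToyRequirement is a hypothetical class, and setting the flag inside get_data() is an assumption, since the source shown here does not reveal where _fetched is updated.

```python
from typing import Any


class OptionalToyRequirement:
    """Hypothetical requirement whose optional fetch finds nothing."""

    def __init__(self) -> None:
        self._fetched = False
        self._data: Any = None

    @property
    def fetched(self) -> bool:
        return self._fetched

    @property
    def data(self) -> Any:
        return self._data

    def get_data(self, **kwargs: Any) -> Any:
        self._data = None  # simulated lookup that returns no rows
        self._fetched = True  # assumption: the flag is set inside get_data()
        return self._data


req = OptionalToyRequirement()
before = (req.fetched, req.data)  # never fetched
req.get_data()
after = (req.fetched, req.data)  # fetched, but nothing was found
```

In both states data is None, so only fetched tells the two cases apart.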

optional property

Attribute that defines if the requirement is optional.

If optional is True, the requirement is only validated to check if it could exist, not if it is actually present. This is useful for requirements that are not necessary for all calculations, but are useful for some of them.

Returns:

  • bool

    True if the requirement is optional.

check()

Check that the requirement is met.

This concrete implementation handles two concerns automatically so that subclasses only need to implement _do_check():

  1. Already-checked guard — returns True immediately if check() has already succeeded for this instance, avoiding redundant DB round-trips when _fetch_requirements() iterates requirements on every _compute() call.
  2. Per-thread caching — when _check_cache_key() returns a non-None key, the result produced by _do_check() is stored in a thread-local cache and reused by subsequent instances in the same thread with the same key. Because the cache is never shared across threads, no locking is needed and concurrent Polars operations inside _do_check cannot deadlock.

The optional guard is intentionally delegated to _do_check() because different subclasses have different optional semantics (see _do_check docs).

Returns:

  • bool

    True if the requirement is met; raises on unmet non-optional requirements.

Source code in echo_energycalc/calculation_requirements_core.py
Python
def check(self) -> bool:
    """
    Check that the requirement is met.

    This concrete implementation handles two concerns automatically so that
    subclasses only need to implement ``_do_check()``:

    1. **Already-checked guard** — returns ``True`` immediately if ``check()`` has
       already succeeded for this instance, avoiding redundant DB round-trips when
       ``_fetch_requirements()`` iterates requirements on every ``_compute()`` call.
    2. **Per-thread caching** — when ``_check_cache_key()`` returns a non-None key,
       the result produced by ``_do_check()`` is stored in a thread-local cache and
       reused by subsequent instances in the same thread with the same key. Because
       the cache is never shared across threads, no locking is needed and concurrent
       Polars operations inside ``_do_check`` cannot deadlock.

    The **optional guard** is intentionally delegated to ``_do_check()`` because
    different subclasses have different optional semantics (see ``_do_check`` docs).

    Returns
    -------
    bool
        True if the requirement is met; raises on unmet non-optional requirements.
    """
    if self._checked:
        return True

    cache_key = self._check_cache_key()

    if cache_key is not None:
        _tl = type(self)._cache_local  # noqa: SLF001
        if not hasattr(_tl, "cache"):
            _tl.cache = {}
        cached = _tl.cache.get(cache_key)
        if cached is None:
            self._do_check()
            _tl.cache[cache_key] = self._get_cache_value()
            cached = _tl.cache[cache_key]
        else:
            logger.debug("Cache hit for %s (key=%s)", type(self).__name__, cache_key)
        self._set_from_cache(cached)
    else:
        self._do_check()

    self._checked = True
    return True

get_data(**kwargs) abstractmethod

Method used to get the data required for the calculation.

The method should first verify that the requirement has been checked and, if it has not, call check() before fetching the data.

At the end of the method, the attribute self._data should be set with the data queried from performance_db or any other source.

Returns:

  • Any

    Returns the data required for the calculation.

Source code in echo_energycalc/calculation_requirements_core.py
Python
@abstractmethod
def get_data(self, **kwargs) -> Any:
    """
    Method used to get the data required for the calculation.

    The method should first check if the requirement has been checked. If not, it should check before getting the data.

    At the end of the method, the attribute self._data should be set with the data queried from performance_db or any other source.

    Returns
    -------
    Any
        Returns the data required for the calculation.
    """
    raise NotImplementedError("This method must be implemented by a subclass")