Calculation Requirement¶
Overview¶
The CalculationRequirement class is an abstract base class that defines the interface for all calculation requirements. It is responsible for validating that specific data exists in the database and fetching that data on demand.
Each FeatureCalculator subclass declares its requirements in __init__ by calling _add_requirement(...), then triggers validation and fetching via _fetch_requirements(...).
Lifecycle¶
CalculationRequirement.__init__(optional)
│ Sets up the DB connection (_perfdb)
└─ Ready
check()
│ 1. Already-checked guard: returns True immediately if already checked
│ 2. Cache lookup: if _check_cache_key() is non-None, reuses the result stored by
│ earlier instances in the same thread with the same key (avoids duplicate queries)
│ 3. _do_check(): runs the actual validation (or early-returns if optional)
│ 4. Sets self._checked = True
└─ Returns True, or raises ValueError if requirement not met
get_data(**kwargs)
│ Fetches the actual data into self._data
└─ Returns the data
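The lifecycle above can be sketched with a self-contained stand-in. `ToyRequirement`, its in-memory `store`, and the `check_calls` counter are hypothetical names used only for illustration; they mimic the guard-then-validate order described here and are not the echo_energycalc implementation.

```python
from __future__ import annotations

from typing import Any


class ToyRequirement:
    """Hypothetical stand-in that mimics the check()/get_data() lifecycle."""

    def __init__(self, store: dict[str, Any], key: str) -> None:
        self._store = store  # stands in for the database
        self._key = key
        self._checked = False
        self._data: Any | None = None
        self.check_calls = 0  # counts how often validation really runs

    def check(self) -> bool:
        if self._checked:  # already-checked guard
            return True
        self.check_calls += 1
        if self._key not in self._store:  # the actual validation
            raise ValueError(f"Data for {self._key!r} does not exist.")
        self._checked = True
        return True

    def get_data(self, **kwargs: Any) -> Any:
        if not self._checked:
            self.check()
        self._data = self._store[self._key]
        return self._data


req = ToyRequirement({"turbine_01": {"rated_power": 2000}}, "turbine_01")
data = req.get_data()  # runs check() once, then fetches
req.check()            # the guard short-circuits; validation is not repeated
```

After these calls, validation has run only once even though `check()` was reached twice.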
Caching Mechanism¶
CalculationRequirement includes a per-thread, class-level cache (_cache_local, a threading.local) that is created for each subclass via __init_subclass__. Because the cache is never shared across threads, it needs no lock. This means:
- If two FeatureCalculator instances running in the same thread both need RequiredObjectAttributes for the same object, the second lookup comes from the cache instead of hitting the DB again.
- The cache is keyed by _check_cache_key(). Override this in subclasses that fetch static (period-independent) data.
When to enable caching: Override _check_cache_key() for static data like object attributes, feature attributes, and trained models. Do not cache time-series data (features, alarms) because those depend on the calculation period.
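A minimal sketch of the per-subclass, per-thread cache, following the thread-local design shown in the check() source on this page. `CachedBase`, `StaticAttrs`, and the `hit` flag are hypothetical; the string "validated" stands in for whatever a real database check would store.

```python
from __future__ import annotations

import threading
from collections.abc import Hashable
from typing import Any


class CachedBase:
    """Hypothetical base: every subclass gets its own thread-local cache."""

    _cache_local: threading.local

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        cls._cache_local = threading.local()  # one per concrete subclass

    def __init__(self, key: Hashable) -> None:
        self._key = key
        self.hit = False

    def check(self) -> bool:
        tl = type(self)._cache_local
        if not hasattr(tl, "cache"):
            tl.cache = {}  # created lazily, separately in each thread
        if self._key in tl.cache:
            self.hit = True  # reused; no "query" needed
        else:
            tl.cache[self._key] = "validated"  # stands in for the DB query
        return True


class StaticAttrs(CachedBase):
    pass


first, second = StaticAttrs("turbine_01"), StaticAttrs("turbine_01")
first.check()
second.check()  # same thread, same key: served from the cache

other_thread_hits: list[bool] = []


def worker() -> None:
    req = StaticAttrs("turbine_01")
    req.check()  # a new thread starts with an empty cache
    other_thread_hits.append(req.hit)


t = threading.Thread(target=worker)
t.start()
t.join()
```

The second instance in the main thread is a cache hit, while the instance created in the worker thread is a miss because thread-local storage is not visible across threads.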
Usage¶
The calculation requirements will be used throughout feature calculations. As a general rule:
- Instantiate the requirement with the necessary arguments.
- Register it with FeatureCalculator._add_requirement(req).
- Call FeatureCalculator._fetch_requirements(...), which calls check() then get_data() on each requirement.
- Access the data via FeatureCalculator._requirement_data("RequirementClassName").
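The flow above can be illustrated with toy stand-ins. `ToyAttrs` and `ToyCalculator` are hypothetical, and the method bodies and signatures below are assumptions made for this sketch; the real FeatureCalculator methods may differ.

```python
from __future__ import annotations

from typing import Any


class ToyAttrs:
    """Hypothetical requirement returning static attributes."""

    def __init__(self, object_name: str) -> None:
        self._object_name = object_name
        self._checked = False
        self.data: Any | None = None

    def check(self) -> bool:
        self._checked = True
        return True

    def get_data(self, **kwargs: Any) -> Any:
        if not self._checked:
            self.check()
        self.data = {self._object_name: {"rated_power": 2000}}
        return self.data


class ToyCalculator:
    """Hypothetical stand-in for the FeatureCalculator requirement plumbing."""

    def __init__(self) -> None:
        self._requirements: dict[str, Any] = {}
        # Instantiate the requirement and register it.
        self._add_requirement(ToyAttrs("turbine_01"))

    def _add_requirement(self, req: Any) -> None:
        self._requirements[type(req).__name__] = req

    def _fetch_requirements(self, **kwargs: Any) -> None:
        # Validate first, then fetch, for every registered requirement.
        for req in self._requirements.values():
            req.check()
            req.get_data(**kwargs)

    def _requirement_data(self, name: str) -> Any:
        # Look the data up by requirement class name.
        return self._requirements[name].data


calc = ToyCalculator()
calc._fetch_requirements()
attrs = calc._requirement_data("ToyAttrs")
```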
Subclass Implementation¶
Subclasses must implement:
- __init__: call super().__init__(optional=optional), store instance-specific parameters (object names, feature names, etc.).
- _do_check: validate that the required data exists (raise ValueError if not). Do not set self._checked = True here; check() does that.
- get_data: fetch and set self._data, then return it.
- __repr__: return a descriptive string for debugging.
Optionally override:
- _check_cache_key: return a hashable key to enable class-level caching for static data. Return None (default) to disable.
- _get_cache_value: customize what value is stored in the cache (defaults to self._data).
- _set_from_cache: customize how cached values are restored to self._data (default is copy.deepcopy).
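A short sketch of why restoring from the cache by deep copy is a sensible default. The functions `get_cache_value` and `set_from_cache` are hypothetical free-function mirrors of the two hooks, written only for this illustration.

```python
import copy
from typing import Any


def get_cache_value(data: Any) -> Any:
    """Hypothetical mirror of _get_cache_value: store the data as-is."""
    return data


def set_from_cache(cached: Any) -> Any:
    """Hypothetical mirror of _set_from_cache: hand out an independent copy."""
    return copy.deepcopy(cached)


cache: dict[tuple, Any] = {}
key = ("RequiredObjectAttributes", "turbine_01")
cache[key] = get_cache_value({"turbine_01": {"rated_power": 2000}})

restored = set_from_cache(cache[key])
restored["turbine_01"]["rated_power"] = 0  # a caller mutates its own copy
```

Because the restored value is an independent copy, the mutation does not reach the cached entry, so later instances with the same key still see the original data.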
Optional requirements¶
When optional=True, the requirement should not raise errors if data is absent. The pattern differs by subclass:
- Most classes: add if self.optional: return at the top of _do_check().
- RequiredFeatureAttributes and RequiredObjectAttributes: still fetch data when optional but tolerate missing items (if not found and not self.optional: raise).
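The two optional patterns can be contrasted in a small sketch. Both functions and their parameters are hypothetical; they only illustrate the control flow, with a plain dict standing in for the database.

```python
from __future__ import annotations


def check_skip_when_optional(exists: bool, optional: bool) -> None:
    """Pattern used by most classes: optional skips validation entirely."""
    if optional:
        return
    if not exists:
        raise ValueError("Required data does not exist.")


def check_tolerate_missing(
    wanted: list[str], available: dict[str, dict], optional: bool
) -> dict[str, dict]:
    """Attribute-style pattern: always look up, tolerate gaps when optional."""
    found: dict[str, dict] = {}
    for name in wanted:
        if name in available:
            found[name] = available[name]
        elif not optional:
            raise ValueError(f"Attributes for {name!r} not found.")
    return found


available = {"turbine_01": {"rated_power": 2000}}
partial = check_tolerate_missing(["turbine_01", "turbine_02"], available, optional=True)
```

With optional=True the second pattern still returns whatever it found, whereas the first pattern never looks at the data at all.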
Minimal subclass example¶
from __future__ import annotations

from typing import Any

from .calculation_requirements_core import CalculationRequirement


class RequiredMyData(CalculationRequirement):
    """Fetches my custom data from performance_db."""

    def __init__(self, object_name: str, optional: bool = False) -> None:
        super().__init__(optional=optional)
        self._object_name = object_name

    def _check_cache_key(self) -> tuple:
        # Enable caching keyed by object name (only if data is static)
        return (type(self).__name__, self._object_name)

    def _do_check(self) -> None:
        if self.optional:
            return
        if not self._perfdb.my_data_exists(self._object_name):
            raise ValueError(f"Data for '{self._object_name}' does not exist.")

    def get_data(self, **kwargs) -> Any:
        if not self._checked:
            self.check()
        self._data = self._perfdb.fetch_my_data(self._object_name)
        return self._data

    def __repr__(self) -> str:
        return f"RequiredMyData(object={self._object_name!r}, optional={self.optional})"
Available Requirement Classes¶
| Class | Data returned | Cacheable |
|---|---|---|
| RequiredFeatures | pl.DataFrame with "timestamp" + "object@feature" columns | No (period-dependent) |
| RequiredObjectAttributes | dict[object_name, dict[attr, value]] | Yes |
| RequiredFeatureAttributes | dict[feature_name, dict[attr, value]] | Yes |
| RequiredCalcModels | dict[object_name, dict[model_name, {model, ...}]] | Yes |
| RequiredAlarms | pl.DataFrame with alarm event rows | No (period-dependent) |
| RequiredVibrationData | pl.DataFrame with raw vibration records | No (period-dependent) |
| RequiredVibrationFrequencies | pl.DataFrame with frequency definitions | Yes |
See the individual pages for each class for full details.
Class Definition¶
CalculationRequirement(optional=False)¶
Abstract base class for all data requirements used by feature calculators.
A CalculationRequirement encapsulates a single source of input data
(e.g. object attributes, feature time-series, trained models) and provides
two responsibilities:
- Validation (:meth:check): confirm the required data exists and is accessible before the calculation period is known.
- Fetching (:meth:get_data): retrieve the actual data and store it in :attr:data.
Subclass contract
- Override __init__ to accept source-specific arguments and call super().__init__(optional=optional).
- Implement :meth:_do_check: raise ValueError if the requirement is unmet (unless self.optional).
- Implement :meth:get_data: fetch and store data in self._data.
- Implement :meth:__repr__.
- Optionally override :meth:_check_cache_key to enable class-level caching for static (period-independent) data.
Thread-safe caching
Each concrete subclass automatically gets its own threading.local
instance via __init_subclass__. When :meth:_check_cache_key returns
a non-None key, the result of _do_check is stored in a
per-thread cache dict and reused by subsequent instances in the same
thread that produce the same key. Because the cache is never shared across
threads, no lock is required and Polars operations inside _do_check
cannot deadlock regardless of POLARS_MAX_THREADS.
Optional requirements
When optional=True, the requirement should not raise errors if data is
absent. The typical pattern in :meth:_do_check is::
    if self.optional:
        return  # skip all validation

Some subclasses (e.g. RequiredFeatureAttributes) still fetch when
optional but tolerate missing items; they use::

    if not found and not self.optional:
        raise ValueError(...)

Subclasses of CalculationRequirement gather all the data needed for a calculation, checking that it exists in the database and, in some cases, that it is valid.
In subclasses, this constructor should be called with super().__init__(optional=optional).
Parameters:
- optional (bool, default: False) – Defines whether the requirement is optional. If optional is True, the requirement is only validated to check if it could exist, not if it is actually present.
Source code in echo_energycalc/calculation_requirements_core.py
def __init__(self, optional: bool = False) -> None:
    """
    Constructor of the CalculationRequirement class.

    The subclasses of CalculationRequirement will get all the necessary data for a calculation, checking if they exist in the database and, in some cases, also if they are valid.

    In subclasses this constructor should be called with super().__init__(optional=optional).

    Parameters
    ----------
    optional : bool, optional
        Defines if the requirement is optional.
        If optional is True, the requirement is only validated to check if it could exist, not if it is actually present.
        By default False
    """
    self._perfdb: PerfDB = PerfDB(application_name=self.__class__.__name__)
    """Stores the connection to performance database"""
    self._optional: bool = optional
    """Defines if the requirement is optional"""
    self._checked: bool = False
    """Defines if the requirement has been checked"""
    self._fetched: bool = False
    """Defines if get_data() has been called on this requirement"""
    self._data: Any | None = None
    """Stores the data required for the calculation"""
checked property¶
Attribute that defines if the requirement has been checked. Its value starts as False and is set to True after the check method is called.
Returns:
- bool – True if the requirement has been checked.
data property¶
Attribute used to store the data required for the calculation.
Initially it is None and will be set with the data acquired by the get_data method. The data type will depend on the subclass implementation, but usually it will be a polars DataFrame or a dictionary.
Returns:
- Any | None – Returns the data required for the calculation.
fetched property¶
Attribute that defines if get_data() has been called on this requirement.
True even when the fetch returned no data (e.g. an optional requirement
that found nothing). Use this to distinguish "never fetched" from "fetched
but empty/None".
Returns:
- bool – True if get_data() has been called at least once.
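A toy sketch of the distinction this property is meant to draw. `ToyOptionalRequirement` is hypothetical, and setting `_fetched` inside `get_data()` is an assumption of this sketch; the page does not show where the real classes set the flag.

```python
from __future__ import annotations

from typing import Any


class ToyOptionalRequirement:
    """Hypothetical optional requirement whose fetch finds nothing."""

    def __init__(self) -> None:
        self._fetched = False
        self._data: Any | None = None

    @property
    def fetched(self) -> bool:
        return self._fetched

    @property
    def data(self) -> Any | None:
        return self._data

    def get_data(self, **kwargs: Any) -> Any:
        self._data = None      # nothing found, tolerated because optional
        self._fetched = True   # assumption: the flag is set on every fetch
        return self._data


req = ToyOptionalRequirement()
before = (req.fetched, req.data)  # never fetched
req.get_data()
after = (req.fetched, req.data)   # fetched, but empty
```

In both states `data` is None, so only `fetched` tells the two situations apart.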
optional property¶
Attribute that defines if the requirement is optional.
If optional is True, the requirement is only validated to check if it could exist, not if it is actually present. This is useful for requirements that are not necessary for all calculations, but are useful for some of them.
Returns:
- bool – True if the requirement is optional.
check()¶
Check that the requirement is met.
This concrete implementation handles two concerns automatically so that
subclasses only need to implement _do_check():
- Already-checked guard: returns True immediately if check() has already succeeded for this instance, avoiding redundant DB round-trips when _fetch_requirements() iterates requirements on every _compute() call.
- Per-thread caching: when _check_cache_key() returns a non-None key, the result produced by _do_check() is stored in a thread-local cache and reused by subsequent instances in the same thread with the same key. Because the cache is never shared across threads, no locking is needed and concurrent Polars operations inside _do_check cannot deadlock.
The optional guard is intentionally delegated to _do_check() because
different subclasses have different optional semantics (see _do_check docs).
Returns:
- bool – True if the requirement is met; raises on unmet non-optional requirements.
Source code in echo_energycalc/calculation_requirements_core.py
def check(self) -> bool:
    """
    Check that the requirement is met.

    This concrete implementation handles two concerns automatically so that
    subclasses only need to implement ``_do_check()``:

    1. **Already-checked guard**: returns ``True`` immediately if ``check()`` has
       already succeeded for this instance, avoiding redundant DB round-trips when
       ``_fetch_requirements()`` iterates requirements on every ``_compute()`` call.
    2. **Per-thread caching**: when ``_check_cache_key()`` returns a non-None key,
       the result produced by ``_do_check()`` is stored in a thread-local cache and
       reused by subsequent instances in the same thread with the same key. Because
       the cache is never shared across threads, no locking is needed and concurrent
       Polars operations inside ``_do_check`` cannot deadlock.

    The **optional guard** is intentionally delegated to ``_do_check()`` because
    different subclasses have different optional semantics (see ``_do_check`` docs).

    Returns
    -------
    bool
        True if the requirement is met; raises on unmet non-optional requirements.
    """
    if self._checked:
        return True

    cache_key = self._check_cache_key()
    if cache_key is not None:
        _tl = type(self)._cache_local  # noqa: SLF001
        if not hasattr(_tl, "cache"):
            _tl.cache = {}
        cached = _tl.cache.get(cache_key)
        if cached is None:
            self._do_check()
            _tl.cache[cache_key] = self._get_cache_value()
            cached = _tl.cache[cache_key]
        else:
            logger.debug("Cache hit for %s (key=%s)", type(self).__name__, cache_key)
        self._set_from_cache(cached)
    else:
        self._do_check()

    self._checked = True
    return True
get_data(**kwargs) abstractmethod¶
Method used to get the data required for the calculation.
The method should first check if the requirement has been checked. If not, it should check before getting the data.
At the end of the method, the attribute self._data should be set with the data queried from performance_db or any other source.
Returns:
- Any – Returns the data required for the calculation.
Source code in echo_energycalc/calculation_requirements_core.py
@abstractmethod
def get_data(self, **kwargs) -> Any:
    """
    Method used to get the data required for the calculation.

    The method should first check if the requirement has been checked. If not, it should check before getting the data.

    At the end of the method, the attribute self._data should be set with the data queried from performance_db or any other source.

    Returns
    -------
    Any
        Returns the data required for the calculation.
    """
    raise NotImplementedError("This method must be implemented by a subclass")