
Expression Evaluation

Overview

FeatureCalcEvalExpression evaluates a Python code string stored in the database to calculate a feature. This is useful for one-off, moderately complex calculations that don't justify a dedicated class — the logic lives in the database alongside the feature definition, not in Python source code.


Engine Selection

The first line of the expression controls which DataFrame type is exposed as df:

First line                      | df type           | Column naming
--------------------------------|-------------------|---------------------------------------------
# df_type=polars                | polars.DataFrame  | Bare feature names ("ActivePower_10min.AVG")
# df_type=polars_multiple_objs  | polars.DataFrame  | "object@feature" flat columns
# df_type=pandas                | pandas.DataFrame  | Bare feature names, DatetimeIndex
# df_type=pandas_multiple_objs  | pandas.DataFrame  | MultiIndex (object, feature), DatetimeIndex
(omitted or other comment)      | pandas.DataFrame  | Bare feature names (legacy default)

Choose the Polars engines for better performance. Use multiple_objs variants when the expression needs features from more than one object simultaneously.
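The dispatch on the first line can be sketched with a small hypothetical helper (an illustration of the table above, not the library's actual parser):

```python
def detect_engine(expression: str) -> str:
    """Return the engine implied by the first line of an expression string."""
    first_line = expression.splitlines()[0].strip() if expression else ""
    engines = {
        "# df_type=polars": "polars",
        "# df_type=polars_multiple_objs": "polars_multiple_objs",
        "# df_type=pandas": "pandas",
        "# df_type=pandas_multiple_objs": "pandas_multiple_objs",
    }
    # anything else (including no comment at all) falls back to the legacy pandas default
    return engines.get(first_line, "pandas")
```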


Calculation Logic

The expression code stored in feature_eval_expression must:

  1. Use df as the variable holding the input data (see engine table above).
  2. At the end of the expression, df must contain a column named exactly self.feature (the feature being calculated) — this column is extracted as the result.
  3. Be valid Python code that can be executed in a sandboxed namespace.
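Putting the three rules together, the evaluation flow looks roughly like the sketch below (a simplified stand-in for the real sandboxed execution, using the default pandas engine):

```python
import pandas as pd

def evaluate_expression(expression: str, df: pd.DataFrame, feature: str) -> pd.Series:
    # build a minimal namespace; the real sandbox is far more restrictive
    namespace = {"df": df.copy(), "pd": pd}
    exec(expression, namespace)  # illustration only
    result_df = namespace["df"]
    if feature not in result_df.columns:
        raise ValueError(f"expression did not produce a column named {feature!r}")
    # the column named after the target feature is extracted as the result
    return result_df[feature]

df = pd.DataFrame({"a": [1.0, 2.0], "b": [0.5, 0.5]})
series = evaluate_expression('df["ratio"] = df["a"] / df["b"]', df, "ratio")
```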

Available namespaces

The following names are always available inside an expression:

  • pd — pandas
  • np — numpy
  • interp1d — scipy.interpolate.interp1d
  • filters — the echo_energycalc.filters module
  • DataFrame, Series, Timestamp — from pandas
  • pl / polars — Polars (only in polars* engines)
  • Standard builtins: abs, all, any, bool, dict, enumerate, float, int, isinstance, len, list, map, max, min, print, range, round, set, slice, sorted, str, sum, tuple, type, zip, None, True, False

Warning

Dangerous builtins (__import__, open, exec, eval, compile, getattr) are blocked in the sandbox.
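The sandbox mechanism is internal to the library, but the general pattern can be sketched by passing a restricted __builtins__ mapping to exec (a hypothetical illustration, not the actual implementation):

```python
import builtins

# a small whitelist for illustration; the real sandbox allows the full list above
_ALLOWED = {"abs", "min", "max", "len", "sum", "range", "round", "sorted", "float", "int"}

def make_safe_builtins() -> dict:
    # expose only whitelisted names; open, eval, __import__, ... are simply absent
    return {name: getattr(builtins, name) for name in _ALLOWED}

def run_sandboxed(code: str, namespace: dict) -> dict:
    namespace["__builtins__"] = make_safe_builtins()
    exec(code, namespace)
    return namespace

ns = run_sandboxed("x = max(1, 2) + abs(-3)", {})

# a blocked builtin raises NameError instead of executing
try:
    run_sandboxed("open('secrets.txt')", {})
    blocked = False
except NameError:
    blocked = True
```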


Feature Discovery

The calculator uses get_features_in_expression from echo-postgres to automatically identify which features appear in df[...] access patterns in the expression. These features are then requested from the database for the calculation period.

Note

Columns whose name contains temporary_feature are excluded from auto-discovery and will not be fetched from the database. Use these for intermediate calculation columns.
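The real discovery logic lives in echo-postgres; a rough, hypothetical approximation of its behaviour looks like this:

```python
import re

def find_features(expression: str) -> list[str]:
    """Approximate get_features_in_expression: collect names used in df["..."] accesses."""
    names = re.findall(r'df\[["\']([^"\']+)["\']\]', expression)
    unique: list[str] = []
    for name in names:
        # intermediate columns containing "temporary_feature" are never fetched
        if "temporary_feature" in name:
            continue
        if name not in unique:
            unique.append(name)
    return unique

expr = 'df["temporary_feature_x"] = df["a"] * 2\ndf["out"] = df["temporary_feature_x"] + df["b"]'
```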


Bazefield Features

Append _b# to any feature name to fetch it from Bazefield instead of performance_db:

Python
df["ActivePower_5min.AVG_b#"]  # fetches ActivePower_5min.AVG from Bazefield

These columns are auto-aligned to the nearest 10-minute timestamp (within ±2 minutes).
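The alignment can be illustrated with a small pandas sketch (grid frequency and tolerance follow the description above; the calculator configures the real values internally):

```python
import pandas as pd

def align_to_grid(
    ts: pd.Series,
    freq: str = "10min",
    tolerance: pd.Timedelta = pd.Timedelta(minutes=2),
) -> pd.Series:
    """Round timestamps to the nearest grid point; drop those outside the tolerance."""
    rounded = ts.dt.round(freq)
    within = (ts - rounded).abs() <= tolerance
    return rounded.where(within)  # out-of-tolerance rows become NaT

ts = pd.Series(pd.to_datetime(["2024-01-01 00:01:30", "2024-01-01 00:04:00"]))
aligned = align_to_grid(ts)
```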


Object Attributes in Expressions

Enclose object attribute names in $${}$$ to substitute their values at evaluation time:

Python
df["curtailment_loss"] = df["lost_power"] * $$efficiency_factor$$

replace_object_attributes_in_expression from echo-postgres replaces $$efficiency_factor$$ with the actual numeric value from the database before the expression is evaluated.
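A hypothetical approximation of the substitution step (the real one queries attribute values from the database):

```python
import re

def substitute_attributes(expression: str, attributes: dict[str, float]) -> str:
    """Approximate replace_object_attributes_in_expression: swap $$name$$ for its value."""
    def repl(match: re.Match) -> str:
        return repr(attributes[match.group(1)])
    return re.sub(r"\$\$(\w+)\$\$", repl, expression)

expr = 'df["loss"] = df["lost_power"] * $$efficiency_factor$$'
resolved = substitute_attributes(expr, {"efficiency_factor": 0.97})
```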


Examples

Simple expression (pandas, default engine)

Python
df["reactive_energy"] = df["reactive_energy_delivered"] - df["reactive_energy_received"]

Using object attributes

Python
df["curtailment_state"] = 0.0
df.loc[df["active_power"] < $$nominal_power$$ * 0.99, "curtailment_state"] = 1.0

Polars engine

Python
# df_type=polars
df = df.with_columns(
    (pl.col("reactive_energy_delivered") - pl.col("reactive_energy_received"))
    .alias("reactive_energy")
)

Multiple objects with Polars

Python
# df_type=polars_multiple_objs
df = df.with_columns(
    (pl.col("WT01@ActivePower_10min.AVG") + pl.col("WT02@ActivePower_10min.AVG"))
    .alias("total_power")
)

Database Requirements

  • Feature attribute server_calc_type must be set to expression_evaluation.
  • Feature attribute feature_eval_expression must contain the Python expression code.

Tips

Tip

When saving expressions to the database, double quotes must be escaped (\") and line breaks replaced with \n. The VSCode extension Replace Rules can automate this with the following configuration in settings.json:

JSON
"replacerules.rules": {
    "Change new line to \\n": { "find": "\n", "replace": "\\n" },
    "Change \\n to new line": { "find": "\\\\n", "replace": "\n" },
    "Change \"to  \\\"": { "find": "\"", "replace": "\\\"" },
    "Change \\\" to \"": { "find": "\\\\\"", "replace": "\"" }
},
"replacerules.rulesets": {
    "Python to String": { "rules": ["Change \"to  \\\"", "Change new line to \\n"] },
    "String to Python": { "rules": ["Change \\n to new line", "Change \\\" to \""] }
}
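The same transformation can be scripted in Python if you prefer not to use the extension (illustrative helpers, not part of the library):

```python
def python_to_db_string(expression: str) -> str:
    # escape double quotes, then encode line breaks as literal \n sequences
    return expression.replace('"', '\\"').replace("\n", "\\n")

def db_string_to_python(stored: str) -> str:
    # inverse transformation, applied in the reverse order
    return stored.replace("\\n", "\n").replace('\\"', '"')

expr = 'df["x"] = 1\ndf["y"] = 2'
stored = python_to_db_string(expr)
```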

Class Definition

FeatureCalcEvalExpression(object_name, feature)

FeatureCalculator that evaluates a Python expression string stored in performance_db.

The expression is read from the feature_eval_expression attribute of the feature. It is executed in a sandboxed namespace (see _SAFE_BUILTINS and _EXPRESSION_GLOBALS) and must produce a column named after the target feature in the variable df.

Engine selection

The first line of the expression string selects the DataFrame type exposed as df:

First line                         | df type                           | Column naming
-----------------------------------|-----------------------------------|------------------------------------
# df_type=polars                   | polars.DataFrame                  | Bare feature names + "timestamp"
# df_type=polars_multiple_objs     | polars.DataFrame                  | "object@feature" flat names
# df_type=pandas                   | pandas.DataFrame (DatetimeIndex)  | Bare feature names
# df_type=pandas_multiple_objs     | pandas.DataFrame (DatetimeIndex)  | MultiIndex (object, feature)
(omitted or unrecognised comment)  | pandas.DataFrame (DatetimeIndex)  | Bare feature names (legacy default)

Feature discovery

Features referenced inside df["..."] patterns are automatically discovered by get_features_in_expression from echo-postgres and requested for the calculation period. Columns whose name contains "temporary_feature" are excluded from auto-discovery (useful for intermediate calculation variables).

Object attribute substitution

Attribute placeholders in the form $$attr_name$$ are replaced with their actual values before evaluation via replace_object_attributes_in_expression from echo-postgres.

Bazefield features

Appending _b# to any feature name fetches it from Bazefield instead of performance_db (e.g. df["ActivePower_5min.AVG_b#"]). These columns are excluded from echo-postgres auto-discovery and are aligned to the nearest 10-minute timestamp within ±2 minutes.

These expressions must be saved in the feature_eval_expression attribute of the feature in performance_db.

The expressions are assumed to use df as the DataFrame holding the required features.

The features used in df are requested for the desired period by the method; get_features_in_expression from echo_postgres determines which features appear in the expression. Features whose name contains temporary_feature are ignored and can be used as auxiliary columns for intermediate calculations.

Object attributes are also substituted in the expression by replace_object_attributes_in_expression from echo_postgres. Attribute names must be enclosed in $${}$$, e.g. $$hub_height$$.

At the end of the expression, df must have a column with the name of the feature that is being calculated and this column will be returned as the result of the calculation.

Parameters:

  • object_name

    (str) –

    Name of the object for which the feature is calculated. It must exist in performance_db.

  • feature

    (str) –

    Feature of the object that is calculated. It must exist in performance_db.

Source code in echo_energycalc/feature_calc_eval_expression.py
Python
def __init__(
    self,
    object_name: str,
    feature: str,
) -> None:
    """
    FeatureCalculator class for features that rely on evaluation of expressions saved as strings.

    These expressions must be saved in the `feature_eval_expression` attribute of the feature in performance_db.

    Here we assumed that the expressions use `df` as the DataFrame with the needed features.

    The features used in `df` will be requested for the desired period by the method. The method `get_features_in_expression` from echo_postgres will be used to define which are the features in the expression. Features that contain `temporary_feature` in the name will be ignored and can be used as auxiliary columns in the DataFrame for helping in the calculations.

    As well, object attributes will be replaced in the expression by the method `replace_object_attributes_in_expression` from echo_postgres. The attributes must be enclosed in `$${}$$` like `$$hub_height$$`.

    At the end of the expression, `df` must have a column with the name of the feature that is being calculated and this column will be returned as the result of the calculation.

    Parameters
    ----------
    object_name : str
        Name of the object for which the feature is calculated. It must exist in performance_db.
    feature : str
        Feature of the object that is calculated. It must exist in performance_db.
    """
    # initialize parent class
    super().__init__(object_name, feature)

    # requirements for the feature calculator
    self._add_requirement(RequiredFeatureAttributes(self.object, self.feature, ["feature_eval_expression"]))
    self._fetch_requirements()

    # Pre-resolve expression metadata here (main thread) so that _compute
    # — which runs inside a ThreadPoolExecutor worker — never calls echo-postgres
    # directly.  Both functions below hit echo-postgres internally and deadlock
    # under concurrent access even with separate PerfDB instances.
    _eval_expression = self._requirement_data("RequiredFeatureAttributes")[self.feature]["feature_eval_expression"]
    self._required_features: list[str] = get_features_in_expression(_eval_expression, self.object, self._perfdb)
    self._eval_expression_resolved: str = replace_object_attributes_in_expression(
        _eval_expression, self.object, self._perfdb
    )
    self._round_timestamps: dict | None = (
        {"freq": timedelta(minutes=5), "tolerance": timedelta(minutes=2)}
        if any(feat.endswith("_b#") for feat in self._required_features)
        else None
    )

    # Register the RequiredFeatures dependency now that we know which features the
    # expression references.  _fetch_requirements(period=...) in _compute will
    # populate the data for the correct period without re-querying the metadata.
    self._add_requirement(RequiredFeatures(features={self.object: self._required_features}))

feature property

Feature that is calculated. This will be defined in the constructor and cannot be changed.

Returns:

  • str

    Name of the feature that is calculated.

name property

Name of the feature calculator. Is defined in child classes of FeatureCalculator.

This must be equal to the "server_calc_type" attribute of the feature in performance_db.

Returns:

  • str

    Name of the feature calculator.

object property

Object for which the feature is calculated. This will be defined in the constructor and cannot be changed.

Returns:

  • str

    Object name for which the feature is calculated.

requirements property

List of requirements of the feature calculator. Is defined in child classes of FeatureCalculator.

Returns:

  • dict[str, list[CalculationRequirement]]

    Dict of requirements.

    The keys are the names of the classes of the requirements and the values are lists of requirements of that class.

    For example: {"RequiredFeatures": [RequiredFeatures(...), RequiredFeatures(...)], "RequiredObjects": [RequiredObjects(...)]}

result property

Result of the calculation. This is None until the method "calculate" is called.

Returns:

  • DataFrame | None

    Polars DataFrame with a "timestamp" column and one or more feature value columns. None until calculate is called.

calculate(period, save_into=None, cached_data=None, **kwargs)

Run the calculation for the given period and optionally save the result.

Calls _compute to get the result, stores it in result, then calls save. Subclasses should implement _compute instead of overriding this method.

Parameters:

  • period

    (DateTimeRange) –

    Period for which the feature will be calculated.

  • save_into

    (Literal['all', 'performance_db'] | None, default: None ) –
    • "all": save in performance_db and bazefield.
    • "performance_db": save only in performance_db.
    • None: do not save.

    By default None.

  • cached_data

    (DataFrame | None, default: None ) –

    Polars DataFrame with features already fetched/calculated. Passed to _compute to enable chained calculations without re-querying performance_db. By default None.

  • **kwargs

    Forwarded to save.

Returns:

  • DataFrame

    Polars DataFrame with a "timestamp" column and one or more feature value columns.

Source code in echo_energycalc/feature_calc_core.py
Python
def calculate(
    self,
    period: DateTimeRange,
    save_into: Literal["all", "performance_db"] | None = None,
    cached_data: pl.DataFrame | None = None,
    **kwargs,
) -> pl.DataFrame:
    """
    Run the calculation for the given period and optionally save the result.

    Calls :meth:`_compute` to get the result, stores it in :attr:`result`,
    then calls :meth:`save`. Subclasses should implement :meth:`_compute` instead
    of overriding this method.

    Parameters
    ----------
    period : DateTimeRange
        Period for which the feature will be calculated.
    save_into : Literal["all", "performance_db"] | None, optional
        - ``"all"``: save in performance_db and bazefield.
        - ``"performance_db"``: save only in performance_db.
        - ``None``: do not save.

        By default None.
    cached_data : pl.DataFrame | None, optional
        Polars DataFrame with features already fetched/calculated. Passed to
        ``_compute`` to enable chained calculations without re-querying
        performance_db. By default None.
    **kwargs
        Forwarded to :meth:`save`.

    Returns
    -------
    pl.DataFrame
        Polars DataFrame with a ``"timestamp"`` column and one or more feature value columns.
    """
    result = self._compute(period, cached_data=cached_data)
    self._result = result
    self.save(save_into=save_into, **kwargs)
    return result

save(save_into=None, **kwargs)

Method to save the calculated feature values in performance_db.

Parameters:

  • save_into

    (Literal['all', 'performance_db'] | None, default: None ) –

    Argument that will be passed to the method "save". The options are:
    • "all": the feature will be saved in performance_db and bazefield.
    • "performance_db": the feature will be saved only in performance_db.
    • None: the feature will not be saved.

    By default None.

  • **kwargs

    (dict, default: {} ) –

    Not being used at the moment. Here only for compatibility.

Source code in echo_energycalc/feature_calc_core.py
Python
def save(
    self,
    save_into: Literal["all", "performance_db"] | None = None,
    **kwargs,  # noqa: ARG002
) -> None:
    """
    Method to save the calculated feature values in performance_db.

    Parameters
    ----------
    save_into : Literal["all", "performance_db"] | None, optional
        Argument that will be passed to the method "save". The options are:
        - "all": The feature will be saved in performance_db and bazefield.
        - "performance_db": the feature will be saved only in performance_db.
        - None: The feature will not be saved.

        By default None.
    **kwargs : dict, optional
        Not being used at the moment. Here only for compatibility.
    """
    # checking arguments
    if not isinstance(save_into, str | type(None)):
        raise TypeError(f"save_into must be a string or None, not {type(save_into)}")
    if isinstance(save_into, str) and save_into not in ["all", "performance_db"]:
        raise ValueError(f"save_into must be 'all', 'performance_db' or None, not {save_into}")

    # checking if calculation was done
    if self.result is None:
        raise ValueError(
            "The calculation was not done. Please call 'calculate' before calling 'save'.",
        )

    if save_into is None:
        return

    upload_to_bazefield = save_into == "all"

    if not isinstance(self.result, pl.DataFrame):
        raise TypeError(f"result must be a polars DataFrame, not {type(self.result)}.")
    if "timestamp" not in self.result.columns:
        raise ValueError("result DataFrame must contain a 'timestamp' column.")

    # rename feature columns to "object@feature" format expected by perfdb polars insert
    feat_cols = [c for c in self.result.columns if c != "timestamp"]
    result_pl = self.result.rename({col: f"{self.object}@{col}" for col in feat_cols})

    self._perfdb.features.values.series.insert(
        df=result_pl,
        on_conflict="update",
        bazefield_upload=upload_to_bazefield,
    )
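The column-renaming step in save can be isolated as a small helper for illustration (hypothetical, not part of the library):

```python
def to_upload_columns(columns: list[str], object_name: str) -> dict[str, str]:
    """Build the rename map used before insert: feature -> "object@feature" (timestamp untouched)."""
    return {c: f"{object_name}@{c}" for c in columns if c != "timestamp"}

mapping = to_upload_columns(["timestamp", "reactive_energy"], "WT01")
```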