Expression Evaluation¶
Overview¶
FeatureCalcEvalExpression evaluates a Python code string stored in the database to calculate a feature. This is useful for one-off, moderately complex calculations that don't justify a dedicated class — the logic lives in the database alongside the feature definition, not in Python source code.
Engine Selection¶
The first line of the expression controls which DataFrame type is exposed as df:
| First line | df type | Column naming |
|---|---|---|
| `# df_type=polars` | `polars.DataFrame` | Bare feature names (`"ActivePower_10min.AVG"`) |
| `# df_type=polars_multiple_objs` | `polars.DataFrame` | `"object@feature"` flat columns |
| `# df_type=pandas` | `pandas.DataFrame` | Bare feature names, DatetimeIndex |
| `# df_type=pandas_multiple_objs` | `pandas.DataFrame` | MultiIndex (object, feature), DatetimeIndex |
| (omitted or other comment) | `pandas.DataFrame` | Bare feature names (legacy default) |
Choose the Polars engines for better performance. Use multiple_objs variants when the expression needs features from more than one object simultaneously.
Calculation Logic¶
The expression code stored in feature_eval_expression must:
- Use `df` as the variable holding the input data (see the engine table above).
- Leave `df`, at the end of the expression, with a column named exactly `self.feature` (the feature being calculated); this column is extracted as the result.
- Be valid Python code that can be executed in a sandboxed namespace.
Available namespaces¶
The following names are always available inside an expression:
- `pd`: pandas
- `np`: numpy
- `interp1d`: `scipy.interpolate.interp1d`
- `filters`: the `echo_energycalc.filters` module
- `DataFrame`, `Series`, `Timestamp`: from pandas
- `pl` / `polars`: Polars (only in the `polars*` engines)
- Standard builtins: `abs`, `all`, `any`, `bool`, `dict`, `enumerate`, `float`, `int`, `isinstance`, `len`, `list`, `map`, `max`, `min`, `print`, `range`, `round`, `set`, `slice`, `sorted`, `str`, `sum`, `tuple`, `type`, `zip`, `None`, `True`, `False`
Warning
Dangerous builtins (__import__, open, exec, eval, compile, getattr) are blocked in the sandbox.
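A minimal sketch of how such a sandbox can be built. This is illustrative only: the real namespaces live in echo_energycalc as `_SAFE_BUILTINS` and `_EXPRESSION_GLOBALS`, and the names `_ALLOWED`, `SAFE_BUILTINS`, and `run_expression` below are hypothetical.

```python
import builtins

# Hypothetical sketch: whitelist a subset of builtins; anything not listed
# (open, exec, eval, __import__, ...) is simply absent from the namespace.
_ALLOWED = ["abs", "all", "any", "len", "max", "min", "round", "sum", "sorted", "range", "print"]
SAFE_BUILTINS = {name: getattr(builtins, name) for name in _ALLOWED}

def run_expression(code: str, df):
    """Execute an expression string with df injected and dangerous builtins absent."""
    namespace = {"__builtins__": SAFE_BUILTINS, "df": df}
    exec(code, namespace)
    return namespace["df"]

# A benign expression works (a plain dict stands in for a DataFrame here):
result = run_expression("df['double'] = df['x'] * 2", {"x": 3})

# Blocked names such as open or __import__ raise NameError:
try:
    run_expression("open('/etc/passwd')", {})
    blocked = False
except NameError:
    blocked = True
```

Replacing `__builtins__` in the globals dict is what makes the blocked names unresolvable inside the executed code.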
Feature Discovery¶
The calculator uses get_features_in_expression from echo-postgres to automatically identify which features appear in df[...] access patterns in the expression. These features are then requested from the database for the calculation period.
Note
Columns whose name contains temporary_feature are excluded from auto-discovery and will not be fetched from the database. Use these for intermediate calculation columns.
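As an illustration, the discovery behaviour can be approximated with a regex over `df[...]` access patterns. The function below is a simplified, hypothetical stand-in for `get_features_in_expression` (which is part of echo-postgres and may behave differently in detail):

```python
import re

def discover_features(expression: str) -> list[str]:
    """Simplified stand-in for get_features_in_expression: collect names
    accessed via df["..."] and skip intermediate (temporary) columns."""
    names = re.findall(r'df\[["\']([^"\']+)["\']\]', expression)
    # columns containing "temporary_feature" are intermediates, not fetched
    return sorted({n for n in names if "temporary_feature" not in n})

expr = (
    'df["temporary_feature_ratio"] = df["ActivePower_10min.AVG"] / df["NominalPower"]\n'
    'df["capacity_factor"] = df["temporary_feature_ratio"].clip(0, 1)'
)
# Only the non-temporary names would be requested from the database
print(discover_features(expr))
```

Note that this naive version also picks up assignment targets such as `capacity_factor`; it is only meant to show how `temporary_feature` columns stay out of the fetch list.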
Bazefield Features¶
Append _b# to any feature name to fetch it from Bazefield instead of performance_db:
df["ActivePower_5min.AVG_b#"] # fetches ActivePower_5min.AVG from Bazefield
These columns are auto-aligned to the nearest 10-minute timestamp (within ±2 minutes).
Object Attributes in Expressions¶
Enclose object attribute names in $${}$$ to substitute their values at evaluation time:
df["curtailment_loss"] = df["lost_power"] * $$efficiency_factor$$
replace_object_attributes_in_expression from echo-postgres replaces $$efficiency_factor$$ with the actual numeric value from the database before the expression is evaluated.
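A simplified sketch of this substitution step. The real `replace_object_attributes_in_expression` looks the values up in performance_db; the `substitute_attributes` helper below is hypothetical and takes the attribute values as a plain dict instead:

```python
import re

def substitute_attributes(expression: str, attributes: dict[str, float]) -> str:
    """Simplified stand-in for replace_object_attributes_in_expression:
    replace each $$name$$ placeholder with the attribute's value."""
    def repl(match: re.Match) -> str:
        name = match.group(1)
        if name not in attributes:
            raise KeyError(f"unknown object attribute: {name}")
        return repr(attributes[name])
    return re.sub(r"\$\$(\w+)\$\$", repl, expression)

expr = 'df["curtailment_loss"] = df["lost_power"] * $$efficiency_factor$$'
print(substitute_attributes(expr, {"efficiency_factor": 0.97}))
# df["curtailment_loss"] = df["lost_power"] * 0.97
```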
Examples¶
Simple expression (pandas, default engine)¶
df["reactive_energy"] = df["reactive_energy_delivered"] - df["reactive_energy_received"]
Using object attributes¶
df["curtailment_state"] = 0.0
df.loc[df["active_power"] < $$nominal_power$$ * 0.99, "curtailment_state"] = 1.0
Polars engine¶
# df_type=polars
df = df.with_columns(
(pl.col("reactive_energy_delivered") - pl.col("reactive_energy_received"))
.alias("reactive_energy")
)
Multiple objects with Polars¶
# df_type=polars_multiple_objs
df = df.with_columns(
(pl.col("WT01@ActivePower_10min.AVG") + pl.col("WT02@ActivePower_10min.AVG"))
.alias("total_power")
)
Database Requirements¶
- Feature attribute `server_calc_type` must be set to `expression_evaluation`.
- Feature attribute `feature_eval_expression` must contain the Python expression code.
Tips¶
Tip
When saving expressions to the database, double quotes must be escaped (\") and line breaks replaced with \n. The VSCode extension Replace Rules can automate this with the following configuration in settings.json:
"replacerules.rules": {
"Change new line to \\n": { "find": "\n", "replace": "\\n" },
"Change \\n to new line": { "find": "\\\\n", "replace": "\n" },
"Change \"to \\\"": { "find": "\"", "replace": "\\\"" },
"Change \\\" to \"": { "find": "\\\\\"", "replace": "\"" }
},
"replacerules.rulesets": {
"Python to String": { "rules": ["Change \"to \\\"", "Change new line to \\n"] },
"String to Python": { "rules": ["Change \\n to new line", "Change \\\" to \""] }
}
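The same transformation can be done in Python instead of the editor, e.g. with a small helper pair (illustrative only; it applies the same quote/newline rules as the ruleset above and does not cover every backslash edge case):

```python
def to_db_string(expression: str) -> str:
    """Escape a multi-line Python expression for storage as a
    double-quoted string: escape backslashes and quotes, encode newlines."""
    return expression.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def from_db_string(stored: str) -> str:
    """Inverse transformation: restore newlines and quotes."""
    return stored.replace("\\n", "\n").replace('\\"', '"').replace("\\\\", "\\")

code = 'df["a"] = df["b"]\ndf["c"] = 1'
stored = to_db_string(code)
assert from_db_string(stored) == code
```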
Class Definition¶
FeatureCalcEvalExpression(object_name, feature)¶
FeatureCalculator that evaluates a Python expression string stored in performance_db.
The expression is read from the feature_eval_expression attribute of the
feature. It is executed in a sandboxed namespace (see `_SAFE_BUILTINS`
and `_EXPRESSION_GLOBALS`) and must produce a column named after the
target feature in the variable df.
Engine selection
The first line of the expression string selects the DataFrame type
exposed as df:
| First line | df type | Column naming |
|---|---|---|
| `# df_type=polars` | `polars.DataFrame` | Bare feature names + `"timestamp"` |
| `# df_type=polars_multiple_objs` | `polars.DataFrame` | `"object@feature"` flat names |
| `# df_type=pandas` | `pandas.DataFrame` (DatetimeIndex) | Bare feature names |
| `# df_type=pandas_multiple_objs` | `pandas.DataFrame` (DatetimeIndex) | MultiIndex (object, feature) |
| (omitted or unrecognised comment) | `pandas.DataFrame` (DatetimeIndex) | Bare feature names (legacy default) |
Feature discovery
Features referenced inside df["..."] patterns are automatically
discovered by get_features_in_expression from echo-postgres and
requested for the calculation period. Columns whose name contains
"temporary_feature" are excluded from auto-discovery (useful for
intermediate calculation variables).
Object attribute substitution
Attribute placeholders in the form $$attr_name$$ are replaced with
their actual values before evaluation via
replace_object_attributes_in_expression from echo-postgres.
Bazefield features
Appending _b# to any feature name fetches it from Bazefield instead
of performance_db (e.g. df["ActivePower_5min.AVG_b#"]).
These columns are excluded from echo-postgres auto-discovery and are
aligned to the nearest 10-minute timestamp within ±2 minutes.
These expressions must be saved in the feature_eval_expression attribute of the feature in performance_db.
The expression is assumed to use `df` as the DataFrame holding the needed features.
The features referenced in `df` are requested for the desired period; the method `get_features_in_expression` from echo_postgres determines which features appear in the expression. Features whose name contains `temporary_feature` are ignored and can be used as auxiliary columns to help with the calculations.
Likewise, object attributes are replaced in the expression by `replace_object_attributes_in_expression` from echo_postgres. Attributes must be enclosed in `$${}$$`, e.g. `$$hub_height$$`.
At the end of the expression, `df` must have a column named after the feature being calculated; this column is returned as the result of the calculation.
Parameters:
- object_name (str): Name of the object for which the feature is calculated. It must exist in performance_db.
- feature (str): Feature of the object that is calculated. It must exist in performance_db.
Source code in echo_energycalc/feature_calc_eval_expression.py
def __init__(
self,
object_name: str,
feature: str,
) -> None:
"""
FeatureCalculator class for features that rely on evaluation of expressions saved as strings.
These expressions must be saved in the `feature_eval_expression` attribute of the feature in performance_db.
Here we assumed that the expressions use `df` as the DataFrame with the needed features.
The features used in `df` will be requested for the desired period by the method. The method `get_features_in_expression` from echo_postgres will be used to define which are the features in the expression. Features that contain `temporary_feature` in the name will be ignored and can be used as auxiliary columns in the DataFrame for helping in the calculations.
As well, object attributes will be replaced in the expression by the method `replace_object_attributes_in_expression` from echo_postgres. The attributes must be enclosed in `$${}$$` like `$$hub_height$$`.
At the end of the expression, `df` must have a column with the name of the feature that is being calculated and this column will be returned as the result of the calculation.
Parameters
----------
object_name : str
Name of the object for which the feature is calculated. It must exist in performance_db.
feature : str
Feature of the object that is calculated. It must exist in performance_db.
"""
# initialize parent class
super().__init__(object_name, feature)
# requirements for the feature calculator
self._add_requirement(RequiredFeatureAttributes(self.object, self.feature, ["feature_eval_expression"]))
self._fetch_requirements()
# Pre-resolve expression metadata here (main thread) so that _compute
# — which runs inside a ThreadPoolExecutor worker — never calls echo-postgres
# directly. Both functions below hit echo-postgres internally and deadlock
# under concurrent access even with separate PerfDB instances.
_eval_expression = self._requirement_data("RequiredFeatureAttributes")[self.feature]["feature_eval_expression"]
self._required_features: list[str] = get_features_in_expression(_eval_expression, self.object, self._perfdb)
self._eval_expression_resolved: str = replace_object_attributes_in_expression(
_eval_expression, self.object, self._perfdb
)
self._round_timestamps: dict | None = (
{"freq": timedelta(minutes=5), "tolerance": timedelta(minutes=2)}
if any(feat.endswith("_b#") for feat in self._required_features)
else None
)
# Register the RequiredFeatures dependency now that we know which features the
# expression references. _fetch_requirements(period=...) in _compute will
# populate the data for the correct period without re-querying the metadata.
self._add_requirement(RequiredFeatures(features={self.object: self._required_features}))
feature property¶
Feature that is calculated. This will be defined in the constructor and cannot be changed.
Returns:
- str: Name of the feature that is calculated.
name property¶
Name of the feature calculator. It is defined in child classes of FeatureCalculator and must be equal to the "server_calc_type" attribute of the feature in performance_db.
Returns:
- str: Name of the feature calculator.
object property¶
Object for which the feature is calculated. This will be defined in the constructor and cannot be changed.
Returns:
- str: Object name for which the feature is calculated.
requirements property¶
List of requirements of the feature calculator. It is defined in child classes of FeatureCalculator.
Returns:
- dict[str, list[CalculationRequirement]]: Dict of requirements. The keys are the names of the requirement classes and the values are lists of requirements of that class. For example:
  {"RequiredFeatures": [RequiredFeatures(...), RequiredFeatures(...)], "RequiredObjects": [RequiredObjects(...)]}
result property¶
Result of the calculation. This is None until the method "calculate" is called.
Returns:
- DataFrame | None: Polars DataFrame with a "timestamp" column and one or more feature value columns. None until calculate is called.
calculate(period, save_into=None, cached_data=None, **kwargs)¶
Run the calculation for the given period and optionally save the result.
Calls `_compute` to get the result, stores it in `result`, then calls `save`. Subclasses should implement `_compute` instead of overriding this method.
Parameters:
- period (DateTimeRange): Period for which the feature will be calculated.
- save_into (Literal['all', 'performance_db'] | None, default: None):
  - "all": save in performance_db and bazefield.
  - "performance_db": save only in performance_db.
  - None: do not save.
  By default None.
- cached_data (DataFrame | None, default: None): Polars DataFrame with features already fetched/calculated. Passed to `_compute` to enable chained calculations without re-querying performance_db. By default None.
- **kwargs: Forwarded to `save`.
Returns:
- DataFrame: Polars DataFrame with a "timestamp" column and one or more feature value columns.
Source code in echo_energycalc/feature_calc_core.py
def calculate(
self,
period: DateTimeRange,
save_into: Literal["all", "performance_db"] | None = None,
cached_data: pl.DataFrame | None = None,
**kwargs,
) -> pl.DataFrame:
"""
Run the calculation for the given period and optionally save the result.
Calls :meth:`_compute` to get the result, stores it in :attr:`result`,
then calls :meth:`save`. Subclasses should implement :meth:`_compute` instead
of overriding this method.
Parameters
----------
period : DateTimeRange
Period for which the feature will be calculated.
save_into : Literal["all", "performance_db"] | None, optional
- ``"all"``: save in performance_db and bazefield.
- ``"performance_db"``: save only in performance_db.
- ``None``: do not save.
By default None.
cached_data : pl.DataFrame | None, optional
Polars DataFrame with features already fetched/calculated. Passed to
``_compute`` to enable chained calculations without re-querying
performance_db. By default None.
**kwargs
Forwarded to :meth:`save`.
Returns
-------
pl.DataFrame
Polars DataFrame with a ``"timestamp"`` column and one or more feature value columns.
"""
result = self._compute(period, cached_data=cached_data)
self._result = result
self.save(save_into=save_into, **kwargs)
return result
save(save_into=None, **kwargs)¶
Method to save the calculated feature values in performance_db.
Parameters:
- save_into (Literal['all', 'performance_db'] | None, default: None): Where to save the calculated values. The options are:
  - "all": the feature will be saved in performance_db and bazefield.
  - "performance_db": the feature will be saved only in performance_db.
  - None: the feature will not be saved.
  By default None.
- **kwargs (dict, default: {}): Not used at the moment; present only for compatibility.
Source code in echo_energycalc/feature_calc_core.py
def save(
self,
save_into: Literal["all", "performance_db"] | None = None,
**kwargs, # noqa: ARG002
) -> None:
"""
Method to save the calculated feature values in performance_db.
Parameters
----------
save_into : Literal["all", "performance_db"] | None, optional
Argument that will be passed to the method "save". The options are:
- "all": The feature will be saved in performance_db and bazefield.
- "performance_db": the feature will be saved only in performance_db.
- None: The feature will not be saved.
By default None.
**kwargs : dict, optional
Not being used at the moment. Here only for compatibility.
"""
# checking arguments
if not isinstance(save_into, str | type(None)):
raise TypeError(f"save_into must be a string or None, not {type(save_into)}")
if isinstance(save_into, str) and save_into not in ["all", "performance_db"]:
raise ValueError(f"save_into must be 'all', 'performance_db' or None, not {save_into}")
# checking if calculation was done
if self.result is None:
raise ValueError(
"The calculation was not done. Please call 'calculate' before calling 'save'.",
)
if save_into is None:
return
upload_to_bazefield = save_into == "all"
if not isinstance(self.result, pl.DataFrame):
raise TypeError(f"result must be a polars DataFrame, not {type(self.result)}.")
if "timestamp" not in self.result.columns:
raise ValueError("result DataFrame must contain a 'timestamp' column.")
# rename feature columns to "object@feature" format expected by perfdb polars insert
feat_cols = [c for c in self.result.columns if c != "timestamp"]
result_pl = self.result.rename({col: f"{self.object}@{col}" for col in feat_cols})
self._perfdb.features.values.series.insert(
df=result_pl,
on_conflict="update",
bazefield_upload=upload_to_bazefield,
)