Expression Evaluation

Overview

The FeatureCalcEvalExpression class is a subclass of FeatureCalculator that evaluates a string containing lines of Python code to calculate a feature value. This is useful for calculations that are not complex enough to justify their own class and can instead be defined as a simple expression stored in the database.

Calculation Logic

The entire calculation performed by this class is defined in the feature_eval_expression attribute of the desired feature. This string should meet the following requirements:

  • Be valid Python code.
    • If specific libraries are needed, they should be imported at the beginning of the expression and must be installed in the performance server Airflow Docker container.
    • Pandas is already available, so you can call its methods using pd.method_name.
  • Use a variable called df that is a pandas DataFrame containing the data that will be used for the calculations.
    • All DataFrame columns that are used in the expression should be accessed using df["column_name"], using double quotes as a string delimiter.
  • At the end of the expression, df should contain a column with the name of the feature that is being calculated, which will be used as the result of the calculation.

Before evaluating the expression, the code looks for all the features referenced through df so that it knows how to populate the DataFrame. This is done by the get_features_in_expression method from echo-postgres.
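
As a rough illustration of this discovery step, the sketch below collects the column names that an expression reads through df["column_name"]. It is not the echo-postgres implementation: the function find_requested_columns, the regular expression, and the rule that skips the target feature are all assumptions made for this example. It also skips columns whose name contains temporary_feature, as described in the note below, and it only recognizes the single-column df["..."] form.

```python
import re

# Hypothetical stand-in for get_features_in_expression (the real method lives in echo-postgres).
# Matches df["column_name"] with double quotes, as the requirements above prescribe.
COLUMN_PATTERN = re.compile(r'df\["([^"]+)"\]')


def find_requested_columns(expression: str, target_feature: str) -> list[str]:
    """Return the column names read by the expression, in first-seen order."""
    requested: list[str] = []
    for name in COLUMN_PATTERN.findall(expression):
        if name == target_feature:
            # Assumption: the result column is produced by the expression, not requested.
            continue
        if "temporary_feature" in name:
            # Intermediate columns are created by the expression itself.
            continue
        if name not in requested:
            requested.append(name)
    return requested


expression = (
    'df["temporary_feature_net"] = df["reactive_energy_delivered"] - df["reactive_energy_received"]\n'
    'df["reactive_energy"] = df["temporary_feature_net"]'
)
columns = find_requested_columns(expression, "reactive_energy")
print(columns)  # ['reactive_energy_delivered', 'reactive_energy_received']
```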

Note

If you need intermediate columns that should not be requested from the database before evaluation (useful for intermediate calculations), use column names that contain temporary_feature. These columns are ignored by get_features_in_expression and are not requested from the database.

You can also get data from the Bazefield database by appending "_b#" to the feature name. For example, df["ActivePower_5min.AVG_b#"] gets the tag ActivePower_5min.AVG from the Bazefield object. The syntax is the same as for an echo-postgres feature. These columns are ignored by get_features_in_expression and are not requested from the database; they are requested from the Bazefield database later, in the get_data method of the RequiredFeatures class.

Object attributes can also be used in the expression by enclosing them in $${}$$ (like $$hub_height$$). The replace_object_attributes_in_expression method from echo-postgres replaces these placeholders with the actual values.
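
To make the placeholder mechanism concrete, here is a minimal sketch of how $$attribute$$ markers could be substituted before evaluation. It is only an illustration under stated assumptions: fill_placeholders is a hypothetical helper, not the replace_object_attributes_in_expression method, the attribute values are passed in as a plain dict instead of being read from performance_db, and the column names in the sample expression are made up.

```python
import re

# Matches $$attribute_name$$ placeholders.
PLACEHOLDER = re.compile(r"\$\$(\w+)\$\$")


def fill_placeholders(expression: str, attributes: dict[str, object]) -> str:
    """Replace every $$name$$ in the expression with the literal value of that attribute."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in attributes:
            raise KeyError(f"object attribute {name!r} is not defined")
        # repr() keeps the substituted value a valid Python literal (80.0, 'text', ...).
        return repr(attributes[name])

    return PLACEHOLDER.sub(substitute, expression)


# Hypothetical expression and attribute value, for illustration only.
expression = 'df["above_hub"] = df["measurement_height"] > $$hub_height$$'
filled = fill_placeholders(expression, {"hub_height": 80.0})
print(filled)  # df["above_hub"] = df["measurement_height"] > 80.0
```

Raising an error for an unknown attribute is a design choice of this sketch; it makes a misspelled placeholder fail early instead of surfacing later as a syntax error during evaluation.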

Example

The expression below calculates the feature reactive_energy, which is the difference between the reactive energy delivered and the reactive energy received.

df["reactive_energy"] = df["reactive_energy_delivered"] - df["reactive_energy_received"]

During execution, the calculator determines that it must populate the DataFrame with the features reactive_energy_delivered and reactive_energy_received before evaluating the expression. After the evaluation, the DataFrame contains a column named reactive_energy, which is used as the result of the calculation.
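
The expression can be tried outside the calculator on a toy DataFrame. The sketch below assumes made-up sample values and a 10-minute index, and it evaluates the stored string with exec in a namespace that exposes df and pd, which mirrors the idea of the calculation without using any echo-postgres or echo_energycalc code.

```python
import pandas as pd

# The expression exactly as it would be written in Python (before escaping for the database).
stored_expression = 'df["reactive_energy"] = df["reactive_energy_delivered"] - df["reactive_energy_received"]'

# Made-up input data standing in for the features fetched from the database.
df = pd.DataFrame(
    {
        "reactive_energy_delivered": [10.0, 12.5, 9.0],
        "reactive_energy_received": [4.0, 2.5, 9.0],
    },
    index=pd.date_range("2024-01-01", periods=3, freq="10min"),
)

# Evaluate the expression in a namespace where df and pd are available.
namespace = {"df": df, "pd": pd}
exec(stored_expression, namespace)

# The column named after the feature is the result of the calculation.
result = namespace["df"]["reactive_energy"]
print(result.tolist())  # [6.0, 10.0, 0.0]
```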

As another example, this time using object attributes, the expression below calculates the feature curtailment_state. Since it is applied to a V90-3.0 wind turbine, $$nominal_power$$ will be replaced by 3000.0 during the evaluation.

df["curtailment_state"] = 0.0 # no curtailment
df.loc[df[(df["grd_sets_actpwr_minreferencevalue10min"] < $$nominal_power$$ * 0.999) & (df["grd_prod_pwr_internalderatestat"] > 0.0)].index, "curtailment_state"] = 4.0 # other deratings
df.loc[df[(df["hcnt_gen1"] == 600.0) & (df["grd_sets_actpwr_minreferencevalue10min"] < $$nominal_power$$ * 0.99)].index, "curtailment_state"] = 4.0 # other deratings
df.loc[df[(df["grd_sets_actpwr_minreferencevalue10min"] < $$nominal_power$$ * 0.999) & (df["grd_prod_pwr_internalderatestat"].isin([1.0, 2.0, 3.0, 4.0, 23.0, 24.0, 29.0, 30.0, 31.0, 32.0, 35.0, 40.0, 41.0]))].index, "curtailment_state"] = 5.0 # component temperature derating
df.loc[df[df["prod_prodmanager_pwrderatereference_min"] < $$nominal_power$$ * 0.99].index, "curtailment_state"] = 4.0 # other deratings
df.loc[df[(df["grd_sets_actpwr_minreferencevalue10min"] < $$nominal_power$$ * 0.999) & ((df["grd_prod_pwr_internalderatestat"] == 18.0) | ((df["sys_logs_firstactalarmno"].isin([4604.0, 309.0])) & (df["sys_logs_firstactalarmpar1"] == 2)))].index, "curtailment_state"] = 1.0 # park curtailment
df.loc[df[df["grd_prod_pwr_internalderatestat"] == 19.0].index, "curtailment_state"] = 3.0 # ambient temperature derating
df.loc[df[(df[["prod_prodmanager_actpwrreference", "grd_sets_actpwr_maxreferencevalue10min"]].max(axis=1, skipna=False) < $$nominal_power$$ * 0.99) & (df["grd_prod_pwr_internalderatestat"] != 18.0)].index, "curtailment_state"] = 2.0 # fixed power curtailment
df.loc[df[df["iec_operation_state"] <= 1.0].index, "curtailment_state"] = 0.0 # turbine stopped
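
Because each statement overwrites the rows it matches, the order of the rules matters: a later rule wins over an earlier one. The reduced sketch below keeps only three of the rules and uses made-up sample values to show this. It also passes the boolean mask directly to df.loc instead of going through df[mask].index, which gives the same result when the index has no duplicate labels.

```python
import pandas as pd

nominal_power = 3000.0  # value substituted for $$nominal_power$$ on a V90-3.0

# Made-up sample rows: normal operation, a derated turbine, and a stopped turbine.
df = pd.DataFrame(
    {
        "prod_prodmanager_pwrderatereference_min": [3000.0, 2500.0, 2500.0],
        "grd_prod_pwr_internalderatestat": [0.0, 0.0, 19.0],
        "iec_operation_state": [3.0, 3.0, 1.0],
    }
)

df["curtailment_state"] = 0.0  # no curtailment
# Rule applied first: other deratings.
df.loc[df["prod_prodmanager_pwrderatereference_min"] < nominal_power * 0.99, "curtailment_state"] = 4.0
# Applied later, so it overrides the rule above where both match: ambient temperature derating.
df.loc[df["grd_prod_pwr_internalderatestat"] == 19.0, "curtailment_state"] = 3.0
# Applied last, so it overrides everything: turbine stopped.
df.loc[df["iec_operation_state"] <= 1.0, "curtailment_state"] = 0.0

print(df["curtailment_state"].tolist())  # [0.0, 4.0, 0.0]
```

The third row matches all three rules and ends at 0.0 because the stopped-turbine rule runs last.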

Tip

When saving the expression to the database, double quotes should be escaped with a backslash (\), as in \", and line breaks should be replaced by \n. To quickly convert Python code to this format, you can use the VSCode extension Replace Rules and add the following configuration to your settings.json.

"replacerules.rules": {
    "Remove trailing and leading whitespace": {
        "find": "^\\s*(.*)\\s*$",
        "replace": "$1"
    },
    "Remove blank lines": {
        "find": "^\\n",
        "replace": ""
    },
    "Change \\n to new line": {
        "find": "\\\\n",
        "replace": "\n"
    },
    "Change \\\" to \"": {
        "find": "\\\\\"",
        "replace": "\""
    },
    "Remove \" from start and end of line": {
        "find": "^['\\\\\"](.*)['\\\\\"]$",
        "replace": "$1"
    },
    "Change new line to \\n": {
        "find": "\n",
        "replace": "\\n"
    },
    "Change \"to  \\\"": {
        "find": "\"",
        "replace": "\\\""
    },
    "Add \" marks to start and end of line": {
        "find": "^(.*)$",
        "replace": "\"$1\""
    }
},
"replacerules.rulesets": {
    "String to Python": {
        "rules": ["Remove \" from start and end of line", "Change \\n to new line", "Change \\\" to \""]
    },
    "Python to String": {
        "rules": ["Change \"to  \\\"", "Change new line to \\n", "Add \" marks to start and end of line"]
    }
},

Then you can select the text you want to convert and use the command Replace Rules: Apply Ruleset and select the String to Python or Python to String rulesets to convert the text back and forth.
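
If you prefer not to depend on an editor extension, the same conversion can be sketched in a few lines of Python. The two helpers below are hypothetical and simply mirror the order of the rules in the rulesets above; like those rules, they do not escape backslashes themselves, so code that already contains a literal \n inside a string will not round-trip cleanly.

```python
def python_to_stored(code: str) -> str:
    """Mirror of the "Python to String" ruleset: escape quotes, flatten line breaks, wrap in quotes."""
    escaped = code.replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def stored_to_python(stored: str) -> str:
    """Mirror of the "String to Python" ruleset: strip the outer quotes, restore line breaks and quotes."""
    has_outer_quotes = len(stored) >= 2 and stored[0] == stored[-1] == '"'
    inner = stored[1:-1] if has_outer_quotes else stored
    return inner.replace("\\n", "\n").replace('\\"', '"')


code = 'df["a"] = 1.0\ndf["b"] = df["a"] * 2'
stored = python_to_stored(code)
print(stored)  # "df[\"a\"] = 1.0\ndf[\"b\"] = df[\"a\"] * 2"
print(stored_to_python(stored) == code)  # True
```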

Database Requirements

  • Feature attribute server_calc_type must be set to expression_evaluation.
  • Feature attribute feature_eval_expression must be set to valid Python code that calculates the feature value.

Class Definition

FeatureCalcEvalExpression(object_name, feature)

FeatureCalculator class for features that rely on evaluation of expressions saved as strings.

These expressions must be saved in the feature_eval_expression attribute of the feature in performance_db.

The expressions are assumed to use df as the DataFrame with the needed features.

The features used in df will be requested for the desired period by the calculate method. The method get_features_in_expression from echo_postgres is used to determine which features the expression uses. Features that contain temporary_feature in their name are ignored and can be used as auxiliary columns in the DataFrame to help with the calculations.

Object attributes are also replaced in the expression by the method replace_object_attributes_in_expression from echo_postgres. The attributes must be enclosed in $${}$$, like $$hub_height$$.

At the end of the expression, df must have a column with the name of the feature that is being calculated and this column will be returned as the result of the calculation.

Parameters:

  • object_name

    (str) –

    Name of the object for which the feature is calculated. It must exist in performance_db.

  • feature

    (str) –

    Feature of the object that is calculated. It must exist in performance_db.

Source code in echo_energycalc/feature_calc_common.py
def __init__(
    self,
    object_name: str,
    feature: str,
) -> None:
    """
    FeatureCalculator class for features that rely on evaluation of expressions saved as strings.

    These expressions must be saved in the `feature_eval_expression` attribute of the feature in performance_db.

    The expressions are assumed to use `df` as the DataFrame with the needed features.

    The features used in `df` will be requested for the desired period by the `calculate` method. The method `get_features_in_expression` from echo_postgres is used to determine which features the expression uses. Features that contain `temporary_feature` in their name are ignored and can be used as auxiliary columns in the DataFrame to help with the calculations.

    Object attributes are also replaced in the expression by the method `replace_object_attributes_in_expression` from echo_postgres. The attributes must be enclosed in `$${}$$`, like `$$hub_height$$`.

    At the end of the expression, `df` must have a column with the name of the feature that is being calculated and this column will be returned as the result of the calculation.

    Parameters
    ----------
    object_name : str
        Name of the object for which the feature is calculated. It must exist in performance_db.
    feature : str
        Feature of the object that is calculated. It must exist in performance_db.
    """
    # initialize parent class
    super().__init__(object_name, feature)

    # requirements for the feature calculator
    self._add_requirement(RequiredFeatureAttributes(self.object, self.feature, ["feature_eval_expression"]))
    self._get_required_data()

feature property

Feature that is calculated. This will be defined in the constructor and cannot be changed.

Returns:

  • str

    Name of the feature that is calculated.

name property

Name of the feature calculator. It is defined in child classes of FeatureCalculator.

This must be equal to the "server_calc_type" attribute of the feature in performance_db.

Returns:

  • str

    Name of the feature calculator.

object property

Object for which the feature is calculated. This will be defined in the constructor and cannot be changed.

Returns:

  • str

    Object name for which the feature is calculated.

requirements property

Requirements of the feature calculator. They are defined in child classes of FeatureCalculator.

Returns:

  • dict[str, list[CalculationRequirement]]

    Dict of requirements.

    The keys are the names of the classes of the requirements and the values are lists of requirements of that class.

    For example: {"RequiredFeatures": [RequiredFeatures(...), RequiredFeatures(...)], "RequiredObjects": [RequiredObjects(...)]}

result property

Result of the calculation. This is None until the method "calculate" is called.

Returns:

  • Series | DataFrame | None:

    Result of the calculation if the method "calculate" was called. None otherwise.

calculate(period, save_into=None, cached_data=None, **kwargs)

Method that will calculate the Eval Expression feature.

Parameters:

  • period

    (DateTimeRange) –

    Period for which the feature will be calculated.

  • save_into

    (Literal['all', 'performance_db'] | None, default: None ) –

    Argument that will be passed to the method "save". The options are:

      • "all": The feature will be saved in performance_db and bazefield.
      • "performance_db": The feature will be saved only in performance_db.
      • None: The feature will not be saved.

    By default None.

  • cached_data

    (DataFrame | None, default: None ) –

    DataFrame with features already queried/calculated. This avoids querying all the data again from performance_db, making chained calculations much more efficient. By default None.

  • **kwargs

    (dict, default: {} ) –

    Additional arguments that will be passed to the "save" method.

Returns:

  • Series

    Pandas Series with the calculated feature.

Source code in echo_energycalc/feature_calc_common.py
def calculate(
    self,
    period: DateTimeRange,
    save_into: Literal["all", "performance_db"] | None = None,
    cached_data: DataFrame | None = None,
    **kwargs,
) -> Series:
    """
    Method that will calculate the Eval Expression feature.

    Parameters
    ----------
    period : DateTimeRange
        Period for which the feature will be calculated.
    save_into : Literal["all", "performance_db"] | None, optional
        Argument that will be passed to the method "save". The options are:
        - "all": The feature will be saved in performance_db and bazefield.
        - "performance_db": the feature will be saved only in performance_db.
        - None: The feature will not be saved.

        By default None.
    cached_data : DataFrame | None, optional
        DataFrame with features already queried/calculated. This is useful to avoid needing to query all the data again from performance_db, making chained calculations a lot more efficient.
        By default None
    **kwargs : dict, optional
        Additional arguments that will be passed to the "save" method.

    Returns
    -------
    Series
        Pandas Series with the calculated feature.
    """
    eval_expression = self._get_requirement_data("RequiredFeatureAttributes")[self.feature]["feature_eval_expression"]

    # getting needed features
    required_features = get_features_in_expression(eval_expression, self.object, self._perfdb)

    # replacing object attributes in expression
    eval_expression = replace_object_attributes_in_expression(eval_expression, self.object, self._perfdb)

    # adding requirements
    self._add_requirement(RequiredFeatures(features={self.object: required_features}))

    # defining round timestamps (if a bazefield tag is required)
    round_timestamps = (
        {"freq": timedelta(minutes=5), "tolerance": timedelta(minutes=2)}
        if any(feature.endswith("_b#") for feature in required_features)
        else None
    )

    self._get_required_data(period=period, cached_data=cached_data, only_missing=True, round_timestamps=round_timestamps)

    # saving needed features into df as expressions assumes we have a DataFrame called "df" to do all the calculations
    df = self._get_requirement_data("RequiredFeatures").droplevel(0, axis=1).copy()

    # calculating by evaluating the expression
    loc = {"df": df}
    exec(eval_expression, globals(), loc)  # pylint: disable=exec-used  # noqa: S102
    df = loc["df"]

    # adding calculated feature to class result attribute
    self._result = df[self.feature].copy()

    # saving results
    self.save(save_into=save_into, **kwargs)

    return df[self.feature]
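
Two details of the listing above are easy to miss: the requirement data is flattened with droplevel(0, axis=1) before evaluation, and the result is read back from the loc dictionary because an expression may rebind df instead of modifying it in place. The sketch below illustrates both with made-up data. The object name WTG-01 and the two-level column layout (object_name, feature_name) are assumptions inferred from the listings on this page.

```python
import pandas as pd

# Assumed layout of the requirement data: columns indexed by (object_name, feature_name).
columns = pd.MultiIndex.from_product(
    [["WTG-01"], ["reactive_energy_delivered", "reactive_energy_received"]],
    names=["object_name", "feature_name"],
)
required = pd.DataFrame([[10.0, 4.0], [12.5, 2.5]], columns=columns)

# Drop the object level so the expression can use plain df["feature_name"] lookups.
df = required.droplevel(0, axis=1).copy()

# This expression rebinds df to a new DataFrame instead of adding a column in place.
expression = 'df = df.assign(reactive_energy=df["reactive_energy_delivered"] - df["reactive_energy_received"])'

loc = {"df": df}
exec(expression, globals(), loc)

# The original object is untouched; the new DataFrame is only reachable through loc["df"].
rebound = loc["df"]
print("reactive_energy" in df.columns)      # False
print(rebound["reactive_energy"].tolist())  # [6.0, 10.0]
```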

save(save_into=None, **kwargs)

Method to save the calculated feature values in performance_db.

Parameters:

  • save_into

    (Literal['all', 'performance_db'] | None, default: None ) –

    Argument that will be passed to the method "save". The options are:

      • "all": The feature will be saved in performance_db and bazefield.
      • "performance_db": The feature will be saved only in performance_db.
      • None: The feature will not be saved.

    By default None.

  • **kwargs

    (dict, default: {} ) –

    Not being used at the moment. Here only for compatibility.

Source code in echo_energycalc/feature_calc_core.py
def save(
    self,
    save_into: Literal["all", "performance_db"] | None = None,
    **kwargs,  # noqa: ARG002
) -> None:
    """
    Method to save the calculated feature values in performance_db.

    Parameters
    ----------
    save_into : Literal["all", "performance_db"] | None, optional
        Argument that will be passed to the method "save". The options are:
        - "all": The feature will be saved in performance_db and bazefield.
        - "performance_db": the feature will be saved only in performance_db.
        - None: The feature will not be saved.

        By default None.
    **kwargs : dict, optional
        Not being used at the moment. Here only for compatibility.
    """
    # checking arguments
    if not isinstance(save_into, str | type(None)):
        raise TypeError(f"save_into must be a string or None, not {type(save_into)}")
    if isinstance(save_into, str) and save_into not in ["all", "performance_db"]:
        raise ValueError(f"save_into must be 'all', 'performance_db' or None, not {save_into}")

    # checking if calculation was done
    if self.result is None:
        raise ValueError(
            "The calculation was not done. Cannot save the feature calculation results. Please make sure to do something like 'self._result = df[self.feature].copy()' in the method 'calculate' before calling 'self.save()'.",
        )

    if save_into is None:
        return

    if isinstance(save_into, str):
        if save_into not in ["performance_db", "all"]:
            raise ValueError(f"save_into must be 'performance_db' or 'all', not {save_into}.")
        upload_to_bazefield = save_into == "all"
    elif save_into is None:
        upload_to_bazefield = False
    else:
        raise TypeError(f"save_into must be a string or None, not {type(save_into)}.")

    # converting result series to DataFrame if needed
    if isinstance(self.result, Series):
        result_df = self.result.to_frame()
    elif isinstance(self.result, DataFrame):
        result_df = self.result.droplevel(0, axis=1)
    else:
        raise TypeError(f"result must be a pandas Series or DataFrame, not {type(self.result)}.")

    # adjusting DataFrame to be inserted in the database
    # making the columns a MultiIndex with levels object_name and feature_name
    result_df.columns = MultiIndex.from_product([[self.object], result_df.columns], names=["object_name", "feature_name"])

    self._perfdb.features.values.series.insert(
        df=result_df,
        on_conflict="update",
        bazefield_upload=upload_to_bazefield,
    )
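
The column reshaping done before the insert can be reproduced with pandas alone. The sketch below uses a made-up Series and the hypothetical object name WTG-01 to show how a single-feature result becomes a DataFrame whose columns carry the object_name and feature_name levels.

```python
import pandas as pd

object_name = "WTG-01"  # hypothetical object name, for illustration only
result = pd.Series([6.0, 10.0], name="reactive_energy")

# A Series result becomes a one-column DataFrame named after the feature.
result_df = result.to_frame()

# Build two-level columns: one entry per (object_name, feature_name) pair.
result_df.columns = pd.MultiIndex.from_product(
    [[object_name], result_df.columns],
    names=["object_name", "feature_name"],
)

print(result_df.columns.tolist())  # [('WTG-01', 'reactive_energy')]
```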