Feature Calculator

Overview

The FeatureCalculator class is an abstract base class that defines the interface for all feature calculators. It is designed to calculate a feature for a given object.

Usage

All feature calculations follow the same general workflow, with some variations between calculators:

  1. Instantiate the calculator with the necessary arguments and requirements, including the object for which the feature will be calculated. This validates that the object exists, that the feature exists for the object, and that the name of the feature calculator matches the server_calc_type attribute of the feature in the database.
  2. Calculate the results using the calculate method.
    1. Get and validate all the necessary requirements with the _get_required_data method.
    2. Create an empty Series or DataFrame to hold the results of the calculation using the _create_empty_result method.
    3. Run the calculation logic.
    4. Save the results in the database with the save method, if applicable.
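
The sequence above can be sketched in a few lines. The snippet below is only an illustration of the workflow, not the echo_energycalc implementation: SketchCalculator, ConstantSketch, the hard-coded nominal power, and the in-memory saved list are all hypothetical stand-ins, and plain dicts replace the pandas objects and the database.

```python
from abc import ABC, abstractmethod


class SketchCalculator(ABC):
    """Hypothetical stand-in for FeatureCalculator; nothing here touches a database."""

    def __init__(self, object_name: str, feature: str) -> None:
        # Step 1: validate the arguments (the real class also queries the database).
        if not isinstance(object_name, str) or not isinstance(feature, str):
            raise TypeError("object_name and feature must be strings")
        self.object = object_name
        self.feature = feature
        self.result = None
        self.saved = []  # in-memory list standing in for the database

    @abstractmethod
    def calculate(self, timestamps, save=False):
        """Run the calculation; implemented by each subclass."""

    def save(self, save):
        # Step 2.4: persist the result only when requested.
        if self.result is None:
            raise ValueError("calculate must fill the result before saving")
        if save:
            self.saved.append(dict(self.result))


class ConstantSketch(SketchCalculator):
    def calculate(self, timestamps, save=False):
        required = {"nominal_power": 1500.0}  # step 2.1 (hard-coded, assumed value)
        result = dict.fromkeys(timestamps)  # step 2.2: empty result, one slot per timestamp
        for ts in result:  # step 2.3: calculation logic
            result[ts] = required["nominal_power"]
        self.result = result
        self.save(save)  # step 2.4
        return result


calc = ConstantSketch("turbine_01", "example_feature")
values = calc.calculate(["00:00", "00:10"], save=True)
print(values)
```

The result is stored on the instance before save is called, which matches the order the real save method depends on.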

Subclass implementation

We will use the FeatureCalcExample class as an example of how to implement a subclass of FeatureCalculator.

In general terms, subclasses of FeatureCalculator must define the following:

  • _name class attribute: The name of the feature calculator. This should match the server_calc_type attribute of the feature in the database.
  • __init__: The constructor of the class. It should override the constructor of the superclass while keeping exactly the same arguments. The constructor is used to define the requirements of the calculation and validate them. In the example below, we add a requirement for the object attribute nominal_power using the _add_requirement method, then validate this requirement and get its value with the _get_required_data method.

    def __init__(
        self,
        object_name: str,
        feature: str,
    ) -> None:
        """
        Example of a FeatureCalculator class.
        Currently this feature calculator is only set up for feature "example_calc" of "wind_turbine" of model "GE100-1.X".
        For this calculation to work, the following object attributes must be defined:
            - "nominal_power" (in kW)
        Parameters
        ----------
        object_name : str
            Name of the object to calculate the feature for.
        feature : str
            Name of the feature to calculate.
        """
        # initialize parent class
        # this will set the object name and feature using the constructor of the parent class
        super().__init__(object_name, feature)
        # defining what are the requirements
        # this is just a simple example of how to use the RequiredObjectAttributes class, for other requirement options see calculation_requirements.py
        # in this we are just creating the requirement, no checking if it is fulfilled is done here, it will only be done when calling the _get_required_data method
        self._add_requirement(RequiredObjectAttributes({self.object: ["nominal_power"]}))
        # getting the requirement values
        # in here we are checking if the requirements are fulfilled and getting their values
        # as in this example we are just using the RequiredObjectAttributes class we don't need to define a period, but if we were using other requirement classes (like RequiredFeatures) we would need to define a period
        self._get_required_data()
    

    Note

    If the calculation depends on data from other features, add the RequiredFeatures requirement to the calculator at the end of the __init__ method and call the _get_required_data method in the calculate method instead. This is because a RequiredFeatures requirement is specific to a predefined period, and the period is only defined when the calculate method is called.

  • calculate: This method should define and run the calculation logic. In general terms it should do the following:

    1. In case there are requirements like RequiredFeatures that depend on a period, call the _get_required_data method to get the data.

      Note

      Requirements can be added on the fly if needed. This is useful when the calculation normally depends on some data but can fall back to other data if the first choice is not available.

    2. Create a Series or DataFrame object to store the results of the calculation using the _create_empty_result method.

    3. Do all the calculations needed.
    4. Save the results to the _result attribute for later access.
    5. Save the results in the database with the save method by calling self.save(save_into=save_into, **kwargs).

    The following is an example of the calculate method for the FeatureCalcExample class, which is very simple as it just calculates a feature with the same value (nominal power) for all timestamps:

    def calculate(
        self,
        period: DateTimeRange,
        save_into: Literal["all", "performance_db"] | None = None,
        cached_data: DataFrame | None = None,  # noqa: ARG002
        **kwargs,
    ) -> Series:
        """
        Method that will calculate the feature.
    
        Parameters
        ----------
        period : DateTimeRange
            Period for which the feature will be calculated.
        save_into : Literal["all", "performance_db"] | None, optional
            Argument that will be passed to the method "save". The options are:
            - "all": The feature will be saved in performance_db and bazefield.
            - "performance_db": the feature will be saved only in performance_db.
            - None: The feature will not be saved.
    
            By default None.
        cached_data : DataFrame | None, optional
            DataFrame with features already queried/calculated. This is useful to avoid needing to query all the data again from performance_db, making chained calculations a lot more efficient.
            By default None
        **kwargs : dict, optional
            Additional arguments that will be passed to the "save" method.
    
        Returns
        -------
        Series
            Series with the calculated feature.
        """
        # creating Series to store results
        # this step is optional, but it creates a Series that already has the index and name expected by the save method
        result = self._create_empty_result(period=period, result_type="Series", freq="10min")
    
        # getting nominal power
        # in here we are getting the nominal power that was stored in the object attributes when calling the _get_required_data method
        # the _get_requirement_data method will return a dict or a DataFrame depending on the type of requirement
        nominal_power = self._get_requirement_data("RequiredObjectAttributes")[self.object]["nominal_power"]
    
        # here we are just filling the result Series with the nominal power
        # this example calculation just creates a feature with the same value for all timestamps
        result.loc[:] = nominal_power
    
        # adding calculated feature to class result attribute
        # this is necessary so that the save method can access the calculated feature
        # if not done the code will not save anything in performance_db, bazefield, etc.
        self._result = result.copy()
    
        # saving results
        # here we are just calling the save method, which takes the Series stored in the _result attribute and saves it according to save_into
        # it is done this way because we can opt for not saving the results right now (using save_into=None, which is the default), in which case the save method is called later on in calculation_handler.py
        self.save(save_into=save_into, **kwargs)
    
        return result
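
The two notes above (period-dependent requirements resolved inside calculate, and requirements added on the fly as a fallback) can be illustrated with a small stand-alone sketch. Everything in it is hypothetical: PeriodRequirement stands in for a requirement such as RequiredFeatures, AVAILABLE replaces the database, and the feature names are invented. It only shows the control flow, not the real requirement API.

```python
from dataclasses import dataclass

# Hypothetical in-memory data source: feature name -> {timestamp: value}.
AVAILABLE = {"wind_speed_backup": {"00:00": 7.5, "00:10": 8.0}}


@dataclass
class PeriodRequirement:
    """Stand-in for a period-dependent requirement such as RequiredFeatures."""

    feature: str

    def fetch(self, period):
        # Raises KeyError when the feature (or a timestamp) has no data.
        series = AVAILABLE[self.feature]
        return {ts: series[ts] for ts in period}


class SketchCalc:
    def __init__(self):
        # The requirement is only declared here; no period is known yet.
        self.requirements = [PeriodRequirement("wind_speed_main")]

    def calculate(self, period):
        try:
            # The period exists only now, so the data is fetched here.
            return self.requirements[0].fetch(period)
        except KeyError:
            # Primary data is missing: add a fallback requirement on the fly.
            fallback = PeriodRequirement("wind_speed_backup")
            self.requirements.append(fallback)
            return fallback.fetch(period)


calc = SketchCalc()
data = calc.calculate(["00:00", "00:10"])
print(data)
```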
    

Database Requirements

All FeatureCalculator subclasses must have a name that matches the server_calc_type attribute of the feature in the database. So for example, if we have a feature in the database with the server_calc_type attribute set to example_calc, the FeatureCalculator subclass used to calculate this feature will be the FeatureCalcExample class because it has the _name attribute set to example_calc. In other words, the server_calc_type attribute in the database defines which FeatureCalculator subclass will be used to calculate the feature.

Also, the mapping from the server_calc_type attribute to the FeatureCalculator subclass is done in the constants.py file in the echo_energycalc package. Please add any new FeatureCalculator subclass to this mapping.
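
One way to picture this mapping is a dictionary keyed by server_calc_type. The sketch below is an assumption about the general shape only: the actual name and layout of the mapping in constants.py are not shown in this page, and CALCULATOR_CLASSES, get_calculator_class, and the stub classes are hypothetical.

```python
class FeatureCalcExample:
    """Stub standing in for a real FeatureCalculator subclass."""

    _name = "example_calc"


class FeatureCalcOther:
    """Second stub, only here to show a mapping with more than one entry."""

    _name = "other_calc"


# Hypothetical registry: server_calc_type -> calculator class.
CALCULATOR_CLASSES = {cls._name: cls for cls in (FeatureCalcExample, FeatureCalcOther)}


def get_calculator_class(server_calc_type):
    """Return the class registered for a server_calc_type, or fail loudly."""
    try:
        return CALCULATOR_CLASSES[server_calc_type]
    except KeyError:
        raise ValueError(f"No calculator registered for '{server_calc_type}'") from None


selected = get_calculator_class("example_calc")
print(selected.__name__)
```

Building the dictionary from each class's _name keeps the key and the class attribute from drifting apart, although the real package may define the mapping differently.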

Class Definition

FeatureCalculator(object_name, feature)

Abstract base class for feature calculators. It defines the methods and attributes that are common to all feature calculators.

Child classes of FeatureCalculator should extend it to define the name, requirements and logic of the feature calculator.

It's extremely important to define the name of the feature calculator in the child class. This name must be equal to the "server_calc_type" attribute of the feature in performance_db and will be used to check if the feature calculator is the correct one for the feature. Set the name of the feature calculator in the child class as a class attribute "_name".

Below is a simple example of how to define a child class of FeatureCalculator:

class FeatureCalculatorExample(FeatureCalculator):
    # name of the feature calculator
    _name = "name"

    def __init__(self, object_name: str, feature: str) -> None:
        # initialize parent class
        super().__init__(object_name, feature)

        # requirements for the feature calculator
        self._add_requirement(...)

Parameters:

  • object_name

    (str) –

    Name of the object for which the feature is calculated. It must exist in performance_db.

  • feature

    (str) –

    Feature of the object that is calculated. It must exist in performance_db.

Source code in echo_energycalc/feature_calc_core.py
def __init__(
    self,
    object_name: str,
    feature: str,
) -> None:
    """
    Constructor of the FeatureCalculator class.

    This should be overloaded in child classes of FeatureCalculator to define the name, requirements and logic of the feature calculator.

    It's extremely important to define the name of the feature calculator in the child class. This name must be equal to the "server_calc_type" attribute of the feature in performance_db and will be used to check if the feature calculator is the correct one for the feature. Set the name of the feature calculator in the child class as a class attribute "_name".

    Below there is a simple example on how to define a child class of FeatureCalculator:

    ```python
    class FeatureCalculatorExample(FeatureCalculator):
        # name of the feature calculator
        _name = "name"

        def __init__(self, object_name: str, feature: str) -> None:
            # initialize parent class
            super().__init__(object_name, feature)

            # requirements for the feature calculator
            self._add_requirement(...)
    ```

    Parameters
    ----------
    object_name : str
        Name of the object for which the feature is calculated. It must exist in performance_db.
    feature : str
        Feature of the object that is calculated. It must exist in performance_db.
    """
    # checking arguments
    if not isinstance(object_name, str):
        raise TypeError(f"object_name must be a string, not {type(object_name)}")
    if not isinstance(feature, str):
        raise TypeError(f"feature must be a string, not {type(feature)}")

    # check if self._name is defined in child class
    if not hasattr(self, "_name"):
        raise ValueError(f"FeatureCalculator name is not defined in {self.__class__.__name__}.")

    # empty set of requirements
    self._requirements = None

    # creating structure that will be used to connect to performance_db
    self._perfdb = PerfDB(application_name=self.__class__.__name__)

    # check if object exists in performance_db
    obj_def = self._perfdb.objects.instances.get(object_names=[object_name], output_type="dict")
    if len(obj_def) == 0:
        raise ValueError(f"Object {object_name} does not exist in performance_db.")
    self._object = object_name
    obj_model = obj_def[object_name]["object_model_name"]

    # check if feature exists in performance_db
    obj_features = self._perfdb.features.definitions.get(
        object_names=[object_name],
        feature_names=[feature],
        get_attributes=True,
        attribute_names=["server_calc_type"],
        output_type="dict",
    )
    if len(obj_features) != 1 or len(obj_features[obj_model]) != 1:
        raise ValueError(f"Feature {feature} does not exist in performance_db for object {object_name}.")
    self._feature = feature

    feature_def = obj_features[obj_model][feature]

    # checking if this calculation name is equal to the "server_calc_type" attribute of the feature
    if "server_calc_type" not in feature_def:
        raise ValueError(
            f"Feature {feature} does not have the attribute 'server_calc_type' in performance_db for object {object_name}.",
        )
    if feature_def["server_calc_type"] != self.name:
        raise ValueError(
            f"Feature '{feature}' of object '{object_name}' has the attribute 'server_calc_type' equal to '{feature_def['server_calc_type']}' in performance_db, which is different from the name of the class {self.__class__.__name__}('{self.name}').",
        )

    # results of the calculation. It will be filled by the method "calculate".
    self._result: Series | DataFrame | None = None
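
The order of the constructor checks can be reproduced without a database. The following is a simplified sketch under stated assumptions: the FEATURES dict is a hypothetical stand-in for the performance_db queries, and validate_calculator and outcome are illustrative helpers that do not exist in echo_energycalc.

```python
# Hypothetical stand-in for performance_db: object -> feature -> attributes.
FEATURES = {
    "turbine_01": {
        "example_feature": {"server_calc_type": "example_calc"},
        "raw_feature": {},
    },
}


def validate_calculator(calc_name, object_name, feature):
    """Apply the constructor checks in order, raising on the first failure."""
    if not isinstance(object_name, str):
        raise TypeError(f"object_name must be a string, not {type(object_name)}")
    if not isinstance(feature, str):
        raise TypeError(f"feature must be a string, not {type(feature)}")
    if object_name not in FEATURES:
        raise ValueError(f"Object {object_name} does not exist.")
    if feature not in FEATURES[object_name]:
        raise ValueError(f"Feature {feature} does not exist for object {object_name}.")
    attributes = FEATURES[object_name][feature]
    if "server_calc_type" not in attributes:
        raise ValueError(f"Feature {feature} has no 'server_calc_type' attribute.")
    if attributes["server_calc_type"] != calc_name:
        raise ValueError(
            f"Feature {feature} expects '{attributes['server_calc_type']}', not '{calc_name}'."
        )


def outcome(calc_name, object_name, feature):
    """Return 'ok' or the name of the exception raised, for quick inspection."""
    try:
        validate_calculator(calc_name, object_name, feature)
    except (TypeError, ValueError) as error:
        return type(error).__name__
    return "ok"


print(outcome("example_calc", "turbine_01", "example_feature"))
print(outcome("other_calc", "turbine_01", "example_feature"))
```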

feature property

Feature that is calculated. This will be defined in the constructor and cannot be changed.

Returns:

  • str

    Name of the feature that is calculated.

name property

Name of the feature calculator. It is defined in child classes of FeatureCalculator.

This must be equal to the "server_calc_type" attribute of the feature in performance_db.

Returns:

  • str

    Name of the feature calculator.

object property

Object for which the feature is calculated. This will be defined in the constructor and cannot be changed.

Returns:

  • str

    Object name for which the feature is calculated.

requirements property

Requirements of the feature calculator, grouped by requirement class. They are defined in child classes of FeatureCalculator.

Returns:

  • dict[str, list[CalculationRequirement]]

    Dict of requirements.

    The keys are the names of the classes of the requirements and the values are lists of requirements of that class.

    For example: {"RequiredFeatures": [RequiredFeatures(...), RequiredFeatures(...)], "RequiredObjects": [RequiredObjects(...)]}
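
A dictionary with this shape can be produced by grouping requirement instances by their class name. The sketch below only illustrates the structure; the stub classes and the group_by_class_name helper are hypothetical and say nothing about how the package builds the dictionary internally.

```python
from collections import defaultdict


class RequiredFeatures:
    """Stub; only the class name matters for this sketch."""


class RequiredObjectAttributes:
    """Stub; only the class name matters for this sketch."""


def group_by_class_name(requirements):
    """Group requirement instances into {class name: [instances]}."""
    grouped = defaultdict(list)
    for requirement in requirements:
        grouped[type(requirement).__name__].append(requirement)
    return dict(grouped)


grouped = group_by_class_name(
    [RequiredFeatures(), RequiredObjectAttributes(), RequiredFeatures()]
)
print({name: len(items) for name, items in grouped.items()})
```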

result property

Result of the calculation. This is None until the method "calculate" is called.

Returns:

  • Series | DataFrame | None

    Result of the calculation if the method "calculate" was called. None otherwise.

calculate(period, save_into=None, cached_data=None, **kwargs) abstractmethod

Abstract method that should be implemented in child classes. Should calculate the feature for the given object and period.

This method should call the method _get_required_data to get the required data before calculating the feature.

Before returning, the method should fill the attribute "_result" with the result of the calculation and then call the method "save" to save the calculated feature in performance_db.

The attribute must be filled first, because "save" raises an error when the result is None.

Parameters:

  • period

    (DateTimeRange) –

    Period for which the feature will be calculated.

  • save_into

    (Literal['all', 'performance_db'] | None, default: None ) –

    Argument that will be passed to the method "save". The options are:

      • "all": The feature will be saved in performance_db and bazefield.
      • "performance_db": The feature will be saved only in performance_db.
      • None: The feature will not be saved.

    By default None.

  • cached_data

    (DataFrame | None, default: None ) –

    DataFrame with the cached data for each object. It needs to be passed down to the _get_required_data method so that chained calculations can reuse the data without saving it in performance_db and querying it again. By default None.

  • **kwargs

    (dict, default: {} ) –

    Additional arguments that will be passed to the "save" method.

Returns:

  • Series | DataFrame

    Pandas Series with the calculated feature.

    If multiple features are calculated in the method, a DataFrame should be returned with two column levels (object_name, feature_name).

Source code in echo_energycalc/feature_calc_core.py
@abstractmethod
def calculate(
    self,
    period: DateTimeRange,
    save_into: Literal["all", "performance_db"] | None = None,
    cached_data: DataFrame | None = None,
    **kwargs,
) -> Series | DataFrame:
    """
    Abstract method that should be implemented in child classes. Should calculate the feature for the given object and period.

    This method should call the method `_get_required_data` to get the required data before calculating the feature.

    The method also should call the method "save" to save the calculated feature in performance_db.

    At the end of the method, the attribute "_result" should be filled with the result of the calculation.

    Parameters
    ----------
    period : DateTimeRange
        Period for which the feature will be calculated.
    save_into : Literal["all", "performance_db"] | None, optional
        Argument that will be passed to the method "save". The options are:
        - "all": The feature will be saved in performance_db and bazefield.
        - "performance_db": the feature will be saved only in performance_db.
        - None: The feature will not be saved.

        By default None.
    cached_data : DataFrame | None, optional
        DataFrame with the cached data for each object. This needs to be passed down to the `_get_required_data` method to do chained calculations without needing to save the data in performance_db and get them again. By default None.
    **kwargs : dict, optional
        Additional arguments that will be passed to the "save" method.

    Returns
    -------
    Series | DataFrame
        Pandas Series with the calculated feature.

        If multiple features are calculated in the method, a DataFrame should be returned with two column levels (object_name, feature_name).
    """
    raise NotImplementedError("The method 'calculate' must be implemented in the child class.")

save(save_into=None, **kwargs)

Method to save the calculated feature values in performance_db.

Parameters:

  • save_into

    (Literal['all', 'performance_db'] | None, default: None ) –

    Where the calculated feature will be saved. The options are:

      • "all": The feature will be saved in performance_db and bazefield.
      • "performance_db": The feature will be saved only in performance_db.
      • None: The feature will not be saved.

    By default None.

  • **kwargs

    (dict, default: {} ) –

    Not being used at the moment. Here only for compatibility.

Source code in echo_energycalc/feature_calc_core.py
def save(
    self,
    save_into: Literal["all", "performance_db"] | None = None,
    **kwargs,  # noqa: ARG002
) -> None:
    """
    Method to save the calculated feature values in performance_db.

    Parameters
    ----------
    save_into : Literal["all", "performance_db"] | None, optional
        Argument that will be passed to the method "save". The options are:
        - "all": The feature will be saved in performance_db and bazefield.
        - "performance_db": the feature will be saved only in performance_db.
        - None: The feature will not be saved.

        By default None.
    **kwargs : dict, optional
        Not being used at the moment. Here only for compatibility.
    """
    # checking arguments
    if not isinstance(save_into, str | type(None)):
        raise TypeError(f"save_into must be a string or None, not {type(save_into)}")
    if isinstance(save_into, str) and save_into not in ["all", "performance_db"]:
        raise ValueError(f"save_into must be 'all', 'performance_db' or None, not {save_into}")

    # checking if calculation was done
    if self.result is None:
        raise ValueError(
            "The calculation was not done. Cannot save the feature calculation results. Please make sure to do something like 'self._result = df[self.feature].copy()' in the method 'calculate' before calling 'self.save()'.",
        )

    if save_into is None:
        return

    if isinstance(save_into, str):
        if save_into not in ["performance_db", "all"]:
            raise ValueError(f"save_into must be 'performance_db' or 'all', not {save_into}.")
        upload_to_bazefield = save_into == "all"
    elif save_into is None:
        upload_to_bazefield = False
    else:
        raise TypeError(f"save_into must be a string or None, not {type(save_into)}.")

    # converting result series to DataFrame if needed
    if isinstance(self.result, Series):
        result_df = self.result.to_frame()
    elif isinstance(self.result, DataFrame):
        result_df = self.result.droplevel(0, axis=1)
    else:
        raise TypeError(f"result must be a pandas Series or DataFrame, not {type(self.result)}.")

    # adjusting DataFrame to be inserted in the database
    # making the columns a MultiIndex with levels object_name and feature_name
    result_df.columns = MultiIndex.from_product([[self.object], result_df.columns], names=["object_name", "feature_name"])

    self._perfdb.features.values.series.insert(
        df=result_df,
        on_conflict="update",
        bazefield_upload=upload_to_bazefield,
    )
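
The way save interprets save_into can be condensed into a small pure function. This is a simplified sketch of the decision logic in the listing above, not part of the package: resolve_save_target is a hypothetical helper, and it returns flags instead of writing to performance_db or bazefield.

```python
def resolve_save_target(save_into):
    """Return (save_to_performance_db, upload_to_bazefield) for a save_into value."""
    if not isinstance(save_into, (str, type(None))):
        raise TypeError(f"save_into must be a string or None, not {type(save_into)}")
    if save_into is None:
        # Nothing is saved; the caller may save later.
        return False, False
    if save_into not in ("all", "performance_db"):
        raise ValueError(f"save_into must be 'all', 'performance_db' or None, not {save_into}")
    # Both valid strings save to performance_db; only "all" also uploads to bazefield.
    return True, save_into == "all"


for option in (None, "performance_db", "all"):
    print(option, resolve_save_target(option))
```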