Feature Calculator¶
Overview¶
The FeatureCalculator class is an abstract base class that defines the interface for all feature calculators. It is designed to calculate a feature for a given object.
Usage¶
The feature calculators will be used throughout all feature calculations with some variations, but as a general rule, this is what is done:
- Instantiate the calculator with the necessary arguments and requirements, including the object for which the feature will be calculated. This validates that the object exists, that the feature exists for the object, and that the name of the feature calculator matches the `server_calc_type` feature attribute in the database.
- Calculate the results using the `calculate` method.
    - Get and validate all the necessary requirements with the `_get_required_data` method.
    - Create a `Series` or `DataFrame` to hold the results of the calculation using the `_create_empty_result` method.
    - Run the calculation logic.
    - Save the results in the database with the `save` method, if applicable.
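The steps above can be sketched with a small in-memory stand-in. This is only an illustration of the lifecycle, not the echo_energycalc implementation: `ToyCalculator`, `KNOWN_FEATURES`, `OBJECT_ATTRIBUTES` and `SAVED` are hypothetical names, plain dicts replace the pandas objects and performance_db, and the inclusive 10-minute index is an assumption.

```python
from datetime import datetime, timedelta

# Hypothetical in-memory "database": (object, feature) -> server_calc_type
KNOWN_FEATURES = {("WT-01", "example_feature"): "example_calc"}
# Hypothetical object attributes
OBJECT_ATTRIBUTES = {"WT-01": {"nominal_power": 1500.0}}
# Stands in for performance_db
SAVED = {}


class ToyCalculator:
    _name = "example_calc"

    def __init__(self, object_name: str, feature: str) -> None:
        # Step 1: validate object, feature and calculator name on instantiation
        key = (object_name, feature)
        if key not in KNOWN_FEATURES:
            raise ValueError(f"Feature {feature} does not exist for object {object_name}.")
        if KNOWN_FEATURES[key] != self._name:
            raise ValueError("server_calc_type does not match the calculator name.")
        self.object = object_name
        self.feature = feature
        self.result = None

    def calculate(self, start: datetime, end: datetime, save_into=None) -> dict:
        # Step 2a: get and validate the required data
        nominal_power = OBJECT_ATTRIBUTES[self.object]["nominal_power"]
        # Step 2b: create an empty result keyed by 10-minute timestamps
        result = {}
        ts = start
        while ts <= end:
            result[ts] = None
            ts += timedelta(minutes=10)
        # Step 2c: run the calculation logic (a constant value in this toy case)
        for ts in result:
            result[ts] = nominal_power
        self.result = result
        # Step 2d: save the results, if requested
        self.save(save_into)
        return result

    def save(self, save_into=None) -> None:
        if self.result is None:
            raise ValueError("The calculation was not done.")
        if save_into is None:
            return
        SAVED[(self.object, self.feature)] = dict(self.result)


calc = ToyCalculator("WT-01", "example_feature")
values = calc.calculate(
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 0, 30),
    save_into="performance_db",
)
print(len(values))  # 4 timestamps: 00:00, 00:10, 00:20, 00:30
```

The real calculators take a `DateTimeRange` period and return pandas objects; the toy only mirrors the order of the steps.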
Subclass implementation¶
We will use the FeatureCalcExample class as an example of how to implement a subclass of FeatureCalculator.
In general terms, subclasses of FeatureCalculator must define the following:

- `_name` class attribute: The name of the feature calculator. This should match the `server_calc_type` attribute of the feature in the database.
- `__init__`: The constructor of the class. It should override the constructor of the superclass while keeping exactly the same arguments. The constructor is used to define the requirements of the calculation and validate them. In the example below, we add a requirement for the object attribute `nominal_power` using the `_add_requirement` method, then validate this requirement and get its value with the `_get_required_data` method.

    ```python
    def __init__(
        self,
        object_name: str,
        feature: str,
    ) -> None:
        """
        Example of a FeatureCalculator class.

        Currently this feature calculator is only set up for feature "example_calc" of "wind_turbine" of model "GE100-1.X".

        For this calculation to work, the following object attributes must be defined:
        - "nominal_power" (in kW)

        Parameters
        ----------
        object_name : str
            Name of the object to calculate the feature for.
        feature : str
            Name of the feature to calculate.
        """
        # initialize parent class
        # this will set the object name and feature using the constructor of the parent class
        super().__init__(object_name, feature)

        # defining what are the requirements
        # this is just a simple example of how to use the RequiredObjectAttributes class, for other requirement options see calculation_requirements.py
        # in this we are just creating the requirement, no checking if it is fulfilled is done here, it will only be done when calling the _get_required_data method
        self._add_requirement(RequiredObjectAttributes({self.object: ["nominal_power"]}))

        # getting the requirement values
        # in here we are checking if the requirements are fulfilled and getting their values
        # as in this example we are just using the RequiredObjectAttributes class we don't need to define a period, but if we were using other requirement classes (like RequiredFeatures) we would need to define a period
        self._get_required_data()
    ```

    Note

    If the calculation depends on data from other features, you should just add the `RequiredFeatures` requirement to the calculator at the end of the `__init__` method and then call the `_get_required_data` method in the `calculate` method. This is done because the `RequiredFeatures` requirement is specific to a predefined period, and the period is only defined when calling the `calculate` method.
- `calculate`: This method should define and run the calculation logic. In general terms it should do the following:
    - In case there are requirements like `RequiredFeatures` that depend on a period, call the `_get_required_data` method to get the data.

        Note

        Requirements can be added on the fly if needed. This is useful if your calculation initially depends on some data but, when that data is not available, can use other data to calculate the feature.

    - Create a `Series` or `DataFrame` object to store the results of the calculation using the `_create_empty_result` method.
    - Do all the calculations needed.
    - Save the results to the `_result` attribute for later access.
    - Save the results in the database with the `save` method by calling `self.save(save_into=save_into, **kwargs)`.

    The following is an example of the `calculate` method for the `FeatureCalcExample` class, which is very simple as it just calculates a feature with the same value (nominal power) for all timestamps:

    ```python
    def calculate(
        self,
        period: DateTimeRange,
        save_into: Literal["all", "performance_db"] | None = None,
        cached_data: DataFrame | None = None,  # noqa: ARG002
        **kwargs,
    ) -> Series:
        """
        Method that will calculate the feature.

        Parameters
        ----------
        period : DateTimeRange
            Period for which the feature will be calculated.
        save_into : Literal["all", "performance_db"] | None, optional
            Argument that will be passed to the method "save". The options are:
            - "all": The feature will be saved in performance_db and bazefield.
            - "performance_db": the feature will be saved only in performance_db.
            - None: The feature will not be saved.

            By default None.
        cached_data : DataFrame | None, optional
            DataFrame with features already queried/calculated. This is useful to avoid needing to query all the data again from performance_db, making chained calculations a lot more efficient. By default None
        **kwargs : dict, optional
            Additional arguments that will be passed to the "save" method.

        Returns
        -------
        Series
            Series with the calculated feature.
        """
        # creating Series to store results
        # this is not a necessary step, but is useful for already creating a Series with the correct index and name expected by the save method
        result = self._create_empty_result(period=period, result_type="Series", freq="10min")

        # getting nominal power
        # in here we are getting the nominal power that was stored in the object attributes when calling the _get_required_data method
        # the _get_requirement_data method will return a dict or a DataFrame depending on the type of requirement
        nominal_power = self._get_requirement_data("RequiredObjectAttributes")[self.object]["nominal_power"]

        # here we are just filling the result Series with the nominal power
        # this example calculation just creates a feature with the same value for all timestamps
        result.loc[:] = nominal_power

        # adding calculated feature to class result attribute
        # this is necessary so that the save method can access the calculated feature
        # if not done the code will not save anything in performance_db, bazefield, etc.
        self._result = result.copy()

        # saving results
        # here we are just calling the save method which gets the series saved in the _result attribute and saves it in performance_db and bazefield
        # this is done like this because we can opt for not saving the results right now (using save_into=None, which is the default), so the .save method is called later on in the calculation_handler.py
        self.save(save_into=save_into, **kwargs)

        return result
    ```
Database Requirements¶
All FeatureCalculator subclasses must have a name that matches the server_calc_type attribute of the feature in the database. So for example, if we have a feature in the database with the server_calc_type attribute set to example_calc, the FeatureCalculator subclass used to calculate this feature will be the FeatureCalcExample class because it has the _name attribute set to example_calc. In other words, the server_calc_type attribute in the database defines which FeatureCalculator subclass will be used to calculate the feature.
Also, the mapping from the server_calc_type attribute to the FeatureCalculator subclass is done in the constants.py file in the echo_energycalc package. Please add any new FeatureCalculator subclass to this mapping.
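As a rough illustration of how such a mapping can select a calculator, here is a minimal sketch. The names `CALCULATOR_CLASSES`, `get_calculator_class` and `FeatureCalcOther` are hypothetical, and the actual structure used in constants.py may differ.

```python
class FeatureCalcExample:
    """Stand-in for the real FeatureCalcExample subclass."""

    _name = "example_calc"


class FeatureCalcOther:
    """Hypothetical second calculator, present only to make the lookup meaningful."""

    _name = "other_calc"


# Hypothetical mapping from server_calc_type to calculator class
CALCULATOR_CLASSES = {cls._name: cls for cls in (FeatureCalcExample, FeatureCalcOther)}


def get_calculator_class(server_calc_type: str) -> type:
    """Return the calculator class registered for the given server_calc_type."""
    try:
        return CALCULATOR_CLASSES[server_calc_type]
    except KeyError:
        raise ValueError(
            f"No FeatureCalculator registered for server_calc_type '{server_calc_type}'.",
        ) from None


selected = get_calculator_class("example_calc")
print(selected.__name__)  # FeatureCalcExample
```

Keying the mapping by `_name` keeps the database attribute and the class attribute as the single point of agreement, which is why a new subclass that is not added to the mapping would never be selected.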
Class Definition¶
FeatureCalculator(object_name, feature)¶
Base abstract class for feature calculators. Already defines the methods and attributes that are common to all feature calculators.
The constructor should be overridden in child classes of FeatureCalculator to define the name, requirements and logic of the feature calculator.
It's extremely important to define the name of the feature calculator in the child class. This name must be equal to the "server_calc_type" attribute of the feature in performance_db and will be used to check if the feature calculator is the correct one for the feature. Set the name of the feature calculator in the child class as a class attribute "_name".
Below there is a simple example on how to define a child class of FeatureCalculator:
```python
class FeatureCalculatorExample(FeatureCalculator):
    # name of the feature calculator
    _name = "name"

    def __init__(self, object_name: str, feature: str) -> None:
        # initialize parent class
        super().__init__(object_name, feature)
        # requirements for the feature calculator
        self._add_requirement(...)
```
Parameters:
- object_name (str) – Name of the object for which the feature is calculated. It must exist in performance_db.
- feature (str) – Feature of the object that is calculated. It must exist in performance_db.
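The requirement that every child class defines `_name` can be illustrated in isolation. The sketch below uses a hypothetical `ToyBase` class that reproduces only the `hasattr` check and the abstract `calculate` method; it does not connect to performance_db or perform any of the other validations.

```python
from abc import ABC, abstractmethod


class ToyBase(ABC):
    """Hypothetical stand-in for FeatureCalculator, reduced to the _name check."""

    def __init__(self) -> None:
        # Child classes must define the class attribute "_name"
        if not hasattr(self, "_name"):
            raise ValueError(f"FeatureCalculator name is not defined in {self.__class__.__name__}.")

    @property
    def name(self) -> str:
        return self._name

    @abstractmethod
    def calculate(self) -> float:
        """Child classes implement the calculation logic here."""


class Named(ToyBase):
    _name = "example_calc"

    def calculate(self) -> float:
        return 1.0


class Unnamed(ToyBase):
    # "_name" is intentionally missing
    def calculate(self) -> float:
        return 1.0


print(Named().name)  # example_calc

try:
    Unnamed()
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```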
Source code in echo_energycalc/feature_calc_core.py
def __init__(
    self,
    object_name: str,
    feature: str,
) -> None:
    """
    Constructor of the FeatureCalculator class.

    This should be overridden in child classes of FeatureCalculator to define the name, requirements and logic of the feature calculator.

    It's extremely important to define the name of the feature calculator in the child class. This name must be equal to the "server_calc_type" attribute of the feature in performance_db and will be used to check if the feature calculator is the correct one for the feature. Set the name of the feature calculator in the child class as a class attribute "_name".

    Below there is a simple example on how to define a child class of FeatureCalculator:

    ```python
    class FeatureCalculatorExample(FeatureCalculator):
        # name of the feature calculator
        _name = "name"

        def __init__(self, object_name: str, feature: str) -> None:
            # initialize parent class
            super().__init__(object_name, feature)
            # requirements for the feature calculator
            self._add_requirement(...)
    ```

    Parameters
    ----------
    object_name : str
        Name of the object for which the feature is calculated. It must exist in performance_db.
    feature : str
        Feature of the object that is calculated. It must exist in performance_db.
    """
    # checking arguments
    if not isinstance(object_name, str):
        raise TypeError(f"object_name must be a string, not {type(object_name)}")
    if not isinstance(feature, str):
        raise TypeError(f"feature must be a string, not {type(feature)}")

    # check if self._name is defined in child class
    if not hasattr(self, "_name"):
        raise ValueError(f"FeatureCalculator name is not defined in {self.__class__.__name__}.")

    # empty set of requirements
    self._requirements = None

    # creating structure that will be used to connect to performance_db
    self._perfdb = PerfDB(application_name=self.__class__.__name__)

    # check if object exists in performance_db
    obj_def = self._perfdb.objects.instances.get(object_names=[object_name], output_type="dict")
    if len(obj_def) == 0:
        raise ValueError(f"Object {object_name} does not exist in performance_db.")
    self._object = object_name
    obj_model = obj_def[object_name]["object_model_name"]

    # check if feature exists in performance_db
    obj_features = self._perfdb.features.definitions.get(
        object_names=[object_name],
        feature_names=[feature],
        get_attributes=True,
        attribute_names=["server_calc_type"],
        output_type="dict",
    )
    if len(obj_features) != 1 or len(obj_features[obj_model]) != 1:
        raise ValueError(f"Feature {feature} does not exist in performance_db for object {object_name}.")
    self._feature = feature
    feature_def = obj_features[obj_model][feature]

    # checking if this calculation name is equal to the "server_calc_type" attribute of the feature
    if "server_calc_type" not in feature_def:
        raise ValueError(
            f"Feature {feature} does not have the attribute 'server_calc_type' in performance_db for object {object_name}.",
        )
    if feature_def["server_calc_type"] != self.name:
        raise ValueError(
            f"Feature '{feature}' of object '{object_name}' has the attribute 'server_calc_type' equal to '{feature_def['server_calc_type']}' in performance_db, which is different from the name of the class {self.__class__.__name__}('{self.name}').",
        )

    # results of the calculation. It will be filled by the method "calculate".
    self._result: Series | DataFrame | None = None
feature property¶
Feature that is calculated. This will be defined in the constructor and cannot be changed.
Returns:
- str – Name of the feature that is calculated.
name property¶
Name of the feature calculator. Is defined in child classes of FeatureCalculator.
This must be equal to the "server_calc_type" attribute of the feature in performance_db.
Returns:
- str – Name of the feature calculator.
object property¶
Object for which the feature is calculated. This will be defined in the constructor and cannot be changed.
Returns:
- str – Object name for which the feature is calculated.
requirements property¶
Requirements of the feature calculator, grouped by requirement class. They are defined in child classes of FeatureCalculator.
Returns:
- dict[str, list[CalculationRequirement]] – Dict of requirements. The keys are the names of the classes of the requirements and the values are lists of requirements of that class. For example:
    {"RequiredFeatures": [RequiredFeatures(...), RequiredFeatures(...)], "RequiredObjects": [RequiredObjects(...)]}
result property¶
Result of the calculation. This is None until the method "calculate" is called.
Returns:
- Series | DataFrame | None – Result of the calculation if the method "calculate" was called. None otherwise.
calculate(period, save_into=None, cached_data=None, **kwargs) abstractmethod¶
Abstract method that should be implemented in child classes. Should calculate the feature for the given object and period.
This method should call the method _get_required_data to get the required data before calculating the feature.
The method also should call the method "save" to save the calculated feature in performance_db.
At the end of the method, the attribute "_result" should be filled with the result of the calculation.
Parameters:
- period (DateTimeRange) – Period for which the feature will be calculated.
- save_into (Literal['all', 'performance_db'] | None, default: None) – Argument that will be passed to the method "save". The options are:
    - "all": The feature will be saved in performance_db and bazefield.
    - "performance_db": The feature will be saved only in performance_db.
    - None: The feature will not be saved.
- cached_data (DataFrame | None, default: None) – DataFrame with the cached data for each object. This needs to be passed down to the `_get_required_data` method to do chained calculations without needing to save the data in performance_db and get them again.
- **kwargs (dict, default: {}) – Additional arguments that will be passed to the "save" method.
Returns:
- Series | DataFrame – Pandas Series with the calculated feature. If multiple features are calculated in the method, a DataFrame should be returned with two column levels (object_name, feature_name).
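Because period-dependent requirements can only be resolved once `calculate` receives a period, the data fetch happens inside the method, and a fallback requirement can be added at that point if the primary data is missing. The following is a minimal sketch of that pattern only; `ToyRequiredFeatures`, `ToyCalculator` and `AVAILABLE` are hypothetical stand-ins, and the real requirement classes and `_get_required_data` behave differently in detail.

```python
from dataclasses import dataclass, field

# Hypothetical store of feature data; "wind_speed" is deliberately missing
AVAILABLE = {"wind_speed_backup": [7.9, 8.1, 8.4]}


@dataclass
class ToyRequiredFeatures:
    """Stand-in for a period-dependent requirement such as RequiredFeatures."""

    features: list[str]
    data: dict = field(default_factory=dict)

    def fetch(self, period: tuple[str, str]) -> bool:
        # A real requirement would query data for `period`; here we only check availability
        missing = [name for name in self.features if name not in AVAILABLE]
        if missing:
            return False
        self.data = {name: AVAILABLE[name] for name in self.features}
        return True


class ToyCalculator:
    def __init__(self) -> None:
        # The requirement is declared in the constructor, but no data is fetched yet
        self.requirements = [ToyRequiredFeatures(["wind_speed"])]

    def calculate(self, period: tuple[str, str]) -> float:
        source = self.requirements[0]
        if not source.fetch(period):
            # Primary data is unavailable: add a fallback requirement on the fly
            source = ToyRequiredFeatures(["wind_speed_backup"])
            self.requirements.append(source)
            if not source.fetch(period):
                raise ValueError("Neither the primary nor the fallback feature is available.")
        (values,) = source.data.values()
        # Toy calculation: mean of the fetched values
        return sum(values) / len(values)


calc = ToyCalculator()
mean_speed = calc.calculate(("2024-01-01", "2024-01-02"))
print(round(mean_speed, 3))
```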
Source code in echo_energycalc/feature_calc_core.py
@abstractmethod
def calculate(
    self,
    period: DateTimeRange,
    save_into: Literal["all", "performance_db"] | None = None,
    cached_data: DataFrame | None = None,
    **kwargs,
) -> Series | DataFrame:
    """
    Abstract method that should be implemented in child classes. Should calculate the feature for the given object and period.

    This method should call the method `_get_required_data` to get the required data before calculating the feature.

    The method also should call the method "save" to save the calculated feature in performance_db.

    At the end of the method, the attribute "_result" should be filled with the result of the calculation.

    Parameters
    ----------
    period : DateTimeRange
        Period for which the feature will be calculated.
    save_into : Literal["all", "performance_db"] | None, optional
        Argument that will be passed to the method "save". The options are:
        - "all": The feature will be saved in performance_db and bazefield.
        - "performance_db": the feature will be saved only in performance_db.
        - None: The feature will not be saved.

        By default None.
    cached_data : DataFrame | None, optional
        DataFrame with the cached data for each object. This needs to be passed down to the `_get_required_data` method to do chained calculations without needing to save the data in performance_db and get them again. By default None.
    **kwargs : dict, optional
        Additional arguments that will be passed to the "save" method.

    Returns
    -------
    Series | DataFrame
        Pandas Series with the calculated feature.
        If multiple features are calculated in the method, a DataFrame should be returned with two column levels (object_name, feature_name).
    """
    raise NotImplementedError("The method 'calculate' must be implemented in the child class.")
save(save_into=None, **kwargs)¶
Method to save the calculated feature values in performance_db and, optionally, in bazefield.
Parameters:
- save_into (Literal['all', 'performance_db'] | None, default: None) – Defines where the calculated feature will be saved. The options are:
    - "all": The feature will be saved in performance_db and bazefield.
    - "performance_db": The feature will be saved only in performance_db.
    - None: The feature will not be saved.
- **kwargs (dict, default: {}) – Not being used at the moment. Here only for compatibility.
Source code in echo_energycalc/feature_calc_core.py
def save(
    self,
    save_into: Literal["all", "performance_db"] | None = None,
    **kwargs,  # noqa: ARG002
) -> None:
    """
    Method to save the calculated feature values in performance_db and, optionally, in bazefield.

    Parameters
    ----------
    save_into : Literal["all", "performance_db"] | None, optional
        Defines where the calculated feature will be saved. The options are:
        - "all": The feature will be saved in performance_db and bazefield.
        - "performance_db": the feature will be saved only in performance_db.
        - None: The feature will not be saved.

        By default None.
    **kwargs : dict, optional
        Not being used at the moment. Here only for compatibility.
    """
    # checking arguments
    if not isinstance(save_into, str | type(None)):
        raise TypeError(f"save_into must be a string or None, not {type(save_into)}")
    if isinstance(save_into, str) and save_into not in ["all", "performance_db"]:
        raise ValueError(f"save_into must be 'all', 'performance_db' or None, not {save_into}")

    # checking if calculation was done
    if self.result is None:
        raise ValueError(
            "The calculation was not done. Cannot save the feature calculation results. Please make sure to do something like 'self._result = df[self.feature].copy()' in the method 'calculate' before calling 'self.save()'.",
        )

    if save_into is None:
        return

    if isinstance(save_into, str):
        if save_into not in ["performance_db", "all"]:
            raise ValueError(f"save_into must be 'performance_db' or 'all', not {save_into}.")
        upload_to_bazefield = save_into == "all"
    elif save_into is None:
        upload_to_bazefield = False
    else:
        raise TypeError(f"save_into must be a string or None, not {type(save_into)}.")

    # converting result series to DataFrame if needed
    if isinstance(self.result, Series):
        result_df = self.result.to_frame()
    elif isinstance(self.result, DataFrame):
        result_df = self.result.droplevel(0, axis=1)
    else:
        raise TypeError(f"result must be a pandas Series or DataFrame, not {type(self.result)}.")

    # adjusting DataFrame to be inserted in the database
    # making the columns a MultiIndex with levels object_name and feature_name
    result_df.columns = MultiIndex.from_product([[self.object], result_df.columns], names=["object_name", "feature_name"])

    self._perfdb.features.values.series.insert(
        df=result_df,
        on_conflict="update",
        bazefield_upload=upload_to_bazefield,
    )
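The column reshaping performed by `save` before insertion can be tried on its own with pandas. This sketch reproduces only the Series branch; the object name "WT-01" and feature name "example_feature" are made-up values, and no database call is involved.

```python
import pandas as pd

# Made-up result: a constant feature over three 10-minute timestamps
index = pd.date_range("2024-01-01 00:00", periods=3, freq="10min")
result = pd.Series(1500.0, index=index, name="example_feature")

# A Series becomes a single-column DataFrame named after the feature
result_df = result.to_frame()

# Columns become a two-level MultiIndex: (object_name, feature_name)
result_df.columns = pd.MultiIndex.from_product(
    [["WT-01"], result_df.columns],
    names=["object_name", "feature_name"],
)

print(result_df.columns.tolist())  # [('WT-01', 'example_feature')]
```

For a DataFrame result, `save` first drops the outer column level with `droplevel(0, axis=1)` and then rebuilds it the same way, so the object level always comes from the calculator's own `object` property.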