Solar Resource Loss
Overview
The SolarEnergyLossResource class is a subclass of SolarEnergyLossCalculator and FeatureCalculator that calculates the solar resource loss/gain in energy production using a linear polynomial regression model. It currently expects the model to have been trained beforehand and stored in the calculation_model table as a serialized (joblib/pickle), base64-encoded object.
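The constructor decodes the stored model from base64 text and loads it with joblib. The minimal sketch below shows the same round trip using only the standard library. It assumes plain `pickle` in place of joblib and a plain dict in place of the fitted sklearn estimator; `serialize_model` and `deserialize_model` are hypothetical helper names. Like joblib, pickle can execute arbitrary code while loading, so only trusted model blobs should be decoded.

```python
import base64
import io
import pickle

# Hypothetical stand-in for a fitted regression model. The real class stores a
# sklearn estimator serialized with joblib; a plain dict keeps this sketch stdlib-only.
model = {"intercept": 0.5, "coefs": [2.0, 0.1]}


def serialize_model(obj) -> str:
    """Pickle an object and encode the bytes as base64 text for a database column."""
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")


def deserialize_model(b64_text: str):
    """Decode base64 text and unpickle it, mirroring the BytesIO pattern used in __init__."""
    with io.BytesIO(base64.b64decode(b64_text)) as buffer:
        return pickle.load(buffer)


stored = serialize_model(model)
restored = deserialize_model(stored)
print(restored == model)  # True
```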
This class uses a pre-trained linear polynomial regression model built with the sklearn library. The model takes Plane of Array Irradiance and Module Temperature from PVSyst simulations as input and Expected Energy at the Point of Connection level as output. PVSyst simulations were used for training because the company's current targets are based on these simulations. Note that, at prediction time, calculate() feeds the model a single input column, GlobInc (the averaged POA irradiance). The model is trained per SPE, and the training script can be found on the performance server at manual_routines\solar_resource_loss.
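A linear polynomial regression of this kind is commonly built in sklearn as a PolynomialFeatures + LinearRegression pipeline, which amounts to an ordinary least-squares fit on polynomial terms of the input. As a dependency-free illustration of what such a fit computes, the sketch below fits daily energy as a degree-2 polynomial of daily POA irradiance. The degree, units and synthetic data are assumptions for illustration, not the trained model's actual settings, and `fit_poly2`/`predict_poly2` are hypothetical helpers.

```python
def fit_poly2(x, y):
    """Ordinary least-squares fit of y ~ c0 + c1*x + c2*x**2 via the normal equations."""
    n = 3
    # Normal matrix A[i][j] = sum(x**(i+j)) and right-hand side b[i] = sum(y * x**i)
    power_sums = [sum(xi ** k for xi in x) for k in range(2 * n - 1)]
    a = [[power_sums[i + j] for j in range(n)] for i in range(n)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution
    coefs = [0.0] * n
    for row in reversed(range(n)):
        acc = sum(a[row][k] * coefs[k] for k in range(row + 1, n))
        coefs[row] = (b[row] - acc) / a[row][row]
    return coefs


def predict_poly2(coefs, x):
    """Evaluate the fitted polynomial for each daily irradiance value."""
    return [coefs[0] + coefs[1] * xi + coefs[2] * xi ** 2 for xi in x]


# Synthetic daily irradiance (kWh/m2) and energy generated from a known polynomial
irradiance = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
energy = [0.5 + 2.0 * g + 0.1 * g ** 2 for g in irradiance]
coefs = fit_poly2(irradiance, energy)
print([round(c, 6) for c in coefs])  # [0.5, 2.0, 0.1]
```

In practice the sklearn pipeline handles feature expansion and a more numerically robust solver; the normal equations are used here only because they make the least-squares step explicit.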
Calculation Logic
The calculation works as follows:
- Data Acquisition: Get the IrradiancePOACommOk_5min.AVG feature from the simple weather stations associated with the SPE and average it across all stations.
- Complete Days Filtering: Resample the data to hourly frequency and keep only complete days (days with 24 hourly rows). Days with missing timestamps are discarded and logged.
- Night Values Adjustment: Use pvlib to calculate the solar position and fill NaN irradiance values with 0 during night periods (sun elevation < 0).
- Daily Aggregation: Resample the filtered data to daily frequency by summing the hourly values.
- Measured Energy Prediction: Apply the linear polynomial regression model to the measured daily irradiance to predict daily energy production, then convert it to average power (kWmed) by dividing by 24.
- Target Irradiance Retrieval: Get the target Pxx value (e.g., P50) and the target evaluation period for each day from performance_db, then retrieve the corresponding target irradiance from the resource assessments table.
- Target Energy Calculation: Apply the same regression model to the target irradiance (stored as a daily average and multiplied by 24 to obtain a daily total) to get the proportional target energy, also converted to kWmed. Using the same model for both sides keeps the measured and target values consistent.
- Loss Calculation: Calculate the resource loss/gain as the difference between the target average energy and the measured average energy:
  loss = target_energy - measured_energy
  Positive values indicate an energy loss (measured irradiance below target); negative values indicate an energy gain.
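The filtering and aggregation steps above (complete days, night fill, daily sum) plus the kWmed conversion can be sketched with the standard library alone. This assumes the hourly values are already averaged across stations and that the night flags come from a solar-position calculation such as pvlib's; plain dicts keyed by datetime stand in for the pandas index, and the helper names are hypothetical. In the class, the hourly frame comes from `resample("h").mean()`, which creates a row for every hour in the covered range, so the 24-row test counts rows rather than non-missing values. Also note that pandas' `sum()` skips NaN by default, which the sketch mirrors.

```python
import math
from collections import Counter
from datetime import datetime, timedelta


def complete_day_totals(hourly, is_night):
    """Sum hourly irradiance per day, keeping only days that have 24 hourly rows.

    hourly:   dict datetime -> mean hourly irradiance (math.nan when missing)
    is_night: dict datetime -> True when the sun elevation is below 0
    """
    rows_per_day = Counter(ts.date() for ts in hourly)
    complete = {day for day, n in rows_per_day.items() if n == 24}
    totals = {}
    for ts in sorted(hourly):
        day = ts.date()
        if day not in complete:
            continue  # incomplete day: discarded (the class logs a warning)
        value = hourly[ts]
        if is_night.get(ts, False) and math.isnan(value):
            value = 0.0  # night gap filled with zero irradiance
        # pandas' resample("D").sum() skips NaN, so leftover daytime gaps add nothing
        totals[day] = totals.get(day, 0.0) + (0.0 if math.isnan(value) else value)
    return totals


def to_kwmed(daily_energy_kwh):
    """Convert a daily energy total into average power over the day (kWmed)."""
    return daily_energy_kwh / 24


# Day 1: 24 hourly rows, night hours missing and one daytime gap at 12:00
start = datetime(2025, 1, 1)
hourly, night = {}, {}
for h in range(24):
    ts = start + timedelta(hours=h)
    night[ts] = h < 6 or h >= 18
    hourly[ts] = math.nan if (night[ts] or h == 12) else 500.0
# Day 2: only 3 hourly rows, so it is dropped
for h in range(3):
    ts = start + timedelta(days=1, hours=h)
    hourly[ts], night[ts] = 0.0, True

totals = complete_day_totals(hourly, night)
print(totals)  # {datetime.date(2025, 1, 1): 5500.0}
print(to_kwmed(2400.0))  # 100.0
```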
Database Requirements
- Feature attribute server_calc_type must be set to 'solar_energy_loss_resource'.
- Feature attribute feature_options_json with the following keys:
  - calc_model_type: type of the calculation model used to calculate the feature. In this case: 'solar_resource_fit'.
  - model_name: name of the model to be considered in the feature calculation. In this case: 'solar_resource_regression'.
  - bazefield_features: boolean indicating whether the input features come from bazefield. For the solar prediction to work today, all values must come from the bazefield database, so this should be True.
  Keep in mind that 'calc_model_type' and 'model_name' are only used to find the desired calculation model in the database: model_name is matched as a substring (regex .*<model_name>.*) and calc_model_type must match exactly (regex ^<calc_model_type>$). See the views v_calculation_models and v_calculation_models_files_def for more details.
- The following attributes of the object being calculated:
  - Required:
    - reference_weather_stations: a dict indicating which simple weather stations to consider during data acquisition. The "simple_ws" value is iterated over, so it should be a list of station names. Example: {"simple_ws": ["RBG-RBG2-MET1"]}
    - latitude and longitude: coordinates of the object, used by pvlib to calculate the solar position.
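The constructor turns feature_options_json into two regular expressions for the model lookup. The sketch below rebuilds those patterns and checks them against some hypothetical model names and types. It assumes the patterns are applied with Python's `re.match` semantics; how the lookup is actually executed against the database is not shown here. Because `__init__` then takes the first matching model (`next(iter(...))`), a loose model_name that matches several models could select an unintended one.

```python
import re

# Example feature_options_json, using the values listed above
feature_options_json = {
    "calc_model_type": "solar_resource_fit",
    "model_name": "solar_resource_regression",
    "bazefield_features": True,
}


def model_filters(options):
    """Build the same regex patterns that __init__ passes to RequiredCalcModels."""
    return {
        "model_name": f".*{options['model_name']}.*",  # substring match
        "model_type": f"^{options['calc_model_type']}$",  # exact match
    }


def model_matches(filters, name, model_type):
    """Assumption: patterns are evaluated with re.match."""
    return (
        re.match(filters["model_name"], name) is not None
        and re.match(filters["model_type"], model_type) is not None
    )


filters = model_filters(feature_options_json)
# Hypothetical model names and types, for illustration only
print(model_matches(filters, "SPE1_solar_resource_regression_v2", "solar_resource_fit"))  # True
print(model_matches(filters, "SPE1_solar_resource_regression_v2", "solar_resource_fit_v2"))  # False
```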
Class Definition
SolarEnergyLossResource(object_name, feature)
Base class for solar energy loss/gain from Irradiance.
For this class to work, the feature must have the attribute feature_options_json with the following keys:
- 'calc_model_type': type of the model that will be used to calculate the feature. It must match the type of the model in performance_db.
- 'model_name': name of the model that will be used to calculate the feature.
- 'bazefield_features': bool indicating whether the required features need to be acquired from bazefield.
Parameters:
- object_name (str): Name of the object for which the feature is calculated. It must exist in performance_db.
- feature (str): Feature of the object that is calculated. It must exist in performance_db.
Source code in echo_energycalc/solar_energy_loss_resource.py
```python
def __init__(self, object_name: str, feature: str) -> None:
    """
    Class used to calculate features that depend on a PredictiveModel.

    For this class to work, the feature must have the attribute `feature_options_json` with the following keys:
    - 'calc_model_type': type of the model that will be used to calculate the feature. It must match the type of the model in performance_db.
    - 'model_name': name of the model that will be used to calculate the feature.
    - 'bazefield_features': bool indicating whether the required features need to be acquired from bazefield.

    Parameters
    ----------
    object_name : str
        Name of the object for which the feature is calculated. It must exist in performance_db.
    feature : str
        Feature of the object that is calculated. It must exist in performance_db.
    """
    # initialize parent class
    super().__init__(object_name, feature)

    self._add_requirement(RequiredFeatureAttributes(self.object, self.feature, ["feature_options_json"]))
    self._get_required_data()
    self._feature_attributes = self._get_requirement_data("RequiredFeatureAttributes")[self.feature]
    self._validate_feature_options()

    self._add_requirement(
        RequiredCalcModels(
            calc_models={
                self.object: [
                    {
                        "model_name": f".*{self._feature_attributes['feature_options_json']['model_name']}.*",
                        "model_type": f"^{self._feature_attributes['feature_options_json']['calc_model_type']}$",
                    },
                ],
            },
        ),
    )
    self._add_requirement(
        RequiredObjectAttributes(
            {
                self.object: [
                    "reference_weather_stations",
                    "latitude",
                    "longitude",
                ],
            },
        ),
    )
    self._get_required_data()

    # getting the model name
    self._model_name = next(iter(self._get_requirement_data("RequiredCalcModels")[self.object].keys()))
    # loading calculation model from file
    self._model = self._get_requirement_data("RequiredCalcModels")[self.object][self._model_name]["model"]
    if self._model is None:
        raise ValueError(
            f"Model {self._model_name} not found for object {self.object}. Please check the configuration in the database.",
        )
    # Deserializing the model from base64
    model_b64_loaded = self._model["model"]
    with BytesIO(pybase64.b64decode(model_b64_loaded)) as buffer:
        buffer.seek(0)
        self._model = joblib.load(buffer)

    # defining required features
    simple_ws = self._get_requirement_data("RequiredObjectAttributes")[self.object]["reference_weather_stations"]["simple_ws"]
    features = {ws: ["IrradiancePOACommOk_5min.AVG"] for ws in simple_ws}
    # Adding suffix _b# to features if bazefield_features is True
    if self._feature_attributes["feature_options_json"].get("bazefield_features", False):
        features = {obj: [f"{feat}_b#" for feat in feats] for obj, feats in features.items()}
    self._add_requirement(RequiredFeatures(features=features))
```
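The last lines of `__init__` build a `{station: [feature]}` mapping from `reference_weather_stations["simple_ws"]` and add the `_b#` suffix for bazefield features. The sketch below reproduces that mapping as a hypothetical helper, `required_features`. The string-to-list guard is an addition for this sketch only; the class iterates the value directly, so a bare string would produce one "station" per character. Also note that calculate() selects the `IrradiancePOACommOk_5min.AVG_b#` column when averaging, so with bazefield_features False it would not find the columns it looks for.

```python
def required_features(reference_weather_stations, bazefield_features):
    """Build the {station: [feature, ...]} mapping requested from the data source."""
    simple_ws = reference_weather_stations["simple_ws"]
    # Guard added for this sketch only (not in the class): a bare string would be
    # iterated character by character by the comprehension below.
    if isinstance(simple_ws, str):
        simple_ws = [simple_ws]
    features = {ws: ["IrradiancePOACommOk_5min.AVG"] for ws in simple_ws}
    if bazefield_features:
        # same "_b#" suffix rule as in __init__
        features = {obj: [f"{feat}_b#" for feat in feats] for obj, feats in features.items()}
    return features


print(required_features({"simple_ws": ["RBG-RBG2-MET1"]}, bazefield_features=True))
# {'RBG-RBG2-MET1': ['IrradiancePOACommOk_5min.AVG_b#']}
```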
feature (property)
Feature that is calculated. This will be defined in the constructor and cannot be changed.
Returns:
- str: Name of the feature that is calculated.
name (property)
Name of the feature calculator. It is defined in child classes of FeatureCalculator.
This must be equal to the "server_calc_type" attribute of the feature in performance_db.
Returns:
- str: Name of the feature calculator.
object (property)
Object for which the feature is calculated. This will be defined in the constructor and cannot be changed.
Returns:
- str: Object name for which the feature is calculated.
requirements (property)
Requirements of the feature calculator, grouped by requirement class. They are defined in child classes of FeatureCalculator.
Returns:
- dict[str, list[CalculationRequirement]]: Dict of requirements.
The keys are the names of the requirement classes and the values are lists of requirements of that class.
For example:
{"RequiredFeatures": [RequiredFeatures(...), RequiredFeatures(...)], "RequiredObjects": [RequiredObjects(...)]}
result (property)
Result of the calculation. This is None until the method "calculate" is called.
Returns:
- Series | DataFrame | None: Result of the calculation if the method "calculate" was called. None otherwise.
calculate(period, save_into=None, cached_data=None, **kwargs)
Method that will calculate the feature.
This code will do the following:
1. Get irradiance data from the weather stations associated with the object.
2. Average the irradiance data from all weather stations.
3. Resample the data to daily frequency, keeping only complete days (24 hours of data).
4. Predict the energy production using the model.
5. Get the target Pxx (e.g., P50, as used in the current budget) from performance_db and apply the model to the corresponding target irradiance to obtain the target energy production.
6. Calculate the energy loss as the difference between the target energy production and the predicted energy production.
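The target side of the calculation is grouped by (target_pxx, target_evaluation_period) so that a period spanning two target configurations (for example December 2025 to January 2026) queries each one separately. A stdlib sketch of that grouping and of the final `loss = target - measured` step, assuming a dict-based lookup (`fetch`) in place of the resource assessment query and a simple linear function in place of the trained model; all names and numbers here are hypothetical.

```python
import math
from collections import defaultdict
from datetime import date


def target_kwmed(target_rows, fetch_target_irradiance, predict):
    """Target average power per day, computed per (target_pxx, evaluation_period) group.

    target_rows:             list of (day, target_pxx, evaluation_period) tuples
    fetch_target_irradiance: callable (pxx, period) -> {day: daily average irradiance}
    predict:                 callable daily irradiance -> daily energy (model stand-in)
    """
    groups = defaultdict(list)
    for day, pxx, period in target_rows:
        groups[(pxx, period)].append(day)
    result = {}
    for (pxx, period), days in groups.items():
        # one lookup per group, like the per-group resource assessment query
        daily_avg = fetch_target_irradiance(pxx, period)
        for day in days:
            if day in daily_avg:
                # daily average * 24 -> daily total; model energy / 24 -> kWmed
                result[day] = predict(daily_avg[day] * 24) / 24
    return dict(sorted(result.items()))


def resource_loss(target, measured):
    """loss = target - measured: positive is a loss, negative a gain, NaN if unmeasured."""
    return {day: value - measured.get(day, math.nan) for day, value in target.items()}


irradiance_table = {
    ("P50", "2025"): {date(2025, 12, 31): 250.0},
    ("P50", "2026"): {date(2026, 1, 1): 200.0},
}


def fetch(pxx, period):
    return irradiance_table.get((pxx, period), {})


rows = [(date(2025, 12, 31), "P50", "2025"), (date(2026, 1, 1), "P50", "2026")]
target = target_kwmed(rows, fetch, predict=lambda daily_irr: 2.0 * daily_irr)
loss = resource_loss(target, measured={date(2025, 12, 31): 450.0})
print(target)  # {datetime.date(2025, 12, 31): 500.0, datetime.date(2026, 1, 1): 400.0}
print(loss[date(2025, 12, 31)])  # 50.0
```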
Parameters:
- period (DateTimeRange): Period for which the feature will be calculated.
- save_into (Literal['all', 'performance_db'] | None, default: None): Argument that will be passed to the method "save". The options are:
  - "all": the feature will be saved in performance_db and bazefield.
  - "performance_db": the feature will be saved only in performance_db.
  - None: the feature will not be saved.
- cached_data (DataFrame | None, default: None): DataFrame with features already queried/calculated. This avoids querying all the data again from performance_db, making chained calculations much more efficient.
- **kwargs (dict, default: {}): Additional arguments that will be passed to the "save" method.
Returns:
- Series: Pandas Series with the calculated feature.
Source code in echo_energycalc/solar_energy_loss_resource.py
```python
def calculate(
    self,
    period: DateTimeRange,
    save_into: Literal["all", "performance_db"] | None = None,
    cached_data: DataFrame | None = None,
    **kwargs,
) -> Series:
    """
    Method that will calculate the feature.

    This code will do the following:
    1. Get irradiance data from the weather stations associated with the object.
    2. Average the irradiance data from all weather stations.
    3. Resample the data to daily frequency, keeping only complete days (24 hours of data).
    4. Predict the energy production using the model.
    5. Get the target energy production from performance_db (P50) as used in current budget.
    6. Calculate the energy loss as the difference between the target energy production and the predicted energy production.

    Parameters
    ----------
    period : DateTimeRange
        Period for which the feature will be calculated.
    save_into : Literal["all", "performance_db"] | None, optional
        Argument that will be passed to the method "save". The options are:
        - "all": The feature will be saved in performance_db and bazefield.
        - "performance_db": the feature will be saved only in performance_db.
        - None: The feature will not be saved.
        By default None.
    cached_data : DataFrame | None, optional
        DataFrame with features already queried/calculated. This is useful to avoid needing to query all the data again from performance_db, making chained calculations a lot more efficient.
        By default None
    **kwargs : dict, optional
        Additional arguments that will be passed to the "save" method.

    Returns
    -------
    Series
        Pandas Series with the calculated feature.
    """
    t0 = perf_counter()
    # adjusting period to account for lagged timestamps
    adjusted_period = period.copy()
    # creating a series to store the result
    result_aux = self._create_empty_result(period=adjusted_period, freq="D", result_type="Series")
    # getting feature values
    self._get_required_data(
        period=adjusted_period,
        reindex=None,
        round_timestamps={"freq": timedelta(minutes=5), "tolerance": timedelta(minutes=2)},
        cached_data=cached_data,
    )
    t1 = perf_counter()

    # getting DataFrame with feature values
    df = self._get_requirement_data("RequiredFeatures")
    # Averaging the values for the features
    df[("AVG", "IrradiancePOACommOk_5min.AVG")] = df.loc[:, (slice(None), "IrradiancePOACommOk_5min.AVG_b#")].mean(axis=1)
    # Adjusting Dataframe structure
    df = df.loc[:, df.columns.get_level_values("object") == "AVG"]
    df.columns = df.columns.droplevel(0)
    # Remove the suffix _b# from the columns
    df.columns = df.columns.str.replace("_b#$", "", regex=True)
    # Renaming columns to match the model input
    df = df.rename(
        columns={
            "IrradiancePOACommOk_5min.AVG": "GlobInc",
        },
    )

    # Resampling the dataframe to hour frequency to filter only complete days from the period
    df = df.resample("h").mean()
    daily_counts = df.resample("D").size()
    complete_days = daily_counts[daily_counts == 24].index
    # .copy() so the night-value fill below modifies an independent DataFrame, not a view of df
    df_complete_days = df[df.index.normalize().isin(complete_days)].copy()

    # Adjusting irradiance night values to 0 if NaN
    # Getting timestamps and converting to UTC
    timestamps = df_complete_days.index
    # adding 3 hours to convert to UTC
    times_pd = timestamps + Timedelta(hours=3)
    solar_position = pvlib.solarposition.get_solarposition(
        time=times_pd,
        latitude=self._get_requirement_data("RequiredObjectAttributes")[self.object]["latitude"],
        longitude=self._get_requirement_data("RequiredObjectAttributes")[self.object]["longitude"],
    )
    # Get the sun's elevation (altitude)
    # Sun altitude < 0 means the sun is below the horizon (night)
    is_night = solar_position["elevation"] < 0
    # Reset index to match df timestamps (convert back from UTC to local time)
    is_night.index = timestamps
    df_complete_days.loc[is_night, "GlobInc"] = df_complete_days.loc[is_night, "GlobInc"].fillna(0)

    # Resampling the DataFrame to daily frequency only for complete days
    df_resampled = df_complete_days.resample("D").sum()
    # Logging discarded days due to incomplete data
    discarded_days = set(df.index.normalize()) - set(df_resampled.index.normalize())
    if discarded_days:
        logger.warning(
            f"{self.object} - {self.feature} - {period}: Discarded days due to less than 24 hours of data: {', '.join(str(day.date()) for day in discarded_days)}",
        )
    t2 = perf_counter()

    # predicting only when at least one complete day exists (predict fails on zero samples)
    if not df_resampled.empty:
        # Predicting feature values
        daily_energy_model_result = self._model.predict(df_resampled)
        daily_energy_model_result_series = Series(daily_energy_model_result, index=df_resampled.index, name="value")
        wanted_idx = result_aux.index.intersection(df_resampled.index)
        result_aux.loc[wanted_idx] = daily_energy_model_result_series[wanted_idx]
    # Converting daily result to kWmed
    daily_avg_measured_energy = result_aux / 24

    # Getting target values from performance database
    # Getting which target_pxx is used on the given period
    target_pxx = self._perfdb.kpis.energy.targets.get(
        period=adjusted_period,
        time_res="daily",
        object_or_group_names=[self.object],
        measurement_points=["Connection Point"],
        values_only=True,
    )
    # Process each unique combination of target_pxx and target_evaluation_period
    # This allows handling periods that span multiple target configurations (e.g., Dec-2025 to Jan-2026)
    daily_target_avg_energy_list = []
    target_pxx_grouped = target_pxx.groupby(["target_pxx", "target_evaluation_period"])
    for (target_pxx_value, target_evaluation_period), group in target_pxx_grouped:
        # Get the dates for this target_pxx group
        group_dates = group.index.get_level_values("date")
        # Query target irradiance for this specific pxx and evaluation period
        target_irradiance_df = self._perfdb.resourceassessments.pxx.get(
            period=adjusted_period,
            time_res="daily",
            pxx=[target_pxx_value],
            evaluation_periods=[target_evaluation_period],
            group_names=[self.object],
            resource_types=["solar_irradiance_poa"],
            output_type="DataFrame",
        )
        new_index = target_irradiance_df.index.get_level_values(4)
        target_avg_irradiance = Series(target_irradiance_df["value"].values, index=new_index, name="GlobInc") * 24
        # Filter to only the dates that belong to this target_pxx group
        target_avg_irradiance = target_avg_irradiance[target_avg_irradiance.index.isin(group_dates)]
        # Applying the target irradiance to the model to get proportional target energy
        if not target_avg_irradiance.empty:
            daily_avg_target_energy = self._model.predict(target_avg_irradiance.to_frame())
            daily_target_avg_energy_list.append(
                Series(daily_avg_target_energy / 24, index=target_avg_irradiance.index, name="target_value"),
            )
    # Concatenate all target energy series from different pxx groups
    daily_target_avg_energy = concat(daily_target_avg_energy_list).sort_index()
    result = daily_target_avg_energy - daily_avg_measured_energy
    # Trimming result index to the adjusted period
    result = result.loc[(result.index >= adjusted_period.start) & (result.index < adjusted_period.end)]
    result.index = to_datetime(result.index)
    # naming the Series after the feature so save() stores it under this feature_name
    result.name = self.feature
    t3 = perf_counter()

    # adding calculated feature to class result attribute
    self._result = result.copy()
    # saving results
    self.save(save_into=save_into, **kwargs)
    logger.debug(
        f"{self.object} - {self.feature} - {period}: Requirements during calc {t1 - t0:.2f}s - Data adjustments {t2 - t1:.2f}s - Model prediction {t3 - t2:.2f}s - Saving data {perf_counter() - t3:.2f}s",
    )
    # returning the resource loss/gain, the same Series stored in self._result
    return result
```
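calculate() shifts the hourly index by +3 hours before calling `pvlib.solarposition.get_solarposition`, which assumes UTC for timestamps that are not time-zone aware. That shift is equivalent to treating the local timestamps as a fixed UTC-3 offset with no daylight-saving changes. The stdlib sketch below makes that assumption explicit with the offset as a parameter; `local_to_utc_naive` is a hypothetical helper, not part of the class.

```python
from datetime import datetime, timedelta, timezone


def local_to_utc_naive(ts, offset_hours=-3):
    """Convert a naive local timestamp at a fixed UTC offset to a naive UTC timestamp."""
    local = ts.replace(tzinfo=timezone(timedelta(hours=offset_hours)))
    return local.astimezone(timezone.utc).replace(tzinfo=None)


local_9am = datetime(2025, 6, 1, 9, 0)
print(local_to_utc_naive(local_9am))  # 2025-06-01 12:00:00
print(local_9am + timedelta(hours=3) == local_to_utc_naive(local_9am))  # True
```

Keeping the offset explicit makes it visible that a site at a different offset would need a different shift to get the correct solar position.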
save(save_into=None, **kwargs)
Method to save the calculated feature values in performance_db.
Parameters:
- save_into (Literal['all', 'performance_db'] | None, default: None): Where the calculated values will be saved. The options are:
  - "all": the feature will be saved in performance_db and bazefield.
  - "performance_db": the feature will be saved only in performance_db.
  - None: the feature will not be saved.
- **kwargs (dict, default: {}): Not used at the moment. Kept only for compatibility.
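save() validates save_into, raises if calculate() has not set a result, and returns early when save_into is None; otherwise the only difference between the two options is whether the values are also uploaded to bazefield. A condensed sketch of that decision, using a hypothetical helper name:

```python
def bazefield_upload_flag(save_into):
    """Map save_into to the upload decision made by save().

    Returns None when nothing is saved, True for "all" (performance_db and bazefield)
    and False for "performance_db" only.
    """
    if save_into is None:
        return None
    if not isinstance(save_into, str):
        raise TypeError(f"save_into must be a string or None, not {type(save_into)}")
    if save_into not in ("all", "performance_db"):
        raise ValueError(f"save_into must be 'all', 'performance_db' or None, not {save_into}")
    return save_into == "all"


for option in (None, "performance_db", "all"):
    print(option, bazefield_upload_flag(option))
```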
Source code in echo_energycalc/feature_calc_core.py
```python
def save(
    self,
    save_into: Literal["all", "performance_db"] | None = None,
    **kwargs,  # noqa: ARG002
) -> None:
    """
    Method to save the calculated feature values in performance_db.

    Parameters
    ----------
    save_into : Literal["all", "performance_db"] | None, optional
        Argument that will be passed to the method "save". The options are:
        - "all": The feature will be saved in performance_db and bazefield.
        - "performance_db": the feature will be saved only in performance_db.
        - None: The feature will not be saved.
        By default None.
    **kwargs : dict, optional
        Not being used at the moment. Here only for compatibility.
    """
    # checking arguments
    if not isinstance(save_into, str | type(None)):
        raise TypeError(f"save_into must be a string or None, not {type(save_into)}")
    if isinstance(save_into, str) and save_into not in ["all", "performance_db"]:
        raise ValueError(f"save_into must be 'all', 'performance_db' or None, not {save_into}")

    # checking if calculation was done
    if self.result is None:
        raise ValueError(
            "The calculation was not done. Cannot save the feature calculation results. Please make sure to do something like 'self._result = df[self.feature].copy()' in the method 'calculate' before calling 'self.save()'.",
        )

    if save_into is None:
        return

    if isinstance(save_into, str):
        if save_into not in ["performance_db", "all"]:
            raise ValueError(f"save_into must be 'performance_db' or 'all', not {save_into}.")
        upload_to_bazefield = save_into == "all"
    elif save_into is None:
        upload_to_bazefield = False
    else:
        raise TypeError(f"save_into must be a string or None, not {type(save_into)}.")

    # converting result series to DataFrame if needed
    if isinstance(self.result, Series):
        result_df = self.result.to_frame()
    elif isinstance(self.result, DataFrame):
        result_df = self.result.droplevel(0, axis=1)
    else:
        raise TypeError(f"result must be a pandas Series or DataFrame, not {type(self.result)}.")

    # adjusting DataFrame to be inserted in the database
    # making the columns a MultiIndex with levels object_name and feature_name
    result_df.columns = MultiIndex.from_product([[self.object], result_df.columns], names=["object_name", "feature_name"])

    self._perfdb.features.values.series.insert(
        df=result_df,
        on_conflict="update",
        bazefield_upload=upload_to_bazefield,
    )
```