Solar String Loss¶
Overview¶
The SolarEnergyLossStrings class is a subclass of SolarEnergyLossCalculator and FeatureCalculator that calculates energy losses in solar installations due to three main factors:
- Open strings: Strings that are completely down (below minimum power threshold)
- Underperforming strings: Strings producing below 80% of the best performing string during peak hours
- Shaded strings: Strings producing below 80% of the best performing string during non-peak hours
Calculation Logic¶
The calculation works as follows:
1. Data Preparation and Reference Power Calculation¶
- Get power data from all inverter strings (DC Input channels)
- Calculate two reference values:
- IQR Filtered Mean: Mean power excluding outliers (for open strings loss calculation)
- Best Performing String: Maximum power among all strings (for underperformance/shading loss)
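The two reference values can be sketched as follows — a minimal, self-contained illustration with made-up power values (the actual class applies the same logic row-wise across a DataFrame of all strings):

```python
import pandas as pd

# Hypothetical 5-minute snapshot of per-string DC power [kW] for one inverter.
string_power = pd.Series([5.1, 5.0, 4.9, 0.0, 5.2, 1.2, 5.0, 4.8])

# Keep only positive values, mirroring the real calculation's pre-filter.
positive = string_power[string_power > 0]

# IQR Filtered Mean: keep values between Q1 and Q3, then average.
q1, q3 = positive.quantile(0.25), positive.quantile(0.75)
inliers = positive[(positive >= q1) & (positive <= q3)]
filtered_mean = inliers.mean()

# Best Performing String: plain maximum over all strings.
best_string = string_power.max()
```

The underperforming string at 1.2 kW is excluded from the filtered mean but still lowers the comparison only via the best-string reference, which is why the two references serve different loss types.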
2. Open Strings Loss Calculation¶
- Identify strings below minimum power threshold (0.05 kW) as "down"
- Count down strings above normal baseline (4 strings can be down normally)
- Calculate loss as: down_strings_above_normal × filtered_mean_power
- Apply daily aggregation logic for strings that are down all day during curtailment
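The open-string counting above can be illustrated with a toy snapshot (hypothetical values; the real calculation applies the same arithmetic per 5-minute timestamp):

```python
import pandas as pd

MIN_POWER_KW = 0.05      # threshold below which a string counts as "down"
VALID_DOWN_STRINGS = 4   # strings that may be down without counting as loss

# Hypothetical snapshot: 6 of 10 strings are below the threshold.
string_power = pd.Series([0.0] * 6 + [5.0] * 4)
filtered_mean = 5.0  # IQR-filtered mean power from the previous step [kW]

down_strings = int((string_power < MIN_POWER_KW).sum())        # 6 down
down_above_normal = max(down_strings - VALID_DOWN_STRINGS, 0)  # 2 above baseline
dc_loss_open = down_above_normal * filtered_mean               # 2 × 5.0 kW
```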
3. Underperformance and Shading Loss Calculation¶
- Identify strings producing less than 80% of the best performing string
- Exclude already counted down strings from this calculation
- Calculate individual losses as the difference between best string and underperforming strings
- Distinguish between underperformance and shading based on solar position:
- Underperforming: Losses occurring within ±3 hours of solar noon
- Shading: Losses occurring outside the ±3 hours window around solar noon
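The peak-hour split can be sketched like this; the solar noon timestamp here is hypothetical (in the real class it is derived from pvlib solar position data):

```python
import pandas as pd

# Assume solar noon for the day was already found (e.g. via pvlib).
solar_noon = pd.Timestamp("2024-06-01 12:10")
window_start = solar_noon - pd.Timedelta(hours=3)
window_end = solar_noon + pd.Timedelta(hours=3)

def classify_loss(ts: pd.Timestamp) -> str:
    """Losses inside ±3 h of solar noon count as underperformance; outside, as shading."""
    return "underperforming" if window_start <= ts < window_end else "shading"
```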
4. Special Conditions and Filtering¶
- Curtailment: Only open string losses are set to 0 kW (except for strings that are down the whole day, whose open-string loss is still counted)
- Communication failure: Set all losses to 0 kW
- Inverter stopped: Set all losses to 0 kW
- Misaligned trackers: Set underperformance/shading losses to 0 kW
- Night time: Set all losses to 0 kW when sun elevation < 0°
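Applied together, these filters amount to masked assignments of the kind below (column names here are illustrative, not the class's real feature names):

```python
import pandas as pd

# Three hypothetical timestamps: normal operation, communication failure, night.
df = pd.DataFrame({
    "dc_loss_open": [10.0, 10.0, 10.0],
    "dc_loss_underperf": [3.0, 3.0, 3.0],
    "comm_failure": [False, True, False],
    "misaligned": [False, False, False],
    "sun_elevation": [35.0, 40.0, -5.0],
})
loss_cols = ["dc_loss_open", "dc_loss_underperf"]

# Communication failure zeroes every loss ...
df.loc[df["comm_failure"], loss_cols] = 0.0
# ... misaligned trackers zero only underperformance/shading ...
df.loc[df["misaligned"], "dc_loss_underperf"] = 0.0
# ... and night time (sun elevation below 0°) zeroes everything.
df.loc[df["sun_elevation"] < 0, loss_cols] = 0.0
```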
5. DC to AC Conversion¶
- Convert all DC losses to AC losses using the formula:
AC_Loss = (DC_Loss / (Measured_DC + DC_Loss)) × AC_Power
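A minimal sketch of the conversion formula (the class's actual helper, `_convert_dc_to_ac_loss`, also receives a setpoint percentage, which is omitted here for simplicity):

```python
def dc_to_ac_loss(dc_loss: float, measured_dc: float, ac_power: float) -> float:
    """Scale the lost share of potential DC power by the measured AC output."""
    potential_dc = measured_dc + dc_loss
    if potential_dc == 0:
        return 0.0  # no production and no loss: avoid division by zero
    return dc_loss / potential_dc * ac_power

# Example: 10 kW of DC lost on an inverter measuring 90 kW DC and 85 kW AC.
# Loss fraction = 10 / (90 + 10) = 0.10, so the AC-side loss is 8.5 kW.
```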
Outputs¶
The calculation produces three separate loss values:
- LostActivePowerOpenStrings_5min.AVG: Energy loss due to completely down strings [kW]
- LostActivePowerUnderperfStrings_5min.AVG: Energy loss due to underperforming strings during peak hours [kW]
- LostActivePowerShading_5min.AVG: Energy loss due to shaded strings during non-peak hours [kW]
Database Requirements¶
- Feature attribute server_calc_type must be set to 'solar_energy_loss_strings'.
- Object must have latitude and longitude attributes for solar position calculations.
Class Definition¶
SolarEnergyLossStrings(object_name, feature)¶
Base class for solar energy loss due to open strings.
Parameters:
- object_name (str) – Name of the object for which the feature is calculated. It must exist in performance_db.
- feature (str) – Feature of the object that is calculated. It must exist in performance_db.
Source code in echo_energycalc/solar_energy_loss_strings.py
def __init__(self, object_name: str, feature: str) -> None:
"""Constructor for SolarEnergyLossStrings.
Parameters
----------
object_name : str
Name of the object for which the feature is calculated. It must exist in performance_db.
feature : str
Feature of the object that is calculated. It must exist in performance_db.
"""
# initialize parent class
super().__init__(object_name, feature)
# Defining which object attributes are required for the calculation.
self._add_requirement(
RequiredObjectAttributes(
{
self.object: [
"latitude",
"longitude",
],
},
),
)
self._get_required_data()
# Defining the features that will be required for the calculation. All DC Power inputs and the curtailment state.
features = [f"DcInput{str(i).zfill(2)}Power_5min.AVG" for i in range(1, 29)]
features.append("CurtailmentState_5min.REP")
features.append("CommunicationState_5min.REP")
features.append("ActivePower_5min.AVG")
features.append("MisalignedTrackers_5min.REP")
features.append("IEC-OperationState_5min.REP")
features.append("ActivePowerSetPointPercent_5min.AVG")
# Adding suffix _b# to features -> necessary to acquire data from bazefield
features = {self.object: [f"{feat}_b#" for feat in features]}
self._add_requirement(RequiredFeatures(features=features))
# Setting variables for the calculation
self._underperform_ratio = 0.8 # Ratio below which a string is considered underperforming compared to the best string
self._minimum_power_threshold = 0.05 # kW
self._valid_down_strings = 4 # Number of strings that can be down without being considered a loss
self._minimum_valid_strings = 5 # Minimum number of strings required to perform the IQR calculation
feature property¶
Feature that is calculated. This will be defined in the constructor and cannot be changed.
Returns:
- str – Name of the feature that is calculated.
name property¶
Name of the feature calculator. It is defined in child classes of FeatureCalculator.
This must be equal to the "server_calc_type" attribute of the feature in performance_db.
Returns:
- str – Name of the feature calculator.
object property¶
Object for which the feature is calculated. This will be defined in the constructor and cannot be changed.
Returns:
- str – Object name for which the feature is calculated.
requirements property¶
List of requirements of the feature calculator. It is defined in child classes of FeatureCalculator.
Returns:
- dict[str, list[CalculationRequirement]] – Dict of requirements. The keys are the names of the requirement classes and the values are lists of requirements of that class. For example:
  {"RequiredFeatures": [RequiredFeatures(...), RequiredFeatures(...)], "RequiredObjects": [RequiredObjects(...)]}
result property¶
Result of the calculation. This is None until the method "calculate" is called.
Returns:
- Series | DataFrame | None – Result of the calculation if the method "calculate" was called, None otherwise.
calculate(period, save_into=None, cached_data=None, **kwargs)¶
Method that calculates the loss due to DC power loss, including down strings, shading and underperforming strings.
The calculation follows these steps:
1. Get power data from the inverter strings.
2. Set two references: the best performing string and the mean of all strings, excluding outliers using the IQR (interquartile range). The first is used to identify underperforming strings; the second is used to calculate the energy loss from down strings.
3. Count, for each timestamp, how many strings are down, i.e. how many power values fall below the minimum power threshold (currently 0.05 kW).
4. Calculate the energy loss as the number of shutdown strings × the mean power of the inverter strings. Only shutdown strings beyond the first 4 are counted, since it is normal for an inverter to have 4 down strings.
5. For underperforming and shaded strings, disregard timestamps with misaligned trackers, since misalignment can cause strings to underperform, in which case the loss would not be due to underperformance or shading.
6. Every string below 80% of the best performing string is considered underperforming. The loss due to underperformance is the difference between the best performing string and the underperforming string, after discounting all down strings (they are already accounted for in the previous step).
7. According to the time of day, an underperforming string is distinguished from a shaded string: within ±3 hours of solar noon it is considered underperforming, otherwise shaded.
8. Calculate the DC losses for down, underperforming and shaded strings as a percentage of the total DC power, and convert them to AC losses based on the AC power of the inverter.
9. For open strings only, the energy loss during curtailment is set to 0 kW, unless there are strings that were down for the whole day; in that case the loss is the number of strings down for the whole day × the mean power of the inverter strings.
10. During a communication failure, all energy losses are set to 0 kW.
Parameters:
- period (DateTimeRange) – Period for which the feature will be calculated.
- save_into (Literal['all', 'performance_db'] | None, default: None) – Argument that will be passed to the method "save". The options are:
  - "all": The feature will be saved in performance_db and bazefield.
  - "performance_db": The feature will be saved only in performance_db.
  - None: The feature will not be saved.
- cached_data (DataFrame | None, default: None) – DataFrame with features already queried/calculated. This is useful to avoid querying all the data again from performance_db, making chained calculations a lot more efficient.
- **kwargs (dict) – Additional arguments that will be passed to the "save" method.
Returns:
- DataFrame – DataFrame with the calculated energy losses due to open strings, shading and underperforming strings. The DataFrame will have the following columns:
  - LostActivePowerOpenStrings_5min.AVG: Energy loss due to open strings [kW]
  - LostActivePowerShading_5min.AVG: Energy loss due to shading [kW]
  - LostActivePowerUnderperfStrings_5min.AVG: Energy loss due to underperforming strings [kW]
Source code in echo_energycalc/solar_energy_loss_strings.py
def calculate(
self,
period: DateTimeRange,
save_into: Literal["all", "performance_db"] | None = None,
cached_data: DataFrame | None = None,
**kwargs,
) -> DataFrame:
"""
Method that will calculate the loss due to DC Power Loss, including down strings, shading and underperforming strings.
The calculation is done following those steps:
1. Get Power from inverter strings
2. Set two references: the best performing string and the mean of all strings, excluding outliers using IQR (Interquartile Range).
First will be used to identify underperforming strings, the second will be used to calculate the energy loss from down strings.
3. Count, for each timestamp, how many strings are down, that is, the number of power values that are below the minimum power threshold (currently 0.05 kW)
4. Calculate the energy loss based on the number of shutdown strings * mean power of the inverter string.
It is important to note that only shutdown strings beyond the first 4 are counted, since it is normal for an inverter to have 4 down strings
5. For underperforming and shaded strings, we disregard timestamps where there are misaligned trackers, since misalignment can cause strings to underperform, so the loss
would not be due to underperformance or shading.
6. For every string that is below 80% of the best performing string, we consider it underperforming. The loss due to underperformance is defined by the difference
between the best performing string and the underperforming string, followed by discounting all down strings (since they are already accounted for in the previous step).
7. According to the time of day, we distinguish an underperforming string from a shaded string. Within ±3 hours of solar noon we consider it underperforming,
otherwise we consider it shaded.
8. Calculate the DC losses for down strings, underperforming strings and shaded strings, as a percentage of the total DC power, and convert it to AC losses based on
the AC power of the inverter.
9. For open strings only, during a curtailment, the energy loss is attributed to 0 kW. Unless there are strings that were down for the whole day, in that case the energy loss is calculated
based on the number of strings that were down for the whole day * mean power of the inverter string.
10. During a communication failure, all the energy loss is attributed to 0 kW.
Parameters
----------
period : DateTimeRange
Period for which the feature will be calculated.
save_into : Literal["all", "performance_db"] | None, optional
Argument that will be passed to the method "save". The options are:
- "all": The feature will be saved in performance_db and bazefield.
- "performance_db": the feature will be saved only in performance_db.
- None: The feature will not be saved.
By default None.
cached_data : DataFrame | None, optional
DataFrame with features already queried/calculated. This is useful to avoid needing to query all the data again from performance_db, making chained calculations a lot more efficient.
By default None
**kwargs : dict, optional
Additional arguments that will be passed to the "save" method.
Returns
-------
DataFrame
DataFrame with the calculated energy losses due to open strings, shading and underperforming strings.
The DataFrame will have the following columns:
- LostActivePowerOpenStrings_5min.AVG: Energy loss due to open strings [kW]
- LostActivePowerShading_5min.AVG: Energy loss due to shading [kW]
- LostActivePowerUnderperfStrings_5min.AVG: Energy loss due to underperforming strings [kW]
"""
t0 = perf_counter()
# adjusting period to always start at 00:00 and end at 23:55
adjusted_period = period.copy()
adjusted_period.start = adjusted_period.start.replace(hour=0, minute=0, second=0, microsecond=0)
adjusted_period.end = adjusted_period.end.replace(hour=23, minute=59, second=0, microsecond=0)
# getting feature values
self._get_required_data(
period=adjusted_period,
reindex=None,
round_timestamps={"freq": timedelta(minutes=5), "tolerance": timedelta(minutes=2)},
cached_data=cached_data,
)
t1 = perf_counter()
# getting DataFrame with feature values
df = self._get_requirement_data("RequiredFeatures")
df = df[self.object]
# renaming columns to remove the _b# suffix
df.columns = df.columns.str.removesuffix("_b#")
# Defining string_cols as all columns with "DcInput"
string_cols = df.columns[df.columns.str.contains("DcInput")]
# Fill NaN values with forward and back fill. This needs to be done due to current Bazefield TOTALIZER behavior.
df[df.columns] = df[df.columns].ffill().bfill()
# Calculate the mean of each row, excluding outliers using IQR
data_df = df[string_cols].copy()
# ? --- CALCULATING THE REFERENCE POWER FOR ALL LOSSES ---
# IQR Filtered Mean for Down Strings Loss
data_df_pos = data_df.where(data_df > 0)
df["valid_counts"] = data_df_pos.count(axis=1)
data_iqr = data_df_pos.where(df["valid_counts"] >= self._minimum_valid_strings)
q1 = data_iqr.quantile(0.25, axis=1)
q3 = data_iqr.quantile(0.75, axis=1)
mask_inliers = data_iqr.ge(q1, axis=0) & data_iqr.le(q3, axis=0)
inliers_only = data_iqr.where(mask_inliers) if data_iqr.notna().any().any() else data_df_pos
df["filtered_mean"] = inliers_only.mean(axis=1).fillna(0)
# Best Performing String for Underperformance and Shading Losses
df["best_string_power"] = data_df.max(axis=1).fillna(0)
# ? --- CALCULATING OPEN STRINGS DC POWER LOSS ---
down_mask = data_df < self._minimum_power_threshold
base_count = down_mask.sum(axis=1)
df["down_strings"] = np.where(df["filtered_mean"] > 1, base_count, 0)
df["down_strings_above_normal"] = (df["down_strings"] - self._valid_down_strings).clip(lower=0)
df_bool = df[string_cols] < self._minimum_power_threshold
daily_all_true = df_bool.groupby(df.index.date).transform("all")
df["daily_down_strings_above_normal"] = (daily_all_true.sum(axis=1) - self._valid_down_strings).clip(lower=0)
# curtailment, communication and stopped failure masks
curtailment_mask = df["CurtailmentState_5min.REP"] == 1
comm_failure_mask = df["CommunicationState_5min.REP"] != 0
stopped_mask = df["IEC-OperationState_5min.REP"] < 2
# Logic for DC loss due to open strings
dc_loss_open = np.where(
(df["down_strings_above_normal"] >= df["daily_down_strings_above_normal"]) & (df["down_strings_above_normal"] > 0),
df["daily_down_strings_above_normal"] * df["filtered_mean"],
0,
)
df["dc_loss_open_strings"] = np.where(
comm_failure_mask,
0,
np.where(
curtailment_mask,
dc_loss_open,
df["down_strings_above_normal"] * df["filtered_mean"],
),
)
# ? --- CALCULATING UNDERPERFORMING STRINGS AND SHADING DC POWER LOSS ---
misaligned_trackers_mask = df["MisalignedTrackers_5min.REP"] == 1
underperformance_threshold = df["best_string_power"] * self._underperform_ratio
is_underperforming_mask = data_df.lt(underperformance_threshold, axis=0) & ~down_mask # Exclude down strings
# Calculate individual loss correctly: only positive differences for underperforming strings
individual_underperformance_loss = data_df.rsub(df["best_string_power"], axis=0).clip(lower=0)
total_dc_loss_underperf_and_shading = individual_underperformance_loss.where(is_underperforming_mask, 0).sum(axis=1)
# Getting timestamps and converting to UTC
timestamps = df.index
# adding 3 hours to convert to UTC
times_pd = timestamps + Timedelta(hours=3)
# getting the highest solar position to distinguish underperforming from shading losses
solar_position = pvlib.solarposition.get_solarposition(
time=times_pd,
latitude=self._get_requirement_data("RequiredObjectAttributes")[self.object]["latitude"],
longitude=self._get_requirement_data("RequiredObjectAttributes")[self.object]["longitude"],
)
# Identifying the timestamp of solar noon for each day
daily_solar_noon = solar_position.groupby(solar_position.index.date)["elevation"].idxmax()
# converting back to local time by removing 3 hours
daily_solar_noon = daily_solar_noon - Timedelta(hours=3)
# removing data that are not between 11 and 13
daily_solar_noon = daily_solar_noon[daily_solar_noon.dt.hour.between(11, 13)]
# Defining underperformance window as 3 hours before and after solar noon
t_start_underperf_daily = daily_solar_noon - Timedelta(hours=3)
t_end_underperf_daily = daily_solar_noon + Timedelta(hours=3)
# Mapping daily start and end times
daily_start_times = df.index.normalize().map(t_start_underperf_daily)
daily_end_times = df.index.normalize().map(t_end_underperf_daily)
is_underperforming_time = (df.index >= daily_start_times) & (df.index < daily_end_times)
df["dc_loss_underperforming"] = total_dc_loss_underperf_and_shading.where(is_underperforming_time, 0)
df["dc_loss_shading"] = total_dc_loss_underperf_and_shading.where(~is_underperforming_time, 0)
# Zeroing losses during communication, misaligned trackers and inverter stopped
df.loc[
comm_failure_mask | misaligned_trackers_mask | stopped_mask,
["dc_loss_underperforming", "dc_loss_shading"],
] = 0
# Get the sun's elevation (altitude)
# Sun altitude < 0 means the sun is below the horizon (night)
is_night = solar_position["elevation"] < 0
# Reset index to match df timestamps (convert back from UTC to local time)
is_night.index = timestamps
# Zeroing losses when is night time (sun below horizon)
df.loc[is_night, ["dc_loss_open_strings", "dc_loss_underperforming", "dc_loss_shading"]] = 0
# ? --- CONVERTING DC LOSSES TO AC LOSSES ---
total_dc_power = data_df.sum(axis=1)
ac_power = df["ActivePower_5min.AVG"]
setpoint_percent = df["ActivePowerSetPointPercent_5min.AVG"]
open_loss = self._convert_dc_to_ac_loss(df["dc_loss_open_strings"], total_dc_power, ac_power, setpoint_percent)
underperforming_loss = self._convert_dc_to_ac_loss(df["dc_loss_underperforming"], total_dc_power, ac_power, setpoint_percent)
shaded_loss = self._convert_dc_to_ac_loss(df["dc_loss_shading"], total_dc_power, ac_power, setpoint_percent)
t2 = perf_counter()
# ? --- Saving results into a df to save into db ---
# Creating a dictionary to hold the data for the DataFrame
data_dict = {
"LostActivePowerOpenStrings_5min.AVG": open_loss,
"LostActivePowerUnderperfStrings_5min.AVG": underperforming_loss,
"LostActivePowerShading_5min.AVG": shaded_loss,
}
# Creating the final DataFrame from the dictionary, columns must be multiindex with object and feature (Object, Feature)
final_df = DataFrame(data_dict)
final_df.columns = MultiIndex.from_tuples([(self.object, feature) for feature in final_df.columns], names=["object", "feature"])
# Assigning the final DataFrame to the self.result attribute
self._result = final_df
# Saving the result into the database
self.save(save_into=save_into, **kwargs)
logger.debug(
f"{self.object} - {self.feature} - {period}: Requirements during calc {t1 - t0:.2f}s - Data adjustments {t2 - t1:.2f}s -Saving data {perf_counter() - t2:.2f}s",
)
return final_df
save(save_into=None, **kwargs)¶
Method to save the calculated feature values in performance_db.
Parameters:
- save_into (Literal['all', 'performance_db'] | None, default: None) – Where the feature will be saved. The options are:
  - "all": The feature will be saved in performance_db and bazefield.
  - "performance_db": The feature will be saved only in performance_db.
  - None: The feature will not be saved.
- **kwargs (dict) – Not being used at the moment. Here only for compatibility.
Source code in echo_energycalc/feature_calc_core.py
def save(
self,
save_into: Literal["all", "performance_db"] | None = None,
**kwargs, # noqa: ARG002
) -> None:
"""
Method to save the calculated feature values in performance_db.
Parameters
----------
save_into : Literal["all", "performance_db"] | None, optional
Argument that will be passed to the method "save". The options are:
- "all": The feature will be saved in performance_db and bazefield.
- "performance_db": the feature will be saved only in performance_db.
- None: The feature will not be saved.
By default None.
**kwargs : dict, optional
Not being used at the moment. Here only for compatibility.
"""
# checking arguments
if not isinstance(save_into, str | type(None)):
raise TypeError(f"save_into must be a string or None, not {type(save_into)}")
if isinstance(save_into, str) and save_into not in ["all", "performance_db"]:
raise ValueError(f"save_into must be 'all', 'performance_db' or None, not {save_into}")
# checking if calculation was done
if self.result is None:
raise ValueError(
"The calculation was not done. Cannot save the feature calculation results. Please make sure to do something like 'self._result = df[self.feature].copy()' in the method 'calculate' before calling 'self.save()'.",
)
if save_into is None:
return
# save_into was already validated above, so only "all" and "performance_db" reach this point
upload_to_bazefield = save_into == "all"
# converting result series to DataFrame if needed
if isinstance(self.result, Series):
result_df = self.result.to_frame()
elif isinstance(self.result, DataFrame):
result_df = self.result.droplevel(0, axis=1)
else:
raise TypeError(f"result must be a pandas Series or DataFrame, not {type(self.result)}.")
# adjusting DataFrame to be inserted in the database
# making the columns a Multindex with levels object_name and feature_name
result_df.columns = MultiIndex.from_product([[self.object], result_df.columns], names=["object_name", "feature_name"])
self._perfdb.features.values.series.insert(
df=result_df,
on_conflict="update",
bazefield_upload=upload_to_bazefield,
)