Solar String Loss¶
Overview¶
The SolarEnergyLossStrings class is a subclass of SolarEnergyLossCalculator and FeatureCalculator that calculates energy losses in solar installations due to three main factors:
- Open strings: Strings that are completely down (below minimum power threshold)
- Underperforming strings: Strings producing below 80% of the best performing string during peak hours
- Shaded strings: Strings producing below 80% of the best performing string during non-peak hours
Calculation Logic¶
The calculation works as follows:
1. Data Preparation and Reference Power Calculation¶
- Get power data from all inverter strings (DC Input channels)
- Calculate two reference values:
- IQR Filtered Mean: Mean power excluding outliers (for open strings loss calculation)
- Best Performing String: Maximum power among all strings (for underperformance/shading loss)
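The two reference values can be sketched as follows — a minimal, self-contained illustration with made-up power values (the actual class applies the same logic row-wise across a DataFrame of all strings):

```python
import pandas as pd

# Hypothetical 5-minute snapshot of per-string DC power [kW] for one inverter.
string_power = pd.Series([5.1, 5.0, 4.9, 0.0, 5.2, 1.2, 5.0, 4.8])

# Keep only positive values, mirroring the real calculation's pre-filter.
positive = string_power[string_power > 0]

# IQR Filtered Mean: keep values between Q1 and Q3, then average.
q1, q3 = positive.quantile(0.25), positive.quantile(0.75)
inliers = positive[(positive >= q1) & (positive <= q3)]
filtered_mean = inliers.mean()

# Best Performing String: plain maximum over all strings.
best_string = string_power.max()
```

The underperforming string at 1.2 kW is excluded from the filtered mean but still lowers the comparison only via the best-string reference, which is why the two references serve different loss types.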
2. Open Strings Loss Calculation¶
- Identify strings below minimum power threshold (0.05 kW) as "down"
- Count down strings above normal baseline (4 strings can be down normally)
- Calculate loss as: down_strings_above_normal × filtered_mean_power
- Apply daily aggregation logic for strings that are down all day during curtailment
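The open-string counting above can be illustrated with a toy snapshot (hypothetical values; the real calculation applies the same arithmetic per 5-minute timestamp):

```python
import pandas as pd

MIN_POWER_KW = 0.05      # threshold below which a string counts as "down"
VALID_DOWN_STRINGS = 4   # strings that may be down without counting as loss

# Hypothetical snapshot: 6 of 10 strings are below the threshold.
string_power = pd.Series([0.0] * 6 + [5.0] * 4)
filtered_mean = 5.0  # IQR-filtered mean power from the previous step [kW]

down_strings = int((string_power < MIN_POWER_KW).sum())        # 6 down
down_above_normal = max(down_strings - VALID_DOWN_STRINGS, 0)  # 2 above baseline
dc_loss_open = down_above_normal * filtered_mean               # 2 × 5.0 kW
```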
3. Underperformance and Shading Loss Calculation¶
- Identify strings producing less than 80% of the best performing string
- Exclude already counted down strings from this calculation
- Calculate individual losses as the difference between best string and underperforming strings
- Distinguish between underperformance and shading based on solar position:
- Underperforming: Losses occurring within ±3 hours of solar noon
- Shading: Losses occurring outside the ±3 hours window around solar noon
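The peak-hour split can be sketched like this; the solar noon timestamp here is hypothetical (in the real class it is derived from pvlib solar position data):

```python
import pandas as pd

# Assume solar noon for the day was already found (e.g. via pvlib).
solar_noon = pd.Timestamp("2024-06-01 12:10")
window_start = solar_noon - pd.Timedelta(hours=3)
window_end = solar_noon + pd.Timedelta(hours=3)

def classify_loss(ts: pd.Timestamp) -> str:
    """Losses inside ±3 h of solar noon count as underperformance; outside, as shading."""
    return "underperforming" if window_start <= ts < window_end else "shading"
```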
4. Special Conditions and Filtering¶
- Curtailment: Only open string losses are set to 0 kW (except for strings that are down the whole day, whose open-string loss is still counted)
- Communication failure: Set all losses to 0 kW
- Inverter stopped: Set all losses to 0 kW
- Misaligned trackers: Set underperformance/shading losses to 0 kW
- Night time: Set all losses to 0 kW when sun elevation < 0°
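Applied together, these filters amount to masked assignments of the kind below (column names here are illustrative, not the class's real feature names):

```python
import pandas as pd

# Three hypothetical timestamps: normal operation, communication failure, night.
df = pd.DataFrame({
    "dc_loss_open": [10.0, 10.0, 10.0],
    "dc_loss_underperf": [3.0, 3.0, 3.0],
    "comm_failure": [False, True, False],
    "misaligned": [False, False, False],
    "sun_elevation": [35.0, 40.0, -5.0],
})
loss_cols = ["dc_loss_open", "dc_loss_underperf"]

# Communication failure zeroes every loss ...
df.loc[df["comm_failure"], loss_cols] = 0.0
# ... misaligned trackers zero only underperformance/shading ...
df.loc[df["misaligned"], "dc_loss_underperf"] = 0.0
# ... and night time (sun elevation below 0°) zeroes everything.
df.loc[df["sun_elevation"] < 0, loss_cols] = 0.0
```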
5. DC to AC Conversion¶
- Convert all DC losses to AC losses using the formula:
AC_Loss = (DC_Loss / (Measured_DC + DC_Loss)) × AC_Power
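A minimal sketch of the conversion formula (the class's actual helper, `_convert_dc_to_ac_loss`, also receives a setpoint percentage, which is omitted here for simplicity):

```python
def dc_to_ac_loss(dc_loss: float, measured_dc: float, ac_power: float) -> float:
    """Scale the lost share of potential DC power by the measured AC output."""
    potential_dc = measured_dc + dc_loss
    if potential_dc == 0:
        return 0.0  # no production and no loss: avoid division by zero
    return dc_loss / potential_dc * ac_power

# Example: 10 kW of DC lost on an inverter measuring 90 kW DC and 85 kW AC.
# Loss fraction = 10 / (90 + 10) = 0.10, so the AC-side loss is 8.5 kW.
```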
Outputs¶
The calculation produces three separate loss values:
- LostActivePowerOpenStrings_5min.AVG: Energy loss due to completely down strings [kW]
- LostActivePowerUnderperfStrings_5min.AVG: Energy loss due to underperforming strings during peak hours [kW]
- LostActivePowerShading_5min.AVG: Energy loss due to shaded strings during non-peak hours [kW]
Database Requirements¶
- Feature attribute server_calc_type must be set to 'solar_energy_loss_strings'.
- Object must have latitude and longitude attributes for solar position calculations.
Class Definition¶
SolarEnergyLossStrings(object_name, feature)¶
Base class for solar energy loss due to open strings.
Parameters:
- object_name (str) – Name of the object for which the feature is calculated. It must exist in performance_db.
- feature (str) – Feature of the object that is calculated. It must exist in performance_db.
Source code in echo_energycalc/solar_energy_loss_strings.py
def __init__(self, object_name: str, feature: str) -> None:
"""Constructor for SolarEnergyLossStrings.
Parameters
----------
object_name : str
Name of the object for which the feature is calculated. It must exist in performance_db.
feature : str
Feature of the object that is calculated. It must exist in performance_db.
"""
# initialize parent class
super().__init__(object_name, feature)
# Defining which object attributes are required for the calculation.
self._add_requirement(
RequiredObjectAttributes(
{
self.object: [
"latitude",
"longitude",
],
},
),
)
self._get_required_data()
# Defining the features that will be required for the calculation. All DC Power inputs and the curtailment state.
features = [f"DcInput{str(i).zfill(2)}Power_5min.AVG" for i in range(1, 29)]
features.append("CurtailmentState_5min.REP")
features.append("CommunicationState_5min.REP")
features.append("ActivePower_5min.AVG")
features.append("MisalignedTrackers_5min.REP")
features.append("IEC-OperationState_5min.REP")
features.append("ActivePowerSetPointPercent_5min.AVG")
# Adding suffix _b# to features -> necessary to acquire data from bazefield
features = {self.object: [f"{feat}_b#" for feat in features]}
self._add_requirement(RequiredFeatures(features=features))
# Setting variables for the calculation
self._underperform_ratio = 0.8 # Ratio below which a string is considered underperforming compared to the best string
self._minimum_power_threshold = 0.05 # kW
self._valid_down_strings = 4 # Number of strings that can be down without being considered a loss
self._minimum_valid_strings = 5 # Minimum number of strings required to perform the IQR calculation
feature property¶
Feature that is calculated. This will be defined in the constructor and cannot be changed.
Returns:
- str – Name of the feature that is calculated.
name property¶
Name of the feature calculator. It is defined in child classes of FeatureCalculator.
This must be equal to the "server_calc_type" attribute of the feature in performance_db.
Returns:
- str – Name of the feature calculator.
object property¶
Object for which the feature is calculated. This will be defined in the constructor and cannot be changed.
Returns:
- str – Object name for which the feature is calculated.
requirements property¶
List of requirements of the feature calculator. It is defined in child classes of FeatureCalculator.
Returns:
- dict[str, list[CalculationRequirement]] – Dict of requirements. The keys are the names of the requirement classes and the values are lists of requirements of that class. For example:
  {"RequiredFeatures": [RequiredFeatures(...), RequiredFeatures(...)], "RequiredObjects": [RequiredObjects(...)]}
result property¶
Result of the calculation. This is None until the method "calculate" is called.
Returns:
- Series | DataFrame | None – Result of the calculation if the method "calculate" was called, None otherwise.
calculate(period, save_into=None, cached_data=None, **kwargs)¶
Method that calculates the loss due to DC power loss, including down strings, shading and underperforming strings.
The calculation follows these steps:
1. Get power data from the inverter strings.
2. Set two references: the best performing string and the mean of all strings, excluding outliers using the IQR (interquartile range). The first is used to identify underperforming strings; the second is used to calculate the energy loss from down strings.
3. Count, for each timestamp, how many strings are down, i.e. how many power values fall below the minimum power threshold (currently 0.05 kW).
4. Calculate the energy loss as the number of shutdown strings × the mean power of the inverter strings. Only shutdown strings beyond the first 4 are counted, since it is normal for an inverter to have 4 down strings.
5. For underperforming and shaded strings, disregard timestamps with misaligned trackers, since misalignment can cause strings to underperform, in which case the loss would not be due to underperformance or shading.
6. Every string below 80% of the best performing string is considered underperforming. The loss due to underperformance is the difference between the best performing string and the underperforming string, after discounting all down strings (they are already accounted for in the previous step).
7. According to the time of day, an underperforming string is distinguished from a shaded string: within ±3 hours of solar noon it is considered underperforming, otherwise shaded.
8. Calculate the DC losses for down, underperforming and shaded strings as a percentage of the total DC power, and convert them to AC losses based on the AC power of the inverter.
9. For open strings only, the energy loss during curtailment is set to 0 kW, unless there are strings that were down for the whole day; in that case the loss is the number of strings down for the whole day × the mean power of the inverter strings.
10. During a communication failure, all energy losses are set to 0 kW.
Parameters:
- period (DateTimeRange) – Period for which the feature will be calculated.
- save_into (Literal['all', 'performance_db'] | None, default: None) – Argument that will be passed to the method "save". The options are:
  - "all": The feature will be saved in performance_db and bazefield.
  - "performance_db": The feature will be saved only in performance_db.
  - None: The feature will not be saved.
- cached_data (DataFrame | None, default: None) – DataFrame with features already queried/calculated. This is useful to avoid querying all the data again from performance_db, making chained calculations a lot more efficient.
- **kwargs (dict) – Additional arguments that will be passed to the "save" method.
Returns:
- DataFrame – DataFrame with the calculated energy losses due to open strings, shading and underperforming strings. The DataFrame will have the following columns:
  - LostActivePowerOpenStrings_5min.AVG: Energy loss due to open strings [kW]
  - LostActivePowerShading_5min.AVG: Energy loss due to shading [kW]
  - LostActivePowerUnderperfStrings_5min.AVG: Energy loss due to underperforming strings [kW]
Source code in echo_energycalc/solar_energy_loss_strings.py
def calculate(
self,
period: DateTimeRange,
save_into: Literal["all", "performance_db"] | None = None,
cached_data: DataFrame | None = None,
**kwargs,
) -> DataFrame:
"""
Method that will calculate the loss due to DC Power Loss, including down strings, shading and underperforming strings.
The calculation is done following those steps:
1. Get Power from inverter strings
2. Set two references: the best performing string and the mean of all strings, excluding outliers using IQR (Interquartile Range).
First will be used to identify underperforming strings, the second will be used to calculate the energy loss from down strings.
3. Count, for each timestamp, how many strings are down, that is, the number of power values that are below the minimum power threshold (currently 0.05 kW)
4. Calculate the energy loss based on the number of shutdown strings * mean power of the inverter string.
It is important to note that only shutdown strings beyond the first 4 are counted, since it is normal for an inverter to have 4 down strings
5. For underperforming and shaded strings, we disregard timestamps where there are misaligned trackers, since misalignment can cause strings to underperform, so the loss
would not be due to underperformance or shading.
6. For every string that is below 80% of the best performing string, we consider it underperforming. The loss due to underperformance is defined by the difference
between the best performing string and the underperforming string, followed by discounting all down strings (since they are already accounted for in the previous step).
7. According to the time of day, we distinguish an underperforming string from a shaded string. Within ±3 hours of solar noon we consider it underperforming,
otherwise we consider it shaded.
8. Calculate the DC losses for down strings, underperforming strings and shaded strings, as a percentage of the total DC power, and convert it to AC losses based on
the AC power of the inverter.
9. For open strings only, during a curtailment, the energy loss is attributed to 0 kW. Unless there are strings that were down for the whole day, in that case the energy loss is calculated
based on the number of strings that were down for the whole day * mean power of the inverter string.
10. During a communication failure, all the energy loss is attributed to 0 kW.
Parameters
----------
period : DateTimeRange
Period for which the feature will be calculated.
save_into : Literal["all", "performance_db"] | None, optional
Argument that will be passed to the method "save". The options are:
- "all": The feature will be saved in performance_db and bazefield.
- "performance_db": the feature will be saved only in performance_db.
- None: The feature will not be saved.
By default None.
cached_data : DataFrame | None, optional
DataFrame with features already queried/calculated. This is useful to avoid needing to query all the data again from performance_db, making chained calculations a lot more efficient.
By default None
**kwargs : dict, optional
Additional arguments that will be passed to the "save" method.
Returns
-------
DataFrame
DataFrame with the calculated energy losses due to open strings, shading and underperforming strings.
The DataFrame will have the following columns:
- LostActivePowerOpenStrings_5min.AVG: Energy loss due to open strings [kW]
- LostActivePowerShading_5min.AVG: Energy loss due to shading [kW]
- LostActivePowerUnderperfStrings_5min.AVG: Energy loss due to underperforming strings [kW]
"""
t0 = perf_counter()
# adjusting period to always start at 00:00 and end at 23:55
adjusted_period = period.copy()
adjusted_period.start = adjusted_period.start.replace(hour=0, minute=0, second=0, microsecond=0)
adjusted_period.end = adjusted_period.end.replace(hour=23, minute=59, second=0, microsecond=0)
# getting feature values
self._get_required_data(
period=adjusted_period,
reindex=None,
round_timestamps={"freq": timedelta(minutes=5), "tolerance": timedelta(minutes=2)},
cached_data=cached_data,
)
t1 = perf_counter()
# getting DataFrame with feature values
df = self._get_requirement_data("RequiredFeatures")
df = df[self.object]
# renaming columns to remove the _b# suffix
df.columns = df.columns.str.removesuffix("_b#")
# Defining string_cols as all columns with "DcInput"
string_cols = df.columns[df.columns.str.contains("DcInput")]
# Fill NaN values with forward and back fill. This needs to be done due to current Bazefield TOTALIZER behavior.
df[df.columns] = df[df.columns].ffill().bfill()
# Calculate the mean of each row, excluding outliers using IQR
data_df = df[string_cols].copy()
# ? --- CALCULATING THE REFERENCE POWER FOR ALL LOSSES ---
# IQR Filtered Mean for Down Strings Loss
data_df_pos = data_df.where(data_df > 0)
df["valid_counts"] = data_df_pos.count(axis=1)
data_iqr = data_df_pos.where(df["valid_counts"] >= self._minimum_valid_strings)
q1 = data_iqr.quantile(0.25, axis=1)
q3 = data_iqr.quantile(0.75, axis=1)
mask_inliers = data_iqr.ge(q1, axis=0) & data_iqr.le(q3, axis=0)
inliers_only = data_iqr.where(mask_inliers) if data_iqr.notna().any().any() else data_df_pos
df["filtered_mean"] = inliers_only.mean(axis=1).fillna(0)
# Best Performing String for Underperformance and Shading Losses
df["best_string_power"] = data_df.max(axis=1).fillna(0)
# ? --- CALCULATING OPEN STRINGS DC POWER LOSS ---
down_mask = data_df < self._minimum_power_threshold
base_count = down_mask.sum(axis=1)
df["down_strings"] = np.where(df["filtered_mean"] > 1, base_count, 0)
df["down_strings_above_normal"] = (df["down_strings"] - self._valid_down_strings).clip(lower=0)
df_bool = df[string_cols] < self._minimum_power_threshold
daily_all_true = df_bool.groupby(df.index.date).transform("all")
df["daily_down_strings_above_normal"] = (daily_all_true.sum(axis=1) - self._valid_down_strings).clip(lower=0)
# curtailment, communication and stopped failure masks
curtailment_mask = df["CurtailmentState_5min.REP"] == 1
comm_failure_mask = df["CommunicationState_5min.REP"] != 0
stopped_mask = df["IEC-OperationState_5min.REP"] < 2
# Logic for DC loss due to open strings
dc_loss_open = np.where(
(df["down_strings_above_normal"] >= df["daily_down_strings_above_normal"]) & (df["down_strings_above_normal"] > 0),
df["daily_down_strings_above_normal"] * df["filtered_mean"],
0,
)
df["dc_loss_open_strings"] = np.where(
comm_failure_mask,
0,
np.where(
curtailment_mask,
dc_loss_open,
df["down_strings_above_normal"] * df["filtered_mean"],
),
)
# ? --- CALCULATING UNDERPERFORMING STRINGS AND SHADING DC POWER LOSS ---
misaligned_trackers_mask = df["MisalignedTrackers_5min.REP"] == 1
underperformance_threshold = df["best_string_power"] * self._underperform_ratio
is_underperforming_mask = data_df.lt(underperformance_threshold, axis=0) & ~down_mask # Exclude down strings
# Calculate individual loss correctly: only positive differences for underperforming strings
individual_underperformance_loss = data_df.rsub(df["best_string_power"], axis=0).clip(lower=0)
total_dc_loss_underperf_and_shading = individual_underperformance_loss.where(is_underperforming_mask, 0).sum(axis=1)
# Getting timestamps and converting to UTC
timestamps = df.index
# adding 3 hours to convert to UTC
times_pd = timestamps + Timedelta(hours=3)
# getting the highest solar position to distinguish underperforming from shading losses
solar_position = pvlib.solarposition.get_solarposition(
time=times_pd,
latitude=self._get_requirement_data("RequiredObjectAttributes")[self.object]["latitude"],
longitude=self._get_requirement_data("RequiredObjectAttributes")[self.object]["longitude"],
)
# Identifying the timestamp of solar noon for each day
daily_solar_noon = solar_position.groupby(solar_position.index.date)["elevation"].idxmax()
# converting back to local time by removing 3 hours
daily_solar_noon = daily_solar_noon - Timedelta(hours=3)
# removing data that are not between 11 and 13
daily_solar_noon = daily_solar_noon[daily_solar_noon.dt.hour.between(11, 13)]
# Defining underperformance window as 3 hours before and after solar noon
t_start_underperf_daily = daily_solar_noon - Timedelta(hours=3)
t_end_underperf_daily = daily_solar_noon + Timedelta(hours=3)
# Mapping daily start and end times
daily_start_times = df.index.normalize().map(t_start_underperf_daily)
daily_end_times = df.index.normalize().map(t_end_underperf_daily)
is_underperforming_time = (df.index >= daily_start_times) & (df.index < daily_end_times)
df["dc_loss_underperforming"] = total_dc_loss_underperf_and_shading.where(is_underperforming_time, 0)
df["dc_loss_shading"] = total_dc_loss_underperf_and_shading.where(~is_underperforming_time, 0)
# Zeroing losses during communication, misaligned trackers and inverter stopped
df.loc[
comm_failure_mask | misaligned_trackers_mask | stopped_mask,
["dc_loss_underperforming", "dc_loss_shading"],
] = 0
# Get the sun's elevation (altitude)
# Sun altitude < 0 means the sun is below the horizon (night)
is_night = solar_position["elevation"] < 0
# Reset index to match df timestamps (convert back from UTC to local time)
is_night.index = timestamps
# Zeroing losses when is night time (sun below horizon)
df.loc[is_night, ["dc_loss_open_strings", "dc_loss_underperforming", "dc_loss_shading"]] = 0
# ? --- CONVERTING DC LOSSES TO AC LOSSES ---
total_dc_power = data_df.sum(axis=1)
ac_power = df["ActivePower_5min.AVG"]
setpoint_percent = df["ActivePowerSetPointPercent_5min.AVG"]
open_loss = self._convert_dc_to_ac_loss(df["dc_loss_open_strings"], total_dc_power, ac_power, setpoint_percent)
underperforming_loss = self._convert_dc_to_ac_loss(df["dc_loss_underperforming"], total_dc_power, ac_power, setpoint_percent)
shaded_loss = self._convert_dc_to_ac_loss(df["dc_loss_shading"], total_dc_power, ac_power, setpoint_percent)
t2 = perf_counter()
# ? --- Saving results into a df to save into db ---
# Creating a dictionary to hold the data for the DataFrame
data_dict = {
"LostActivePowerOpenStrings_5min.AVG": open_loss,
"LostActivePowerUnderperfStrings_5min.AVG": underperforming_loss,
"LostActivePowerShading_5min.AVG": shaded_loss,
}
# Creating the final DataFrame from the dictionary, columns must be multiindex with object and feature (Object, Feature)
final_df = DataFrame(data_dict)
final_df.columns = MultiIndex.from_tuples([(self.object, feature) for feature in final_df.columns], names=["object", "feature"])
# Assigning the final DataFrame to the self.result attribute
self._result = final_df
# Saving the result into the database
self.save(save_into=save_into, **kwargs)
logger.debug(
f"{self.object} - {self.feature} - {period}: Requirements during calc {t1 - t0:.2f}s - Data adjustments {t2 - t1:.2f}s -Saving data {perf_counter() - t2:.2f}s",
)
return final_df
save(save_into=None, **kwargs)¶
Method to save the calculated feature values in performance_db.
Parameters:
- save_into (Literal['all', 'performance_db'] | None, default: None) – Where the feature will be saved. The options are:
  - "all": The feature will be saved in performance_db and bazefield.
  - "performance_db": The feature will be saved only in performance_db.
  - None: The feature will not be saved.
- **kwargs (dict) – Not being used at the moment. Here only for compatibility.
Source code in echo_energycalc/feature_calc_core.py
def save(
self,
save_into: Literal["all", "performance_db"] | None = None,
**kwargs, # noqa: ARG002
) -> None:
"""
Method to save the calculated feature values in performance_db.
Parameters
----------
save_into : Literal["all", "performance_db"] | None, optional
Argument that will be passed to the method "save". The options are:
- "all": The feature will be saved in performance_db and bazefield.
- "performance_db": the feature will be saved only in performance_db.
- None: The feature will not be saved.
By default None.
**kwargs : dict, optional
Not being used at the moment. Here only for compatibility.
"""
# checking arguments
if not isinstance(save_into, str | type(None)):
raise TypeError(f"save_into must be a string or None, not {type(save_into)}")
if isinstance(save_into, str) and save_into not in ["all", "performance_db"]:
raise ValueError(f"save_into must be 'all', 'performance_db' or None, not {save_into}")
# checking if calculation was done
if self.result is None:
raise ValueError(
"The calculation was not done. Cannot save the feature calculation results. Please make sure to do something like 'self._result = df[self.feature].copy()' in the method 'calculate' before calling 'self.save()'.",
)
if save_into is None:
return
# save_into was already validated above, so only "all" and "performance_db" reach this point
upload_to_bazefield = save_into == "all"
# converting result series to DataFrame if needed
if isinstance(self.result, Series):
result_df = self.result.to_frame()
elif isinstance(self.result, DataFrame):
result_df = self.result.droplevel(0, axis=1)
else:
raise TypeError(f"result must be a pandas Series or DataFrame, not {type(self.result)}.")
# adjusting DataFrame to be inserted in the database
# making the columns a Multindex with levels object_name and feature_name
result_df.columns = MultiIndex.from_product([[self.object], result_df.columns], names=["object_name", "feature_name"])
self._perfdb.features.values.series.insert(
df=result_df,
on_conflict="update",
bazefield_upload=upload_to_bazefield,
)