
Solar String Loss

Overview

SolarEnergyLossStrings calculates energy losses from three string-level failure modes in solar inverters, all derived from DC input power measurements:

  • Open strings: Strings that are completely down (power below 0.05 kW threshold).
  • Underperforming strings: Strings producing less than 80% of the best string during the ±3 h window around solar noon.
  • Shaded strings: Strings producing less than 80% of the best string outside that window.

All three losses are calculated at 5-minute resolution.


Calculation Logic

1. Data Preparation

Fetches DC input power for all 28 string channels (DcInput01Power_5min.AVG through DcInput28Power_5min.AVG) and operational state features from Bazefield. Forward-fills then backward-fills all columns to handle Bazefield totalizer gaps.

2. Reference Power Calculation

Two reference values are computed per timestamp:

  • IQR filtered mean (filtered_mean): The mean power of strings whose values fall within the interquartile range [Q1, Q3]. Outlier strings are excluded. Requires at least 5 valid (positive) strings; rows with fewer are treated as null.
  • Best performing string (best_string_power): The maximum power among all 28 strings.
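The two reference values can be sketched as follows. This is a minimal NumPy illustration of the per-timestamp logic, not the production polars implementation; the `reference_powers` helper name is hypothetical:

```python
import numpy as np

def reference_powers(string_powers, min_valid=5):
    """IQR-filtered mean and best-string power for one timestamp.

    string_powers: per-string DC power values [kW]; non-positive
    values are treated as invalid for the IQR mean.
    """
    valid = string_powers[string_powers > 0]
    best = float(string_powers.max()) if string_powers.size else 0.0
    if valid.size < min_valid:
        return None, best  # too few valid strings for a robust mean
    q1, q3 = np.quantile(valid, [0.25, 0.75])
    inliers = valid[(valid >= q1) & (valid <= q3)]  # keep [Q1, Q3] only
    return float(inliers.mean()), best
```

With five or more positive strings, outliers (like a single string reading far above the rest) fall outside [Q1, Q3] and do not skew the filtered mean.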

3. Open Strings Loss

A string is "down" when its power is below the minimum threshold (0.05 kW).

Text Only
down_strings = count(strings < 0.05 kW)  [only when filtered_mean > 1 kW]
down_strings_above_normal = max(down_strings - 4, 0)   [4 strings down is normal]

dc_loss_open = down_strings_above_normal x filtered_mean
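A minimal sketch of the open-string formulas above for a single timestamp (the `open_string_loss` helper is illustrative, not part of the module):

```python
import numpy as np

def open_string_loss(string_powers, filtered_mean,
                     threshold=0.05, normal_down=4):
    """DC loss from open strings at one timestamp (all values in kW)."""
    # Down strings are only counted when the reference mean is meaningful
    if filtered_mean <= 1.0:
        return 0.0
    down = int(np.sum(string_powers < threshold))
    above_normal = max(down - normal_down, 0)  # 4 down strings are normal
    return above_normal * filtered_mean
```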

Daily all-day-down logic: if a string is below the threshold for the entire day, it counts as a persistent open string even during curtailment periods. During curtailment, only strings down for the full day contribute to open string losses; timestamp-level open strings are zeroed.

Special conditions:

Condition Open string loss
Communication failure 0 kW
Curtailment (partial) 0 kW (except persistent all-day-down strings)
Night (elevation < 0) 0 kW
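The persistent all-day-down rule can be sketched in plain Python. The `all_day_down_strings` helper and its input layout are hypothetical, but the aggregation mirrors the "down at every timestamp of the day" check:

```python
def all_day_down_strings(samples, normal_down=4):
    """Count strings down at every timestamp of a day, beyond the normal 4.

    samples: list of per-timestamp boolean lists, where samples[t][s]
    is True when string s is below the threshold at timestamp t.
    """
    per_string = zip(*samples)  # transpose to one series per string
    all_day = sum(all(series) for series in per_string)
    return max(all_day - normal_down, 0)  # 4 down strings are normal
```

Only this all-day count (times the filtered mean) survives as open-string loss during curtailment periods.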

4. Underperforming and Shading Loss

Applied only to strings that are NOT down. A string underperforms if:

Text Only
string_power < best_string_power x 0.80

Loss per underperforming string:

Text Only
string_dc_loss = best_string_power - string_power

The distinction between underperformance and shading is based on solar position. Solar noon is computed per day using pvlib. The window is defined as:

Text Only
underperforming window: [solar_noon - 3h, solar_noon + 3h]
  • Inside this window → underperforming loss
  • Outside this window → shading loss
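The threshold, per-string loss, and window split above can be combined in a short sketch for one timestamp; `split_lagging_loss` is an illustrative name, not the module's API:

```python
import numpy as np

def split_lagging_loss(string_powers, in_noon_window,
                       ratio=0.80, threshold=0.05):
    """Per-timestamp DC loss split into (underperforming, shading) [kW].

    Down strings (below threshold) are excluded here; their loss is
    accounted for separately as open-string loss.
    """
    powers = np.asarray(string_powers, dtype=float)
    best = powers.max()
    lagging = (powers >= threshold) & (powers < best * ratio)
    loss = float((best - powers)[lagging].sum())
    # Inside the solar-noon window the loss counts as underperformance,
    # outside it as shading
    return (loss, 0.0) if in_noon_window else (0.0, loss)
```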

Frozen data detection: A string is considered frozen (sensor error) if its value is unchanged for 6 or more consecutive 5-minute periods. If more than 50% of strings are frozen simultaneously, underperformance and shading losses are set to 0.
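The frozen-data rule can be sketched with a simple run-length check, an illustrative stand-in for the rolling-sum implementation in the source:

```python
import numpy as np

def frozen_mask(values, min_periods=6):
    """True where a series has been unchanged for min_periods consecutive steps."""
    values = np.asarray(values, dtype=float)
    # Flag each sample that repeats its predecessor
    unchanged = np.concatenate(([False], values[1:] == values[:-1]))
    frozen = np.zeros(values.shape, dtype=bool)
    run = 0
    for i, same in enumerate(unchanged):
        run = run + 1 if same else 0
        frozen[i] = run >= min_periods
    return frozen
```

If more than half of the 28 string columns are frozen at a timestamp, underperformance and shading losses for that timestamp are zeroed.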

Special conditions:

Condition Underperforming and shading loss
Communication failure 0 kW
Misaligned trackers 0 kW
Inverter stopped (IEC-OperationState < 2) 0 kW
Night (elevation < 0) 0 kW
Frozen data (> 50% strings unchanged for 6+ periods) 0 kW

5. DC to AC Conversion

All losses are DC losses. They are converted to AC losses using the instantaneous DC-to-AC efficiency ratio:

Text Only
ac_loss = dc_loss x (ac_power / total_dc_power)

The result is capped at the headroom below the active power setpoint:

Text Only
ac_loss = min(ac_loss, (setpoint_percent / 100 x 330 kW) - ac_power)
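Both formulas can be sketched together; the 330 kW rated AC power comes from the cap formula above, while the `rated_ac_kw` parameter name, the zero-division guard, and clipping negative headroom to zero are assumptions of this sketch:

```python
import numpy as np

def dc_to_ac_loss(dc_loss, total_dc_power, ac_power, setpoint_percent,
                  rated_ac_kw=330.0):
    """Convert DC losses to AC and cap them at the setpoint headroom (kW)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        # Instantaneous DC-to-AC efficiency; guard against zero DC power
        efficiency = np.where(total_dc_power > 0,
                              ac_power / total_dc_power, 0.0)
    ac_loss = dc_loss * efficiency
    headroom = setpoint_percent / 100.0 * rated_ac_kw - ac_power
    return np.minimum(ac_loss, np.clip(headroom, 0.0, None))
```

The cap keeps the reported loss physically plausible: the inverter could never have produced more than the active power setpoint allows.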

Outputs

The calculator produces three separate features simultaneously:

Feature Description
LostActivePowerOpenStrings_5min.AVG Loss due to strings completely below threshold (kW)
LostActivePowerUnderperfStrings_5min.AVG Loss due to underperforming strings during solar noon window (kW)
LostActivePowerShading_5min.AVG Loss due to shaded strings outside solar noon window (kW)

Database Requirements

Feature Attribute

Attribute Value
server_calc_type solar_energy_loss_strings

Object Attributes

Attribute Required Description
latitude Yes Geographic latitude (decimal degrees). Used for pvlib solar noon calculation.
longitude Yes Geographic longitude (decimal degrees). Used for pvlib solar noon calculation.

Features (inverter — from Bazefield)

Feature Description
DcInput01Power_5min.AVG through DcInput28Power_5min.AVG DC string power per channel (kW)
ActivePower_5min.AVG Total AC active power (kW) — used for DC-to-AC conversion
ActivePowerSetPointPercent_5min.AVG Active power setpoint (%) — used to cap AC losses
CurtailmentState_5min.REP Curtailment flag
CommunicationState_5min.REP Communication failure flag (non-zero = failure)
IEC-OperationState_5min.REP IEC operation state (< 2 = stopped)
MisalignedTrackers_5min.REP Tracker misalignment flag

Module-Level Constants

Constant Value Description
_underperform_ratio 0.80 Threshold below which a string is underperforming relative to best string
_minimum_power_threshold 0.05 kW Power below which a string is considered down
_valid_down_strings 4 Number of down strings considered normal (excluded from loss)
_minimum_valid_strings 5 Minimum active strings required for IQR calculation
_minimum_frozen_periods 6 Consecutive unchanged periods to flag frozen data

Class Definition

SolarEnergyLossStrings(object_name, feature)

Base class for solar energy losses due to open, underperforming and shaded strings.

Parameters:

  • object_name

    (str) –

    Name of the object for which the feature is calculated. It must exist in performance_db.

  • feature

    (str) –

    Feature of the object that is calculated. It must exist in performance_db.

Source code in echo_energycalc/solar_energy_loss_strings.py
Python
def __init__(self, object_name: str, feature: str) -> None:
    """Constructor for SolarEnergyLossStrings.

    Parameters
    ----------
    object_name : str
        Name of the object for which the feature is calculated. It must exist in performance_db.
    feature : str
        Feature of the object that is calculated. It must exist in performance_db.
    """
    # initialize parent class
    super().__init__(object_name, feature)

    # Defining which object attributes are required for the calculation.
    self._add_requirement(
        RequiredObjectAttributes(
            {
                self.object: [
                    "latitude",
                    "longitude",
                ],
            },
        ),
    )
    self._fetch_requirements()

    # Defining the features that will be required for the calculation. All DC Power inputs and the curtailment state.
    features = [f"DcInput{str(i).zfill(2)}Power_5min.AVG" for i in range(1, 29)]
    features.append("CurtailmentState_5min.REP")
    features.append("CommunicationState_5min.REP")
    features.append("ActivePower_5min.AVG")
    features.append("MisalignedTrackers_5min.REP")
    features.append("IEC-OperationState_5min.REP")
    features.append("ActivePowerSetPointPercent_5min.AVG")

    # Adding suffix _b# to features -> necessary to acquire data from bazefield
    features = {self.object: [f"{feat}_b#" for feat in features]}
    self._add_requirement(RequiredFeatures(features=features))

    # Setting variables for the calculation
    self._underperform_ratio = 0.8  # Ratio below which a string is considered underperforming compared to the best string
    self._minimum_power_threshold = 0.05  # kW
    self._valid_down_strings = 4  # Number of strings that can be down without being considered a loss
    self._minimum_valid_strings = 5  # Minimum number of strings required to perform the IQR calculation
    self._minimum_frozen_periods = 6  # Minimum number of consecutive periods to consider data as frozen

feature property

Feature that is calculated. This will be defined in the constructor and cannot be changed.

Returns:

  • str

    Name of the feature that is calculated.

name property

Name of the feature calculator. Is defined in child classes of FeatureCalculator.

This must be equal to the "server_calc_type" attribute of the feature in performance_db.

Returns:

  • str

    Name of the feature calculator.

object property

Object for which the feature is calculated. This will be defined in the constructor and cannot be changed.

Returns:

  • str

    Object name for which the feature is calculated.

requirements property

List of requirements of the feature calculator. Is defined in child classes of FeatureCalculator.

Returns:

  • dict[str, list[CalculationRequirement]]

    Dict of requirements.

    The keys are the names of the classes of the requirements and the values are lists of requirements of that class.

    For example: {"RequiredFeatures": [RequiredFeatures(...), RequiredFeatures(...)], "RequiredObjects": [RequiredObjects(...)]}

result property

Result of the calculation. This is None until the method "calculate" is called.

Returns:

  • DataFrame | None

    Polars DataFrame with a "timestamp" column and one or more feature value columns. None until calculate is called.

calculate(period, save_into=None, cached_data=None, **kwargs)

Method that calculates the loss due to lost DC power, including down strings, shading and underperforming strings.

The calculation follows these steps:

  1. Get the power from the inverter strings.
  2. Set two references: the best performing string and the mean of all strings, excluding outliers using the IQR (interquartile range). The first is used to identify underperforming strings; the second to calculate the energy loss from down strings.
  3. Count, for each timestamp, how many strings are down, i.e. how many power values fall below the minimum power threshold (currently 0.05 kW).
  4. Calculate the energy loss as the number of shutdown strings * mean string power of the inverter. Only down strings beyond the first 4 are counted, since it is normal for an inverter to have 4 down strings.
  5. For underperforming and shaded strings, disregard timestamps with misaligned trackers, since misalignment can cause strings to underperform, so the loss would not be due to underperformance or shading.
  6. Every string below 80% of the best performing string is considered underperforming. The loss per underperforming string is the difference between the best performing string and that string; down strings are excluded, since they are already accounted for in the previous step.
  7. According to the time of day, an underperforming string is distinguished from a shaded string: inside the ±3 h window around solar noon it counts as underperforming, otherwise as shaded.
  8. Calculate the DC losses for down, underperforming and shaded strings as a percentage of the total DC power, and convert them to AC losses based on the AC power of the inverter.
  9. For open strings only, the energy loss during curtailment is set to 0 kW, unless strings were down for the whole day; in that case the loss is the number of all-day-down strings * mean string power of the inverter.
  10. During a communication failure, all energy losses are set to 0 kW.

Parameters:

  • period

    (DateTimeRange) –

    Period for which the feature will be calculated.

  • save_into

    (Literal['all', 'performance_db'] | None, default: None ) –

    Argument that will be passed to the method "save". The options are: - "all": The feature will be saved in performance_db and bazefield. - "performance_db": the feature will be saved only in performance_db. - None: The feature will not be saved.

    By default None.

  • cached_data

    (DataFrame | None, default: None ) –

    DataFrame with features already queried/calculated. This is useful to avoid needing to query all the data again from performance_db, making chained calculations a lot more efficient. By default None

  • **kwargs

    (dict, default: {} ) –

    Additional arguments that will be passed to the "save" method.

Returns:

  • DataFrame

    DataFrame with the calculated energy losses due to open strings, shading and underperforming strings. The DataFrame will have the following columns: - open_strings_loss: Energy loss due to open strings [kW] - shading_loss: Energy loss due to shading [kW] - underperforming_loss: Energy loss due to underperforming strings [kW]

Source code in echo_energycalc/solar_energy_loss_strings.py
Python
def calculate(
    self,
    period: DateTimeRange,
    save_into: Literal["all", "performance_db"] | None = None,
    cached_data: pl.DataFrame | None = None,
    **kwargs,
) -> pl.DataFrame:
    """
    Method that will calculate the loss due to DC Power Loss, including down strings, shading and underperforming strings.

    The calculation is done following those steps:
    1. Get Power from inverter strings
    2. Set two references: the best performing string and the mean of all strings, excluding outliers using IQR (Interquartile Range).
        First will be used to identify underperforming strings, the second will be used to calculate the energy loss from down strings.
    3. Count, for each timestamp, how many strings are down, that is, the number of power values that are below the minimum power threshold (currently 0.05 kW)
    4. Calculate the energy loss based on the number of shutdown strings * mean power of the inverter string.
        Note that only down strings beyond the first 4 are counted, since it is normal for an inverter to have 4 down strings
    5. For underperforming and shaded strings, we disregard timestamps with misaligned trackers, since misalignment can cause strings to underperform, so the loss
        would not be due to underperformance or shading.
    6. For every string that is below 80% of the best performing string, we consider it underperforming. The loss due to underperformance is defined by the difference
        between the best performing string and the underperforming string, followed by discounting all down strings (since they are already accounted for in the previous step).
    7. According to the time of day, we distinguish an underperforming string from a shaded string. Inside the ±3 h window around solar noon we consider it underperforming,
        otherwise we consider it shaded.
    8. Calculate the DC losses for down strings, underperforming strings and shaded strings, as a percentage of the total DC power, and convert it to AC losses based on
        the AC power of the inverter.
    9. For open strings only, during a curtailment, the energy loss is attributed to 0 kW. Unless there are strings that were down for the whole day, in that case the energy loss is calculated
        based on the number of strings that were down for the whole day * mean power of the inverter string.
    10. During a communication failure, all the energy loss is attributed to 0 kW.

    Parameters
    ----------
    period : DateTimeRange
        Period for which the feature will be calculated.
    save_into : Literal["all", "performance_db"] | None, optional
        Argument that will be passed to the method "save". The options are:
        - "all": The feature will be saved in performance_db and bazefield.
        - "performance_db": the feature will be saved only in performance_db.
        - None: The feature will not be saved.

        By default None.
    cached_data : DataFrame | None, optional
        DataFrame with features already queried/calculated. This is useful to avoid needing to query all the data again from performance_db, making chained calculations a lot more efficient.
        By default None
    **kwargs : dict, optional
        Additional arguments that will be passed to the "save" method.

    Returns
    -------
    DataFrame
        DataFrame with the calculated energy losses due to open strings, shading and underperforming strings.
        The DataFrame will have the following columns:
        - open_strings_loss: Energy loss due to open strings [kW]
        - shading_loss: Energy loss due to shading [kW]
        - underperforming_loss: Energy loss due to underperforming strings [kW]
    """
    t0 = perf_counter()
    # adjusting period to always start at 00:00 and end at 23:55
    adjusted_period = period.copy()
    adjusted_period.start = adjusted_period.start.replace(hour=0, minute=0, second=0, microsecond=0)
    adjusted_period.end = adjusted_period.end.replace(hour=23, minute=59, second=0, microsecond=0)

    # getting feature values
    self._fetch_requirements(
        period=adjusted_period,
        reindex=None,
        round_timestamps={"freq": timedelta(minutes=5), "tolerance": timedelta(minutes=2)},
        cached_data=cached_data,
    )

    t1 = perf_counter()

    # getting polars DataFrame with feature values
    raw_df = self._requirement_data("RequiredFeatures")

    # Build column rename map: "Obj@Feat_b#" -> "Feat"
    rename_map = {}
    for c in raw_df.columns:
        if c == "timestamp":
            continue
        # strip "ObjName@" prefix and "_b#" suffix
        feat_part = c.split("@", 1)[1].removesuffix("_b#")
        rename_map[c] = feat_part

    df = raw_df.rename(rename_map)

    # Defining string_cols as all columns with "DcInput"
    string_cols = [c for c in df.columns if "DcInput" in c]

    # Fill NaN values with forward and back fill (due to Bazefield TOTALIZER behavior)
    all_data_cols = [c for c in df.columns if c != "timestamp"]
    df = df.with_columns([pl.col(c).forward_fill().backward_fill() for c in all_data_cols])

    # ? --- CALCULATING THE REFERENCE POWER FOR ALL LOSSES ---

    # Ensure string_cols are numeric (source may return String/Utf8 when all values are null)
    df = df.with_columns([pl.col(c).cast(pl.Float64, strict=False) for c in string_cols])

    # Positive values only for IQR filtering (strings > 0)
    data_pos_exprs = [pl.when(pl.col(c) > 0).then(pl.col(c)).otherwise(None).alias(c) for c in string_cols]
    df = df.with_columns(data_pos_exprs)

    # Count valid (non-null positive) strings per row
    valid_counts = pl.sum_horizontal([pl.col(c).is_not_null().cast(pl.Int32) for c in string_cols])
    df = df.with_columns(valid_counts.alias("valid_counts"))

    # Mask rows with too few valid strings for IQR (set to null for those rows)
    # For IQR, we need >= minimum_valid_strings
    min_vs = self._minimum_valid_strings
    data_for_iqr_exprs = [
        pl.when(pl.col("valid_counts") >= min_vs).then(pl.col(c)).otherwise(None).alias(f"_iqr_{c}") for c in string_cols
    ]
    df = df.with_columns(data_for_iqr_exprs)

    iqr_cols = [f"_iqr_{c}" for c in string_cols]

    # Compute Q1 and Q3 per row via concat_list + list.eval
    row_list = pl.concat_list([pl.col(c) for c in iqr_cols]).list.drop_nulls()
    q1_series = row_list.list.eval(pl.element().quantile(0.25)).list.first().alias("_q1")
    q3_series = row_list.list.eval(pl.element().quantile(0.75)).list.first().alias("_q3")
    df = df.with_columns([q1_series, q3_series])

    # Compute filtered mean: values within [Q1, Q3] per row
    inlier_exprs = [
        pl.when((pl.col(f"_iqr_{c}") >= pl.col("_q1")) & (pl.col(f"_iqr_{c}") <= pl.col("_q3")))
        .then(pl.col(f"_iqr_{c}"))
        .otherwise(None)
        .alias(f"_inlier_{c}")
        for c in string_cols
    ]
    df = df.with_columns(inlier_exprs)
    inlier_cols = [f"_inlier_{c}" for c in string_cols]

    filtered_mean = pl.mean_horizontal([pl.col(c) for c in inlier_cols]).fill_null(0.0).alias("filtered_mean")
    df = df.with_columns(filtered_mean)

    # Best Performing String
    best_string = pl.max_horizontal([pl.col(c) for c in string_cols]).fill_null(0.0).alias("best_string_power")
    df = df.with_columns(best_string)

    # ? --- CALCULATING OPEN STRINGS DC POWER LOSS ---

    # Down mask: string < minimum_power_threshold
    down_mask_exprs = [(pl.col(c) < self._minimum_power_threshold).alias(f"_down_{c}") for c in string_cols]
    df = df.with_columns(down_mask_exprs)
    down_mask_cols = [f"_down_{c}" for c in string_cols]

    # Base count of down strings (only when filtered_mean > 1)
    base_count = pl.sum_horizontal([pl.col(c).cast(pl.Int32) for c in down_mask_cols])
    df = df.with_columns(
        pl.when(pl.col("filtered_mean") > 1).then(base_count).otherwise(0).alias("down_strings"),
    )

    df = df.with_columns(
        (pl.col("down_strings") - self._valid_down_strings).clip(lower_bound=0).alias("down_strings_above_normal"),
    )

    # Daily "always down" check: for each string, was it below threshold ALL timestamps in the day?
    # We need: for each string col (using original values), group by date, check if all are down
    # We compute this by: for each day, count total periods and count down periods per string
    # Then sum across strings
    # Strategy: add date col, group_by date to get "all down" counts, then join back
    df = df.with_columns(pl.col("timestamp").dt.date().alias("_date"))

    # For each string col, compute bool (True if down = original value < threshold)
    # Note: at this point, string_cols are already positive-nulled. We need the original data check.
    # The down condition here uses the already-modified (positive-nulled) cols, which is consistent
    # with original logic: data_df < minimum_power_threshold where data_df = df[string_cols] (positive only)
    daily_down_per_string = [pl.col(f"_down_{c}").cast(pl.Int32).alias(f"_dd_{c}") for c in string_cols]
    df = df.with_columns(daily_down_per_string)

    # For each string: count total timestamps per day and sum of down per day
    # Then check if count_down == count_total (all-day down)
    daily_agg_exprs = [pl.col(f"_dd_{c}").sum().alias(f"_sum_dd_{c}") for c in string_cols]
    daily_total_expr = pl.len().alias("_total_count")
    daily_df = df.group_by("_date").agg([daily_total_expr, *daily_agg_exprs])

    # For each string: daily_all_true = (sum_down == total_count)
    daily_all_true_exprs = [
        (pl.col(f"_sum_dd_{c}") == pl.col("_total_count")).cast(pl.Int32).alias(f"_allday_{c}") for c in string_cols
    ]
    daily_df = daily_df.with_columns(daily_all_true_exprs)

    # Sum across strings to get number of strings down all day
    daily_df = daily_df.with_columns(
        pl.sum_horizontal([pl.col(f"_allday_{c}") for c in string_cols]).alias("_daily_down_total"),
    )
    daily_df = daily_df.with_columns(
        (pl.col("_daily_down_total") - self._valid_down_strings).clip(lower_bound=0).alias("daily_down_strings_above_normal"),
    )

    # Join daily result back to main df
    df = df.join(daily_df.select(["_date", "daily_down_strings_above_normal"]), on="_date", how="left")

    # curtailment, communication and stopped failure masks
    curtailment_mask = pl.col("CurtailmentState_5min.REP") == 1
    comm_failure_mask = pl.col("CommunicationState_5min.REP") != 0
    stopped_mask = pl.col("IEC-OperationState_5min.REP") < 2

    # Logic for DC loss due to open strings
    dc_loss_open_curtailment = (
        pl.when(
            (pl.col("down_strings_above_normal") >= pl.col("daily_down_strings_above_normal"))
            & (pl.col("down_strings_above_normal") > 0),
        )
        .then(pl.col("daily_down_strings_above_normal") * pl.col("filtered_mean"))
        .otherwise(0.0)
    )

    dc_loss_open = (
        pl.when(comm_failure_mask)
        .then(0.0)
        .otherwise(
            pl.when(curtailment_mask)
            .then(dc_loss_open_curtailment)
            .otherwise(
                pl.col("down_strings_above_normal") * pl.col("filtered_mean"),
            ),
        )
        .alias("dc_loss_open_strings")
    )

    df = df.with_columns(dc_loss_open)

    # ? --- CALCULATING UNDERPERFORMING STRINGS AND SHADING DC POWER LOSS ---

    misaligned_trackers_mask = pl.col("MisalignedTrackers_5min.REP") == 1

    # Frozen data detection: rolling check if values are unchanged
    # For each string, check if value == previous value (shifted by 1)
    strings_unchanged_exprs = [(pl.col(c) == pl.col(c).shift(1)).cast(pl.Int32).alias(f"_unch_{c}") for c in string_cols]
    df = df.with_columns(strings_unchanged_exprs)

    # Rolling sum of unchanged flags over the frozen window
    frozen_roll_exprs = [
        pl.col(f"_unch_{c}")
        .rolling_sum(window_size=self._minimum_frozen_periods, min_samples=self._minimum_frozen_periods)
        .alias(f"_frozen_{c}")
        for c in string_cols
    ]
    df = df.with_columns(frozen_roll_exprs)

    # Count strings that are fully frozen (rolling sum == minimum_frozen_periods)
    frozen_count = pl.sum_horizontal(
        [(pl.col(f"_frozen_{c}") >= self._minimum_frozen_periods).cast(pl.Int32) for c in string_cols],
    )
    frozen_data_mask = frozen_count > (len(string_cols) * 0.5)

    # Underperformance: string < best_string_power * underperform_ratio, and not down
    underperform_thresh = pl.col("best_string_power") * self._underperform_ratio

    # Individual underperformance loss per string (positive clipped difference from best)
    underperf_loss_exprs = [
        pl.when(
            (pl.col(c) < underperform_thresh) & ~pl.col(f"_down_{c}"),
        )
        .then((pl.col("best_string_power") - pl.col(c)).clip(lower_bound=0.0))
        .otherwise(0.0)
        .alias(f"_uploss_{c}")
        for c in string_cols
    ]
    df = df.with_columns(underperf_loss_exprs)

    total_dc_loss_underperf_shading = pl.sum_horizontal([pl.col(f"_uploss_{c}") for c in string_cols]).alias(
        "_total_dc_underperf_shading",
    )
    df = df.with_columns(total_dc_loss_underperf_shading)

    # ? --- SOLAR POSITION for underperformance/shading window ---

    # Convert timestamps to UTC pandas DatetimeIndex for pvlib (pvlib requires pandas)
    timestamps_pl = df["timestamp"]
    timestamps_pd = pd.DatetimeIndex(timestamps_pl.cast(pl.Datetime("ms", time_zone=None)).to_pandas())
    times_utc = timestamps_pd + pd.Timedelta(hours=3)  # site local time is UTC-3, so add 3 h to get UTC

    obj_attrs = self._requirement_data("RequiredObjectAttributes")[self.object]
    solar_position = pvlib.solarposition.get_solarposition(
        time=times_utc,
        latitude=obj_attrs["latitude"],
        longitude=obj_attrs["longitude"],
    )

    # Identifying the timestamp of solar noon for each day
    daily_solar_noon = solar_position.groupby(solar_position.index.normalize())["elevation"].idxmax()

    # converting back to local time by removing 3 hours
    daily_solar_noon = daily_solar_noon - pd.Timedelta(hours=3)

    # Defining underperformance window as 3 hours before and after solar noon
    t_start_underperf_daily = daily_solar_noon - pd.Timedelta(hours=3)
    t_end_underperf_daily = daily_solar_noon + pd.Timedelta(hours=3)

    # Create a Series with daily start and end times for each date in df
    unique_dates = timestamps_pd.normalize().unique()

    # Build Series from calculated solar noon values
    daily_start_times_series = t_start_underperf_daily.reindex(unique_dates)
    daily_end_times_series = t_end_underperf_daily.reindex(unique_dates)

    # Forward fill to use previous day's values when available
    daily_start_times_series = daily_start_times_series.ffill()
    daily_end_times_series = daily_end_times_series.ffill()

    # For any remaining NaT values (no previous day available), use theoretical values
    theoretical_start = pd.Series(
        [date.replace(hour=9, minute=0) for date in unique_dates],
        index=unique_dates,
    )
    theoretical_end = pd.Series(
        [date.replace(hour=15, minute=0) for date in unique_dates],
        index=unique_dates,
    )

    daily_start_times_series = daily_start_times_series.fillna(theoretical_start)
    daily_end_times_series = daily_end_times_series.fillna(theoretical_end)

    # Map the daily windows to each timestamp
    dates_norm = timestamps_pd.normalize()
    daily_start_times = dates_norm.map(daily_start_times_series).to_numpy()
    daily_end_times = dates_norm.map(daily_end_times_series).to_numpy()

    is_underperforming_time = (timestamps_pd.to_numpy() >= daily_start_times) & (timestamps_pd.to_numpy() < daily_end_times)
    is_night = solar_position["elevation"].to_numpy() < 0

    # Add boolean masks as columns
    df = df.with_columns(
        [
            pl.Series("_is_underperforming_time", is_underperforming_time),
            pl.Series("_is_night", is_night),
        ],
    )

    # Split underperformance vs shading losses
    df = df.with_columns(
        [
            pl.when(pl.col("_is_underperforming_time"))
            .then(pl.col("_total_dc_underperf_shading"))
            .otherwise(0.0)
            .alias("dc_loss_underperforming"),
            pl.when(~pl.col("_is_underperforming_time"))
            .then(pl.col("_total_dc_underperf_shading"))
            .otherwise(0.0)
            .alias("dc_loss_shading"),
        ],
    )

    # Zeroing losses during communication, misaligned trackers, inverter stopped, and frozen data
    zero_underperf_shading_mask = comm_failure_mask | misaligned_trackers_mask | stopped_mask | frozen_data_mask
    df = df.with_columns(
        [
            pl.when(zero_underperf_shading_mask)
            .then(0.0)
            .otherwise(pl.col("dc_loss_underperforming"))
            .alias("dc_loss_underperforming"),
            pl.when(zero_underperf_shading_mask).then(0.0).otherwise(pl.col("dc_loss_shading")).alias("dc_loss_shading"),
        ],
    )

    # Zeroing losses when it is night time
    df = df.with_columns(
        [
            pl.when(pl.col("_is_night")).then(0.0).otherwise(pl.col("dc_loss_open_strings")).alias("dc_loss_open_strings"),
            pl.when(pl.col("_is_night")).then(0.0).otherwise(pl.col("dc_loss_underperforming")).alias("dc_loss_underperforming"),
            pl.when(pl.col("_is_night")).then(0.0).otherwise(pl.col("dc_loss_shading")).alias("dc_loss_shading"),
        ],
    )

    # ? --- CONVERTING DC LOSSES TO AC LOSSES ---
    # Total positive DC power (sum across string cols)
    total_dc_power = pl.sum_horizontal([pl.col(c).fill_null(0.0) for c in string_cols]).alias("_total_dc_power")
    df = df.with_columns(total_dc_power)

    # Extract arrays for _convert_dc_to_ac_loss
    dc_power_arr = df["_total_dc_power"].to_numpy()
    ac_power_arr = df["ActivePower_5min.AVG"].to_numpy()
    setpoint_arr = df["ActivePowerSetPointPercent_5min.AVG"].to_numpy()

    open_loss = self._convert_dc_to_ac_loss(
        np.clip(df["dc_loss_open_strings"].to_numpy(), 0, None),
        dc_power_arr,
        ac_power_arr,
        setpoint_arr,
    )
    underperforming_loss = self._convert_dc_to_ac_loss(
        np.clip(df["dc_loss_underperforming"].to_numpy(), 0, None),
        dc_power_arr,
        ac_power_arr,
        setpoint_arr,
    )
    shaded_loss = self._convert_dc_to_ac_loss(
        np.clip(df["dc_loss_shading"].to_numpy(), 0, None),
        dc_power_arr,
        ac_power_arr,
        setpoint_arr,
    )

    t2 = perf_counter()

    # ? --- Saving results into a df to save into db ---
    result_pl = pl.DataFrame(
        {
            "timestamp": df["timestamp"],
            "LostActivePowerOpenStrings_5min.AVG": pl.Series(open_loss),
            "LostActivePowerUnderperfStrings_5min.AVG": pl.Series(underperforming_loss),
            "LostActivePowerShading_5min.AVG": pl.Series(shaded_loss),
        },
    )

    # Assigning the final DataFrame to the self.result attribute
    self._result = result_pl

    # Saving the result into the database
    self.save(save_into=save_into, **kwargs)

    logger.debug(
        f"{self.object} - {self.feature} - {period}: Requirements during calc {t1 - t0:.2f}s - Data adjustments {t2 - t1:.2f}s - Saving data {perf_counter() - t2:.2f}s",
    )

    return result_pl

save(save_into=None, **kwargs)

Method to save the calculated feature values in performance_db.

Parameters:

  • save_into

    (Literal['all', 'performance_db'] | None, default: None ) –

    Argument that will be passed to the method "save". The options are: - "all": The feature will be saved in performance_db and bazefield. - "performance_db": the feature will be saved only in performance_db. - None: The feature will not be saved.

    By default None.

  • **kwargs

    (dict, default: {} ) –

    Not being used at the moment. Here only for compatibility.

Source code in echo_energycalc/feature_calc_core.py
Python
def save(
    self,
    save_into: Literal["all", "performance_db"] | None = None,
    **kwargs,  # noqa: ARG002
) -> None:
    """
    Method to save the calculated feature values in performance_db.

    Parameters
    ----------
    save_into : Literal["all", "performance_db"] | None, optional
        Argument that will be passed to the method "save". The options are:
        - "all": The feature will be saved in performance_db and bazefield.
        - "performance_db": the feature will be saved only in performance_db.
        - None: The feature will not be saved.

        By default None.
    **kwargs : dict, optional
        Not being used at the moment. Here only for compatibility.
    """
    # checking arguments
    if not isinstance(save_into, str | type(None)):
        raise TypeError(f"save_into must be a string or None, not {type(save_into)}")
    if isinstance(save_into, str) and save_into not in ["all", "performance_db"]:
        raise ValueError(f"save_into must be 'all', 'performance_db' or None, not {save_into}")

    # checking if calculation was done
    if self.result is None:
        raise ValueError(
            "The calculation was not done. Please call 'calculate' before calling 'save'.",
        )

    if save_into is None:
        return

    upload_to_bazefield = save_into == "all"

    if not isinstance(self.result, pl.DataFrame):
        raise TypeError(f"result must be a polars DataFrame, not {type(self.result)}.")
    if "timestamp" not in self.result.columns:
        raise ValueError("result DataFrame must contain a 'timestamp' column.")

    # rename feature columns to "object@feature" format expected by perfdb polars insert
    feat_cols = [c for c in self.result.columns if c != "timestamp"]
    result_pl = self.result.rename({col: f"{self.object}@{col}" for col in feat_cols})

    self._perfdb.features.values.series.insert(
        df=result_pl,
        on_conflict="update",
        bazefield_upload=upload_to_bazefield,
    )