Solar Tracker Loss

Overview

The SolarEnergyLossTracker class is a subclass of SolarEnergyLossCalculator and FeatureCalculator that calculates the energy loss due to tracker misalignment.

Calculation Logic

The calculation works as follows:

  1. Get and Normalize Target Inverter Data. Get the time-series data for the target inverter, including its Active Power, Misaligned Trackers flag, and power losses from open strings. The Active Power is then normalized by adding back the open string losses to create a power baseline that isolates tracker-related issues.

  2. Filter Nighttime Data. Identify and filter out nighttime data points. This is done by calculating the solar elevation angle with pvlib, based on the asset's coordinates, and setting all feature values to zero for timestamps when the sun is below the horizon.

  3. Perform Initial Check for Misalignment. An initial check is performed on the target inverter. If no misalignment flags are found during the entire period, the production loss is considered zero for all timestamps, and the calculation is finalized for that inverter.

  4. Build a Corrected Reference Map. If misalignment flags are detected, get the time-series data for all predefined neighboring inverters. The power data for each neighbor is also normalized by adding back its respective open string losses to ensure a fair, like-for-like comparison.

  5. Establish the High-Performance Benchmark. A high-performance benchmark power is established for each timestamp. This is done by first identifying all neighboring inverters that are 'healthy' (misalignment flag is 0), then calculating the 75th percentile of their normalized power values, and finally taking the average of only those powers that are in the upper quartile (at or above the 75th percentile).

  6. Calculate the Production Loss. The final production loss is calculated for the target inverter. For timestamps where the target inverter has a misalignment flag, the loss is (Benchmark Power) - (Target Inverter's Normalized Power). For all other timestamps, the loss is zero. Note that this calculation allows negative loss values (i.e., a gain) if the misaligned inverter outperforms the benchmark.

  7. Adjust the Result. Set the loss to zero whenever the target inverter has a communication problem or is shut down:

    • Communication failure: set all losses to 0 kW.

    • Inverter stopped: set all losses to 0 kW.

  8. Finalize the Loss Series. The final loss series is generated. For timestamps where a benchmark could not be calculated with the primary logic (e.g., all neighbors were also misaligned), the loss uses an alternative reference: the maximum power among all neighboring inverters. So, when all inverters are misaligned, the loss is calculated relative to the best power production nearby. A condensed sketch of the benchmark and loss logic (steps 5, 6, and 8) follows this list.
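
The sketch below condenses steps 5, 6, and 8 in a few lines of pandas. It is illustrative only: the variable names (power_ref, flags_ref, target_power, target_flag) mirror the implementation shown later on this page, and the inputs are assumed to be pandas objects indexed by timestamp with one column per neighboring inverter.

import pandas as pd

def benchmark_and_loss(
    target_power: pd.Series,      # target inverter's normalized ActivePower
    target_flag: pd.Series,       # target inverter's MisalignedTrackers flag (0/1)
    power_ref: pd.DataFrame,      # normalized ActivePower of the neighbors (one column each)
    flags_ref: pd.DataFrame,      # MisalignedTrackers flag of the neighbors (same shape)
) -> pd.Series:
    """Illustrative sketch of steps 5, 6, and 8 of the calculation."""
    # keep only power values from healthy neighbors (flag == 0)
    valid_power_ref = power_ref.where(flags_ref == 0)

    # per-timestamp 75th percentile of the healthy neighbors
    q3 = valid_power_ref.quantile(q=0.75, axis=1)

    # benchmark = mean of the powers at or above the 75th percentile
    benchmark = valid_power_ref.where(valid_power_ref.ge(q3, axis=0)).mean(axis=1)

    # fallback when no healthy neighbor exists: best power production nearby, then 0
    benchmark = benchmark.fillna(power_ref.max(axis=1)).fillna(0)

    # loss only where the target inverter is flagged as misaligned; may be negative
    return (benchmark - target_power).where(target_flag == 1, 0.0)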

Database Requirements

  • Feature attribute server_calc_type must be set to 'solar_energy_loss_tracker'.

Class Definition

SolarEnergyLossTracker(object_name, feature)

Base class for solar energy loss due to tracker misalignment.

Parameters:

  • object_name

    (str) –

    Name of the object for which the feature is calculated. It must exist in performance_db.

  • feature

    (str) –

    Feature of the object that is calculated. It must exist in performance_db.
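
For illustration, a minimal instantiation sketch is shown below. The inverter and feature names are hypothetical placeholders; both must exist in performance_db, and the import path follows the source file listed below.

from echo_energycalc.solar_energy_loss_tracker import SolarEnergyLossTracker

# hypothetical names: the object and the feature must exist in performance_db,
# and the feature's server_calc_type attribute must be 'solar_energy_loss_tracker'
loss_calc = SolarEnergyLossTracker(
    object_name="INV-01",
    feature="TrackerMisalignmentLoss_5min.AVG",
)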

Source code in echo_energycalc/solar_energy_loss_tracker.py
def __init__(self, object_name: str, feature: str) -> None:
    """
    Class used to calculate active power losses due to tracker misalignment Feature for solar assets.

    Parameters
    ----------
    object_name : str
        Name of the object for which the feature is calculated. It must exist in performance_db.
    feature : str
        Feature of the object that is calculated. It must exist in performance_db.
    """
    # initialize parent class
    super().__init__(object_name, feature)

    # Defining which object attributes are required for the calculation.
    self._add_requirement(
        RequiredObjectAttributes(
            {
                self.object: [
                    "neighbor_inverters",
                    "latitude",
                    "longitude",
                ],
            },
        ),
    )
    self._get_required_data()

    # Defining the features that will be required for the calculation: active power, open-string losses, misaligned-trackers flag, operation state, and communication state.
    features = [
        "ActivePower_5min.AVG",
        "LostActivePowerOpenStrings_5min.AVG",
        "MisalignedTrackers_5min.REP",
        "IEC-OperationState_5min.REP",
        "CommunicationState_5min.REP",
    ]

    # Adding suffix _b# to features -> necessary to acquire data from bazefield
    features = {self.object: [f"{feat}_b#" for feat in features]}
    self._add_requirement(RequiredFeatures(features=features))

feature property

Feature that is calculated. This will be defined in the constructor and cannot be changed.

Returns:

  • str

    Name of the feature that is calculated.

name property

Name of the feature calculator. Is defined in child classes of FeatureCalculator.

This must be equal to the "server_calc_type" attribute of the feature in performance_db.

Returns:

  • str

    Name of the feature calculator.

object property

Object for which the feature is calculated. This will be defined in the constructor and cannot be changed.

Returns:

  • str

    Object name for which the feature is calculated.

requirements property

List of requirements of the feature calculator. Is defined in child classes of FeatureCalculator.

Returns:

  • dict[str, list[CalculationRequirement]]

    Dict of requirements.

    The keys are the names of the classes of the requirements and the values are lists of requirements of that class.

    For example: {"RequiredFeatures": [RequiredFeatures(...), RequiredFeatures(...)], "RequiredObjects": [RequiredObjects(...)]}

result property

Result of the calculation. This is None until the method "calculate" is called.

Returns:

  • Series | DataFrame | None:

    Result of the calculation if the method "calculate" was called. None otherwise.

calculate(period, save_into=None, cached_data=None, **kwargs)

Method that will calculate the loss due to misaligned trackers.

The calculation is done following these steps, for each inverter:

  1. Get the complete time-series data for the target inverter, including its Active Power and Misalignment Flag for the specified period.

  2. Pre-filter the target inverter's data to exclude nighttime records, ensuring the analysis is performed only on data from sunlight hours.

  3. Perform a preliminary check on the target inverter's data. If no misalignment flags (flag == 1) are found within the entire period, the production loss is considered zero for all timestamps, and the algorithm proceeds to the next inverter.

  4. If misalignment flags are present for the target inverter, get the complete time-series data for all of its predefined neighboring inverters to be used as a reference group.

  5. Establish a high-performance benchmark power for each 5-minute timestamp by calculating the mean of the upper quartile of the healthy neighbors. This involves identifying all neighbors with a flag of 0, calculating the 75th percentile of their power values, and then taking the average of only those powers at or above this percentile.

  6. Calculate the final production loss for the target inverter. For every timestamp where the target inverter has a misalignment flag, the loss is calculated as (Benchmark Power) - (Target Inverter's Power). For all other timestamps, the loss is set to zero.

  7. Finalize the loss series. For timestamps where a benchmark could not be calculated (e.g., all neighbors were also misaligned), the maximum power among all neighbors is used as an alternative reference. Losses are also zeroed during communication failures and while the inverter is stopped.

Parameters:

  • period

    (DateTimeRange) –

    Period for which the feature will be calculated.

  • save_into

    (Literal['all', 'performance_db'] | None, default: None ) –

    Argument that will be passed to the method "save". The options are: - "all": The feature will be saved in performance_db and bazefield. - "performance_db": the feature will be saved only in performance_db. - None: The feature will not be saved.

    By default None.

  • cached_data

    (DataFrame | None, default: None ) –

    DataFrame with features already queried/calculated. This is useful to avoid needing to query all the data again from performance_db, making chained calculations a lot more efficient. By default None

  • **kwargs

    (dict, default: {} ) –

    Additional arguments that will be passed to the "save" method.

Returns:

  • Series

    Pandas Series with the calculated feature.
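
A hedged usage sketch, assuming an instance such as loss_calc from the constructor example above; the DateTimeRange construction is an assumption and should follow however the period type is built in this codebase.

# hypothetical period construction; adapt to how DateTimeRange is created in this codebase
period = DateTimeRange("2024-01-01 00:00:00", "2024-01-02 00:00:00")

# compute the 5-minute loss series and persist it to performance_db only
loss_series = loss_calc.calculate(period=period, save_into="performance_db")
print(loss_series.head())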

Source code in echo_energycalc/solar_energy_loss_tracker.py
def calculate(
    self,
    period: DateTimeRange,
    save_into: Literal["all", "performance_db"] | None = None,
    cached_data: DataFrame | None = None,
    **kwargs,
) -> Series:
    """
    Method that will calculate the loss due to misaligned trackers.

    The calculation is done following these steps, for each inverter:
    1. Get the complete time-series data for the target inverter, including its Active Power and Misalignment Flag for the specified period.
    2. Pre-filter the target inverter's data to exclude nighttime records, ensuring the analysis is performed only on data from sunlight hours.
    3. Perform a preliminary check on the target inverter's data.
        If no misalignment flags (flag == 1) are found within the entire period, the production loss is considered zero for all timestamps, and the algorithm proceeds to the next inverter.
    4. If misalignment flags are present for the target inverter, get the complete time-series data for all of its predefined neighboring inverters to be used as a reference group.
    5. Establish a high-performance benchmark power for each 5-minute timestamp by calculating the mean of the upper quartile of the healthy neighbors.
        This involves identifying all neighbors with a flag of 0, calculating the 75th percentile of their power values, and then taking the average of only those powers at or above this percentile.
    6. Calculate the final production loss for the target inverter.
        For every timestamp where the target inverter has a misalignment flag, the loss is calculated as the (Benchmark Power) - (Target Inverter's Power).
        For all other timestamps, the loss is set to zero.
    7. Finalize the loss series: for timestamps where a benchmark could not be calculated (e.g., all neighbors were also misaligned), the maximum power among all neighbors is used as an alternative reference, and losses are zeroed during communication failures and while the inverter is stopped.

    Parameters
    ----------
    period : DateTimeRange
        Period for which the feature will be calculated.
    save_into : Literal["all", "performance_db"] | None, optional
        Argument that will be passed to the method "save". The options are:
        - "all": The feature will be saved in performance_db and bazefield.
        - "performance_db": the feature will be saved only in performance_db.
        - None: The feature will not be saved.

        By default None.
    cached_data : DataFrame | None, optional
        DataFrame with features already queried/calculated. This is useful to avoid needing to query all the data again from performance_db, making chained calculations a lot more efficient.
        By default None
    **kwargs : dict, optional
        Additional arguments that will be passed to the "save" method.

    Returns
    -------
    Series
        Pandas Series with the calculated feature.
    """
    t0 = perf_counter()

    nearby_inverters = self._get_requirement_data("RequiredObjectAttributes")[self.object]["neighbor_inverters"]

    # creating a series to store the result
    result = self._create_empty_result(period=period, freq="5min", result_type="Series")

    # getting feature values
    self._get_required_data(
        period=period,
        reindex=None,
        round_timestamps={"freq": timedelta(minutes=5), "tolerance": timedelta(minutes=2)},
        cached_data=cached_data,
    )

    t1 = perf_counter()

    # getting DataFrame with feature values
    df = self._get_requirement_data("RequiredFeatures")
    df = df[self.object]

    # renaming columns to remove the _b# suffix
    df.columns = df.columns.str.removesuffix("_b#")

    # Fill NaN values with forward and back fill. This needs to be done due to current Bazefield TOTALIZER behavior.
    df[df.columns] = df[df.columns].ffill().bfill()

    # Defining crucial columns for calculation
    # Adding the lost power due to open strings to the active power to get only the losses due to misalignment.
    df["ActivePower_5min.AVG"] = df["ActivePower_5min.AVG"] + df["LostActivePowerOpenStrings_5min.AVG"]
    # dropping the lost power column as it won't be used anymore
    df = df.drop(columns=["LostActivePowerOpenStrings_5min.AVG"])

    t2 = perf_counter()

    # Adjusting the Series index with the results.
    # This is done to prevent missing indexes on the requested period of calculation. That is, if the calculated df has less points than the expected for the period.
    wanted_idx = result.index.intersection(df.index)
    result.loc[wanted_idx] = 0.0
    # Trimming result to the original period, just to be sure
    result = result[(result.index >= period.start) & (result.index <= period.end)].copy()
    # Adding calculated feature to class result attribute
    self._result = result.copy()

    timestamps = result.index

    # adding 3 hours to convert local time (UTC-3) to UTC
    times_pd = timestamps + Timedelta(hours=3)

    # Calculate solar positions (zenith angles) - vectorized operation
    solar_position = pvlib.solarposition.get_solarposition(
        time=times_pd,
        latitude=self._get_requirement_data("RequiredObjectAttributes")[self.object]["latitude"],
        longitude=self._get_requirement_data("RequiredObjectAttributes")[self.object]["longitude"],
    )

    # Get the sun's elevation (altitude)
    # Sun altitude < 0 means the sun is below the horizon (night)
    is_night = solar_position["elevation"] < 0
    # Zero out values during the nighttime period using is_night
    df.loc[is_night.values] = 0.0

    t3 = perf_counter()

    # First verification: if there are no misalignment flags in the entire period, return a series of zeros. No need to acquire data from neighbors.
    if not (df["MisalignedTrackers_5min.REP"] == 1).any():
        # adding calculated feature to class result attribute
        self._result = result.copy()

        # saving results
        self.save(save_into=save_into, **kwargs)

        logger.debug(
            f"{self.object} - {self.feature} - {period}: Requirements during calc {t1 - t0:.2f}s - Data adjustments {t2 - t1:.2f}s -Saving data {perf_counter() - t2:.2f}s",
        )
        return result

    # Getting data from neighboring inverters to be used as reference
    reference_data = SolarEnergyLossTracker.reference_map(
        neighbor_inverters_list=nearby_inverters,
        feature=self.feature,
        period=period,
        cached_data=cached_data,
    )

    # extracting features from neighboring inverters
    power_ref = reference_data.loc[:, (nearby_inverters, "ActivePower_5min.AVG")]
    power_ref.columns = power_ref.columns.get_level_values(0)

    flags_ref = reference_data.loc[:, (nearby_inverters, "MisalignedTrackers_5min.REP")]
    flags_ref.columns = flags_ref.columns.get_level_values(0)

    # keeping only the power values where the flag is 0 (healthy neighbors)
    valid_power_ref = power_ref.where(flags_ref == 0)
    # for each timestamp, we want to calculate the benchmark power as the mean of the upper quartile of the valid_power_ref
    # the result will be a Series with the same index as valid_power_ref
    q3_per_timestamp = valid_power_ref.quantile(q=0.75, axis=1)

    # keeping only the power values that are in the upper quartile, corresponding in the q3_per_timestamp Series.
    # Where the power is < Q3, the value will become NaN.
    upper_quartile_powers = valid_power_ref.where(valid_power_ref.ge(q3_per_timestamp, axis=0))

    # the final benchmark power will be the mean of the upper quartile powers
    final_power_ref = upper_quartile_powers.mean(axis=1)

    # for cases where all neighbors are misaligned, final_power_ref will be NaN. So we use an alternative reference, the max power of all neighbors
    alternative_ref = power_ref.max(axis=1)

    # creating the final benchmark: use the ideal reference and fill the gaps (NaN) with the alternative reference
    benchmark_power = final_power_ref.fillna(alternative_ref)
    # making sure there are no NaN values in the benchmark power. If there are, fill with 0. This should not happen, but just in case.
    benchmark_power = benchmark_power.fillna(0)

    # calculating power loss
    power_loss = benchmark_power - df["ActivePower_5min.AVG"]

    # the condition to use the calculated power loss is that the target inverter has the misalignment flag = 1
    target_condition = df["MisalignedTrackers_5min.REP"] == 1

    # Applying the final condition: the loss is only accounted for if the target inverter is misaligned. Otherwise, it is 0.
    df["power_loss"] = power_loss.where(
        target_condition,
        0.0,
    )

    # ! The power loss won't be clipped in 0. Cases where misaligned trackers generate more power than the reference will be accounted for.
    # ! This can happen, for example, when the misaligned tracker is in a better position relative to the sun or vegetation.

    # Zeroing losses during communication failure and stopped operation
    comm_failure_mask = df["CommunicationState_5min.REP"] != 0
    stopped_mask = df["IEC-OperationState_5min.REP"] < 2
    df.loc[
        comm_failure_mask | stopped_mask,
        ["power_loss"],
    ] = 0

    t4 = perf_counter()

    # updating result with the calculated power loss, matching indexes
    result.update(df["power_loss"])

    # adding calculated feature to class result attribute
    self._result = result.copy()

    # saving results
    self.save(save_into=save_into, **kwargs)

    logger.debug(
        f"{self.object} - {self.feature} - {period}: "
        f"Requirements during calc {t1 - t0:.2f}s - "
        f"Data adjustments {t2 - t1:.2f}s - "
        f"Solar position calc {t3 - t2:.2f}s - "
        f"Neighbor reference calc {t4 - t3:.2f}s - "
        f"Saving data {perf_counter() - t4:.2f}s",
    )

    return result

reference_map(neighbor_inverters_list, feature, period, cached_data=None) staticmethod

Create a map of neighboring inverters to be used as reference for each inverter. Contains the same features as the main inverter.

Parameters:

  • neighbor_inverters_list

    (list[str]) –

    List of neighboring inverters to be used as reference. Each must exist in performance_db.

  • feature

    (str) –

    The name of the feature being calculated. Needed to instantiate neighbor objects.

  • period

    (DateTimeRange) –

    Period for which the feature will be calculated.

Returns:

  • DataFrame

    A dataframe with MultiIndex columns containing the features of all neighboring inverters (level 0: inverter name, level 1: feature name, e.g., 'ActivePower_5min.AVG', 'MisalignedTrackers_5min.REP').
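
As an example of how the returned map is consumed, calculate (shown above) slices one feature across all neighbors and flattens the MultiIndex columns down to the inverter names; here reference_data is the DataFrame returned by reference_map and nearby_inverters is the list of neighbor names.

# excerpt mirroring calculate(): slice one feature across all neighbors,
# then flatten the MultiIndex columns to the inverter names
power_ref = reference_data.loc[:, (nearby_inverters, "ActivePower_5min.AVG")]
power_ref.columns = power_ref.columns.get_level_values(0)

flags_ref = reference_data.loc[:, (nearby_inverters, "MisalignedTrackers_5min.REP")]
flags_ref.columns = flags_ref.columns.get_level_values(0)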

Source code in echo_energycalc/solar_energy_loss_tracker.py
@staticmethod
def reference_map(
    neighbor_inverters_list: list[str],
    feature: str,
    period: DateTimeRange,
    cached_data: DataFrame | None = None,
) -> DataFrame:
    """Create a map of neighboring inverters to be used as reference for each inverter. Contains the same features as the main inverter.

    Parameters
    ----------
    neighbor_inverters_list : list[str]
        List of neighboring inverters to be used as reference. Each must exist in performance_db.
    feature : str
        The name of the feature being calculated. Needed to instantiate neighbor objects.
    period : DateTimeRange
        Period for which the feature will be calculated.

    Returns
    -------
    DataFrame
        A dataframe with MultiIndex columns containing the features of all neighboring inverters
        (level 0: inverter name, level 1: feature name, e.g., 'ActivePower_5min.AVG', 'MisalignedTrackers_5min.REP').
    """
    df_map = DataFrame()
    for inv_name in neighbor_inverters_list:
        # Creating an instance of SolarEnergyLossTracker for each neighboring inverter
        neighbor_inv = SolarEnergyLossTracker(object_name=inv_name, feature=feature)

        neighbor_inv._get_required_data(
            period=period,
            reindex=None,
            round_timestamps={"freq": timedelta(minutes=5), "tolerance": timedelta(minutes=2)},
            cached_data=cached_data,
        )
        df_temp = neighbor_inv._get_requirement_data("RequiredFeatures")
        df = df_temp[neighbor_inv.object]

        # renaming columns to remove the _b# suffix
        df.columns = df.columns.str.removesuffix("_b#")
        # Fill NaN values with forward and back fill. This needs to be done due to current Bazefield TOTALIZER behavior.
        df[df.columns] = df[df.columns].ffill().bfill()
        # Adding the lost power due to open strings to the active power to get only the losses due to misalignment.
        df["ActivePower_5min.AVG"] = df["ActivePower_5min.AVG"] + df["LostActivePowerOpenStrings_5min.AVG"]
        df = df.drop(columns=["LostActivePowerOpenStrings_5min.AVG"])

        # returning to MultiIndex columns
        df.columns = MultiIndex.from_product([[inv_name], df.columns])

        df_map = df if df_map.empty else concat([df_map, df], axis=1)
    return df_map

save(save_into=None, **kwargs)

Method to save the calculated feature values in performance_db.

Parameters:

  • save_into

    (Literal['all', 'performance_db'] | None, default: None ) –

    Destination of the saved results. The options are: - "all": the feature will be saved in performance_db and bazefield. - "performance_db": the feature will be saved only in performance_db. - None: the feature will not be saved.

    By default None.

  • **kwargs

    (dict, default: {} ) –

    Not being used at the moment. Here only for compatibility.
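
A brief usage sketch, reusing the hypothetical loss_calc and period from the examples above; note that save can only be called after calculate has populated the result, otherwise a ValueError is raised.

# calculate() must run first; save() raises a ValueError if the result is still None
loss_calc.calculate(period=period)           # keeps the result in memory only
loss_calc.save(save_into="performance_db")   # persist; use "all" to also upload to bazefield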

Source code in echo_energycalc/feature_calc_core.py
def save(
    self,
    save_into: Literal["all", "performance_db"] | None = None,
    **kwargs,  # noqa: ARG002
) -> None:
    """
    Method to save the calculated feature values in performance_db.

    Parameters
    ----------
    save_into : Literal["all", "performance_db"] | None, optional
        Destination of the saved results. The options are:
        - "all": The feature will be saved in performance_db and bazefield.
        - "performance_db": the feature will be saved only in performance_db.
        - None: The feature will not be saved.

        By default None.
    **kwargs : dict, optional
        Not being used at the moment. Here only for compatibility.
    """
    # checking arguments
    if not isinstance(save_into, str | type(None)):
        raise TypeError(f"save_into must be a string or None, not {type(save_into)}")
    if isinstance(save_into, str) and save_into not in ["all", "performance_db"]:
        raise ValueError(f"save_into must be 'all', 'performance_db' or None, not {save_into}")

    # checking if calculation was done
    if self.result is None:
        raise ValueError(
            "The calculation was not done. Cannot save the feature calculation results. Please make sure to do something like 'self._result = df[self.feature].copy()' in the method 'calculate' before calling 'self.save()'.",
        )

    if save_into is None:
        return

    if isinstance(save_into, str):
        if save_into not in ["performance_db", "all"]:
            raise ValueError(f"save_into must be 'performance_db' or 'all', not {save_into}.")
        upload_to_bazefield = save_into == "all"
    elif save_into is None:
        upload_to_bazefield = False
    else:
        raise TypeError(f"save_into must be a string or None, not {type(save_into)}.")

    # converting result series to DataFrame if needed
    if isinstance(self.result, Series):
        result_df = self.result.to_frame()
    elif isinstance(self.result, DataFrame):
        result_df = self.result.droplevel(0, axis=1)
    else:
        raise TypeError(f"result must be a pandas Series or DataFrame, not {type(self.result)}.")

    # adjusting DataFrame to be inserted in the database
    # making the columns a Multindex with levels object_name and feature_name
    result_df.columns = MultiIndex.from_product([[self.object], result_df.columns], names=["object_name", "feature_name"])

    self._perfdb.features.values.series.insert(
        df=result_df,
        on_conflict="update",
        bazefield_upload=upload_to_bazefield,
    )