Understanding the Overfitting Detector in CatBoost

Last Updated : 23 Jul, 2025

CatBoost, a gradient boosting library developed by Yandex, is known for its efficient handling of categorical features and robust performance. One of its key features is the overfitting detector, which helps prevent the model from overfitting to the training data. Overfitting occurs when a model learns the training data too well, capturing noise and details that do not generalize to new, unseen data. This article delves into the mechanisms of the CatBoost overfitting detector, its types, and how to implement it effectively.

Understanding CatBoost Overfitting Detector

The overfitting detector is built into CatBoost. It monitors a metric on a validation set during training and detects when the model begins to overfit the training set, so that training can be halted before further iterations hurt performance on fresh data. It works in three ways:

  • Early Stopping: CatBoost can end training before the requested number of trees has been built if overfitting is detected. This keeps the model from simply memorizing the training set.
  • Threshold-Based Stopping: CatBoost uses a threshold value to determine when to stop training. If the model’s performance on a validation dataset deteriorates beyond this threshold, training stops.
  • Optimal Metric Value: After the best metric value so far is reached, training continues for a set number of additional iterations. This reduces the risk of stopping too soon because of a temporary plateau or dip in the metric.

Types of Overfitting Detectors in CatBoost

CatBoost offers two primary types of overfitting detectors: IncToDec and Iter. Each type has specific parameters and use cases.

1. IncToDec

The IncToDec overfitting detector checks the change in the loss function on the validation dataset before each new tree is built. From that change it computes a p-value-like score, and the detector is triggered when this score falls below the threshold set with od_pval. Combined with od_wait, this type of detector is useful when you want to continue training for a few more iterations even after potential overfitting has been detected.

Parameters:

  • od_pval: The threshold value for the IncToDec detector. Training stops when this value is reached, and a larger value causes overfitting to be detected earlier. It requires a validation dataset to be set.
  • od_wait: The number of iterations to continue training after the iteration with the optimal metric value. This allows the model to potentially improve further before stopping.
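
As a rough illustration, the two parameters can be collected into a plain dictionary of keyword arguments. The name inc_to_dec_params and the specific values (od_pval=1e-3, od_wait=20, and the iteration settings) are assumptions chosen for this sketch, not recommendations.

```python
# Hypothetical parameter set for the IncToDec detector; all values are illustrative.
inc_to_dec_params = {
    "iterations": 1000,
    "learning_rate": 0.1,
    "od_type": "IncToDec",  # select the threshold-based detector
    "od_pval": 1e-3,        # detector threshold (assumed value)
    "od_wait": 20,          # iterations to keep training after the best metric value
}

# Print the configuration so it can be checked before training.
for name, value in inc_to_dec_params.items():
    print(f"{name} = {value}")
```

A dictionary like this can be unpacked into a CatBoost estimator, for example CatBoostClassifier(**inc_to_dec_params). A validation set then has to be passed through eval_set when calling fit, since the detector has nothing to monitor without one.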

2. Iter

The Iter overfitting detector stops training after a specified number of iterations since the iteration with the optimal loss function value. This is similar to the early_stopping_rounds parameter used in other gradient boosting libraries like XGBoost and LightGBM.

Parameters:

  • od_wait: The number of iterations to continue training after the optimal metric value is reached. If the model does not improve within these iterations, training stops.
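
The waiting rule can be sketched in plain Python. This is a simplified simulation of a patience-style detector, not CatBoost's internal code: the function name iter_detector_stop, the example scores, and the exact tie-breaking and off-by-one behavior are assumptions.

```python
def iter_detector_stop(scores, od_wait, maximize=True):
    """Simulate a patience-style detector on a list of validation scores.

    Returns (stop_iteration, best_iteration), both 0-based. stop_iteration is
    the last iteration that would be run before training halts.
    """
    best_iter = 0
    best_score = scores[0]
    for i, score in enumerate(scores):
        improved = score > best_score if maximize else score < best_score
        if improved:
            best_iter, best_score = i, score
        elif i - best_iter >= od_wait:
            # No improvement for od_wait iterations since the best one: stop.
            return i, best_iter
    # The patience was never exhausted, so every iteration was run.
    return len(scores) - 1, best_iter


# Made-up AUC values that peak at iteration 2 and then decline.
auc = [0.80, 0.85, 0.90, 0.89, 0.88, 0.87, 0.86]
print(iter_detector_stop(auc, od_wait=3))
print(iter_detector_stop(auc, od_wait=50))
```

With od_wait=3 the simulated run halts three iterations after the peak, while a large od_wait lets it run to the end. In both cases the best iteration is the same, which is the one a setting such as use_best_model would keep.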

Implementing Overfitting Detection in CatBoost

Let's walk through the steps to use the overfitting detector in CatBoost.

1. Install CatBoost:

You must install CatBoost first. This can be done with pip:

pip install catboost

2. Prepare Your Data:

After loading your data, divide it into training and validation sets. The validation set is used to evaluate the model's performance on unseen data.
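
To make the split concrete, here is a minimal sketch that uses only the standard library. The helper split_train_val and the toy rows are hypothetical; in practice train_test_split from scikit-learn, which the examples later in this article use, does the same job.

```python
import random


def split_train_val(rows, labels, val_fraction=0.2, seed=42):
    """Shuffle indices and split rows and labels into training and validation parts."""
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_val = int(len(indices) * val_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    X_train = [rows[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    X_val = [rows[i] for i in val_idx]
    y_val = [labels[i] for i in val_idx]
    return X_train, X_val, y_train, y_val


# Toy data: ten rows with two features each and a binary label.
rows = [[i, i * 2] for i in range(10)]
labels = [i % 2 for i in range(10)]
X_train, X_val, y_train, y_val = split_train_val(rows, labels)
print(len(X_train), len(X_val))
```

The validation part is held out from training entirely, which is what lets the overfitting detector measure performance on data the model has not seen.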

3. Create a CatBoost Pool:

A Pool is a data structure used by CatBoost to store your data.

from catboost import Pool

train_data = Pool(data=X_train, label=y_train)
eval_data = Pool(data=X_val, label=y_val)

4. Set Up the Model:

Initialize the CatBoost model with the overfitting detector.

from catboost import CatBoostClassifier

model = CatBoostClassifier(
    iterations=1000,
    learning_rate=0.1,
    eval_metric='AUC',
    use_best_model=True,
    od_type='Iter',  # Use iteration-based overfitting detection
    od_wait=50       # Number of iterations to wait before stopping
)

5. Train the Model:

To employ the overfitting detector, pass the validation set when training the model.

model.fit(train_data, eval_set=eval_data)

Example 1: CatBoost overfitting detector using Synthetic Dataset

In this example, we use the make_classification function from the sklearn.datasets module to generate a synthetic dataset, and then show how to use the overfitting detector in CatBoost.

Step 1: Import Required Libraries

To begin, we import the libraries required for interactive widgets, data visualization, and model training.

Python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from catboost import CatBoostClassifier, Pool
import ipywidgets as widgets
from IPython.display import display, clear_output


Step 2: Generate Synthetic Dataset

We generate a synthetic dataset using the make_classification function from scikit-learn. This dataset will be used for training and evaluation.

Python
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, n_redundant=10, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

Step 3: Create CatBoost Pools

We create CatBoost Pool objects for the training and evaluation datasets. These objects are required for training CatBoost models.

Python
# Create CatBoost Pools
train_data = Pool(data=X_train, label=y_train)
eval_data = Pool(data=X_val, label=y_val)

Step 4: Define the Model Training Function

We define a function train_model to train the CatBoost model with user-adjustable parameters. The function also includes alert mechanisms for overfitting detection.

Python
def train_model(iterations, learning_rate, od_wait, enable_overfitting):
    clear_output(wait=True)  # Clear previous output for better visualization
    if enable_overfitting:
        od_wait = 10000  # Set a high value to disable early stopping
    
    model = CatBoostClassifier(
        iterations=iterations,
        learning_rate=learning_rate,
        eval_metric='AUC',
        use_best_model=True,
        od_type='Iter',
        od_wait=od_wait,
        verbose=False
    )

    model.fit(train_data, eval_set=eval_data, verbose=False)

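    # One validation AUC value per iteration that was actually run
    auc_scores = model.get_evals_result()['validation']['AUC']

    # With use_best_model=True, tree_count_ is the number of trees kept in the
    # final model (up to the best iteration), so it can be below `iterations`
    # even when the detector did not halt training early.
    if model.tree_count_ < iterations:
        print(f'Overfitting detected. Training stopped at iteration {model.tree_count_}.')
        print('List of AUC scores per iteration:')
        print(auc_scores)
    else:
        print(f'Training completed for all {iterations} iterations.')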

    train_accuracy = model.score(X_train, y_train)
    val_accuracy = model.score(X_val, y_val)

    print(f'Training Accuracy: {train_accuracy:.2f}')
    print(f'Validation Accuracy: {val_accuracy:.2f}')

    plt.plot(auc_scores)
    plt.xlabel('Iteration')
    plt.ylabel('AUC Score')
    plt.title('Training Process')
    plt.show()
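
When reading the results, it helps to separate how many iterations were actually run from which iteration scored best. The sketch below does this for a dictionary shaped like the one indexed in train_model (['validation']['AUC']). The helper name summarize_validation_curve and the sample scores are assumptions made for illustration.

```python
def summarize_validation_curve(evals_result, requested_iterations, metric="AUC", maximize=True):
    """Summarize a dict shaped like {'validation': {metric: [score, ...]}}."""
    scores = evals_result["validation"][metric]
    pick = max if maximize else min
    # Index of the best score; ties resolve to the earliest iteration.
    best_iteration = pick(range(len(scores)), key=scores.__getitem__)
    return {
        "iterations_run": len(scores),
        "stopped_early": len(scores) < requested_iterations,
        "best_iteration": best_iteration,
        "best_score": scores[best_iteration],
    }


# Made-up scores: five iterations were run out of ten requested.
example = {"validation": {"AUC": [0.90, 0.95, 0.97, 0.96, 0.96]}}
print(summarize_validation_curve(example, requested_iterations=10))
```

Comparing iterations_run with the requested number of iterations shows whether training was halted early, independently of how many trees the final model keeps.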

Step 5: Create Interactive Widgets

We create interactive widgets such as sliders and buttons to allow users to adjust parameters and trigger model training.

Python
iterations_slider = widgets.IntSlider(value=1000, min=100, max=5000, step=100, description='Iterations:')
learning_rate_slider = widgets.FloatSlider(value=0.1, min=0.01, max=1.0, step=0.01, description='Learning Rate:')
od_wait_slider = widgets.IntSlider(value=50, min=10, max=200, step=10, description='OD Wait:')

overfitting_button = widgets.Button(description="Enable Overfitting")
normal_train_button = widgets.Button(description="Train Normally")

def on_overfitting_button_clicked(b):
    train_model(iterations_slider.value, learning_rate_slider.value, od_wait_slider.value, enable_overfitting=True)

def on_normal_train_button_clicked(b):
    train_model(iterations_slider.value, learning_rate_slider.value, od_wait_slider.value, enable_overfitting=False)

overfitting_button.on_click(on_overfitting_button_clicked)
normal_train_button.on_click(on_normal_train_button_clicked)

Step 6: Display the UI

We arrange the interactive widgets and display them for user interaction.

Python
# Display the UI
ui = widgets.VBox([iterations_slider, learning_rate_slider, od_wait_slider, overfitting_button, normal_train_button])
display(ui)

Output:

Overfitting detected. Training stopped at iteration 901.
List of AUC scores per iteration:
[0.8947894789478947, 0.9108910891089109, 0.9135413541354136, 0.9374937493749375, 0.9437443744374437, 0.94999499949995, 0.9498949894989499, 0.9505950595059506, 0.9602960296029603, 0.9608960896089609, 0.961896189618962, 0.9683968396839684, 0.9705970597059705, 0.9718971897189719, 0.9722972297229723, 0.9725972597259726, 0.9736973697369737, 0.9750975097509751, 0.9747974797479748, 0.9758975897589759, 0.9754975497549755, 0.9764976497649765, 0.9786978697869787, 0.978997899789979, 0.9794979497949795, 0.9794979497949795, 0.9808980898089809, 0.9805980598059806, 0.9810981098109811, 0.9822982298229823, 0.981998199819982, 0.9828982898289829, 0.9832983298329833, 0.9840984098409841, 0.9848984898489849, 0.9863986398639863, 0.9867986798679867, 0.9874987498749875, 0.987998799879988, 0.9880988098809881, 0.9882988298829883, 0.9881988198819882, 0.9888988898889889, 0.9885988598859886, 0.9892989298929893, 0.9896989698969897, 0.9896989698969897, 0.9893989398939894, 0.9893989398939894, 0.9897989798979898, 0.98989898989899, 0.9901990199019902, 0.9897989798979898, 0.98989898989899, 0.9900990099009901, 0.9896989698969897, 0.9895989598959896, 0.9894989498949895, 0.9897989798979898, 0.98999899989999, 0.9901990199019902, 0.98999899989999, 0.9906990699069907, 0.9906990699069907, 0.990999099909991, 0.9911991199119912, 0.9912991299129913, 0.9910991099109911, 0.9905990599059906, 0.9907990799079908, 0.9908990899089909, 0.9902990299029903, 0.990999099909991, 0.9910991099109911, 0.9912991299129913, 0.9915991599159916, 0.9911991199119912, 0.9912991299129913, 0.9915991599159916, 0.9910991099109911, 0.990999099909991, 0.9911991199119912, 0.9913991399139914, 0.9916991699169917, 0.9915991599159916, 0.9915991599159916, 0.9914991499149916, 0.9913991399139914, 0.9913991399139914, 0.9912991299129913, 0.9911991199119912, 0.9913991399139914, 0.9912991299129913, 0.9915991599159916, 0.9914991499149916, 0.9913991399139914, 0.9913991399139914, 0.9913991399139914, 0.991999199919992, 0.9908990899089909, 
0.9907990799079908, 0.9905990599059906, 0.9905990599059906, 0.9907990799079908, 0.9906990699069907, 0.990999099909991, 0.990999099909991, 0.990999099909991, 0.9905990599059906, 0.9908990899089909, 0.990999099909991, 0.9910991099109911, 0.9908990899089909, 0.9912991299129913, 0.9912991299129913, 0.9908990899089909, 0.9910991099109911, 0.9912991299129913, 0.9911991199119912, 0.9911991199119912, 0.9911991199119912, 0.9910991099109911, 0.9911991199119912, 0.9912991299129913, 0.9913991399139914, 0.9915991599159916, 0.9913991399139914, 0.9915991599159916, 0.9920992099209921, 0.991999199919992, 0.991999199919992, 0.9920992099209921, 0.991999199919992, 0.9920992099209921, 0.9920992099209921, 0.991899189918992, 0.9917991799179918, 0.991999199919992, 0.991999199919992, 0.991999199919992, 0.9917991799179918, 0.9917991799179918, 0.9920992099209921, 0.991999199919992, 0.991899189918992, 0.9922992299229924, 0.9921992199219922, 0.9920992099209921, 0.9920992099209921, 0.9920992099209921, 0.991999199919992, 0.991999199919992, 0.9920992099209921, 0.9920992099209921, 0.9921992199219922, 0.9921992199219922, 0.9923992399239924, 0.9922992299229924, 0.9922992299229924, 0.9922992299229924, 0.9922992299229924, 0.9924992499249925, 0.9923992399239924, 0.9923992399239924, 0.9923992399239924, 0.9923992399239924, 0.9920992099209921, 0.9920992099209921, 0.9922992299229924, 0.9923992399239924, 0.9923992399239924, 0.9923992399239924, 0.9923992399239924, 0.9924992499249925, 0.9926992699269926, 0.9926992699269926, 0.9924992499249925, 0.9926992699269926, 0.9925992599259926, 0.9924992499249925, 0.9925992599259926, 0.9923992399239924, 0.9924992499249925, 0.9925992599259926, 0.9923992399239924, 0.9923992399239924, 0.9925992599259926, 0.9928992899289929, 0.9926992699269926, 0.9925992599259926, 0.9925992599259926, 0.9927992799279928, 0.9924992499249925, 0.9924992499249925, 0.9923992399239924, 0.9923992399239924, 0.9923992399239924, 0.9923992399239924, 0.9923992399239924, 0.9924992499249925, 
0.9924992499249925, 0.9924992499249925, 0.9925992599259926, 0.9926992699269926, 0.9927992799279928, 0.9927992799279928, 0.9927992799279928, 0.9926992699269926, 0.9926992699269926, 0.992999299929993, 0.9928992899289929, 0.9927992799279928, 0.9928992899289929, 0.9927992799279928, 0.9928992899289929, 0.9928992899289929, 0.9928992899289929, 0.9927992799279928, 0.9927992799279928, 0.9928992899289929, 0.9928992899289929, 0.992999299929993, 0.9931993199319932, 0.9931993199319932, 0.993099309930993, 0.993099309930993, 0.993099309930993, 0.992999299929993, 0.992999299929993, 0.992999299929993, 0.9927992799279928, 0.9927992799279928, 0.9928992899289929, 0.9928992899289929, 0.992999299929993, 0.992999299929993, 0.992999299929993, 0.992999299929993, 0.992999299929993, 0.992999299929993, 0.9928992899289929, 0.992999299929993, 0.993099309930993, 0.9931993199319932, 0.9931993199319932, 0.9931993199319932, 0.9933993399339934, 0.9933993399339934, 0.9934993499349934, 0.9933993399339934, 0.9933993399339934, 0.9933993399339934, 0.9933993399339934, 0.9933993399339934, 0.9933993399339934, 0.9933993399339934, 0.9934993499349934, 0.9934993499349934, 0.9936993699369937, 0.9934993499349934, 0.9933993399339934, 0.9933993399339934, 0.9935993599359936, 0.9936993699369937, 0.9936993699369937, 0.9935993599359936, 0.9934993499349934, 0.9934993499349934, 0.9935993599359936, 0.9935993599359936, 0.9935993599359936, 0.9933993399339934, 0.9933993399339934, 0.9933993399339934, 0.9934993499349934, 0.9935993599359936, 0.9934993499349934, 0.9935993599359936, 0.9935993599359936, 0.9935993599359936, 0.9935993599359936, 0.9936993699369937, 0.9935993599359936, 0.9936993699369937, 0.9936993699369937, 0.9934993499349934, 0.9934993499349934, 0.9935993599359936, 0.9935993599359936, 0.9935993599359936, 0.9935993599359936, 0.9935993599359936, 0.9935993599359936, 0.9935993599359936, 0.9934993499349934, 0.9935993599359936, 0.9935993599359936, 0.9936993699369937, 0.9936993699369937, 0.9936993699369937, 
0.9935993599359936, 0.9935993599359936, 0.9935993599359936, 0.9935993599359936, 0.9935993599359936, 0.9935993599359936, 0.9935993599359936, 0.9935993599359936, 0.9935993599359936, 0.9935993599359936, 0.9935993599359936, 0.9937993799379938, 0.9937993799379938, 0.9936993699369937, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.993999399939994, 0.993999399939994, 0.9936993699369937, 0.9937993799379938, 0.9936993699369937, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9937993799379938, 0.9938993899389938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9936993699369937, 0.9936993699369937, 0.9937993799379938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.993999399939994, 0.9938993899389938, 0.993999399939994, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9938993899389938, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.9940994099409941, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.9938993899389938, 0.9938993899389938, 0.993999399939994, 0.9938993899389938, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.9938993899389938, 0.9938993899389938, 0.9937993799379938, 0.9937993799379938, 
0.9937993799379938, 0.9937993799379938, 0.993999399939994, 0.9938993899389938, 0.9937993799379938, 0.9938993899389938, 0.9938993899389938, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.9936993699369937, 0.9936993699369937, 0.9937993799379938, 0.9937993799379938, 0.9937993799379938, 0.9936993699369937, 0.9936993699369937, 0.9937993799379938, 0.9937993799379938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.993999399939994, 0.9938993899389938, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.9941994199419942, 0.9940994099409941, 0.9941994199419942, 0.9941994199419942, 
0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.9940994099409941, 0.9940994099409941, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 
0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.9938993899389938, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 
0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.9938993899389938, 0.9938993899389938, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 
0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.993999399939994, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9942994299429943, 0.9942994299429943, 0.9942994299429943, 0.9942994299429943, 0.9942994299429943, 0.9942994299429943, 0.9942994299429943, 0.9941994199419942, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9941994199419942, 0.9941994199419942, 0.9943994399439944, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9942994299429943, 
0.9942994299429943, 0.9942994299429943, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9941994199419942, 0.9943994399439944, 0.9943994399439944, 0.9943994399439944, 0.9943994399439944, 0.9943994399439944, 0.9942994299429943, 0.9942994299429943, 0.9942994299429943, 0.9942994299429943, 0.9942994299429943, 0.9942994299429943, 0.9942994299429943, 0.9942994299429943, 0.9942994299429943, 0.9942994299429943, 0.9942994299429943, 0.9942994299429943, 0.9942994299429943, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941, 0.9940994099409941]
Training Accuracy: 1.00
Validation Accuracy: 0.97
[Figure: validation AUC score per iteration. Caption: Training stopped at iteration 901]

Example 2: CatBoost overfitting detector with Wine Dataset

In this example, we will use the Wine Quality dataset from the UCI Machine Learning Repository. This dataset can be loaded directly from a URL.

Step 1: Import Necessary Libraries

Python
import numpy as np  # used for the RMSE calculation in Step 5
import pandas as pd
from sklearn.model_selection import train_test_split
from catboost import CatBoostRegressor, Pool
import matplotlib.pyplot as plt

Step 2: Load the Dataset

Load the Wine Quality dataset directly from the URL.

Python
# Load data
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'
data = pd.read_csv(url, delimiter=';')

# Display first few rows of the dataset
print(data.head())

Output:

   fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  \
0            7.4              0.70         0.00             1.9      0.076
1            7.8              0.88         0.00             2.6      0.098
2            7.8              0.76         0.04             2.3      0.092
3           11.2              0.28         0.56             1.9      0.075
4            7.4              0.70         0.00             1.9      0.076

   free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  \
0                 11.0                  34.0   0.9978  3.51       0.56
1                 25.0                  67.0   0.9968  3.20       0.68
2                 15.0                  54.0   0.9970  3.26       0.65
3                 17.0                  60.0   0.9980  3.16       0.58
4                 11.0                  34.0   0.9978  3.51       0.56

   alcohol  quality
0      9.4        5
1      9.8        5
2      9.8        5
3      9.8        6
4      9.4        5

Step 3: Prepare the Data

Split the data into features and target variable, then split into training and validation sets.

Python
# Features and target variable
X = data.drop('quality', axis=1)
y = data['quality']

# Split the data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

Step 4: Define and Train the Model

Define the CatBoostRegressor and set the overfitting detector parameters.

Python
# Create a CatBoost Pool for validation data
val_pool = Pool(X_val, y_val)

# Initialize CatBoostRegressor
model = CatBoostRegressor(
    iterations=1000,
    learning_rate=0.1,
    depth=6,
    od_type="Iter",           # Overfitting detector type
    od_wait=20                # Number of iterations to wait before stopping
)

# Train the model
model.fit(X_train, y_train, eval_set=val_pool, verbose=False)

Output:

<catboost.core.CatBoostRegressor at 0x7febc4f21570>

Step 5: Evaluate the Model

Evaluate the model's performance on the validation set.

Python
# Evaluate model
rmse = np.sqrt(((model.predict(X_val) - y_val) ** 2).mean())
print(f'Validation RMSE: {rmse:.2f}')

Output:

Validation RMSE: 0.56
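
For reference, the RMSE used above is the square root of the mean squared difference between predictions and targets. The sketch below spells that out with the standard library only. The function name rmse and the sample numbers are made up for illustration and are unrelated to the wine data.

```python
import math


def rmse(predictions, targets):
    """Root mean squared error between two equal-length sequences."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Two exact predictions and one that is off by 1.5.
print(round(rmse([5.0, 6.0, 5.5], [5, 6, 7]), 4))
```

Because errors are squared before averaging, a single large miss raises the RMSE more than several small ones, which is worth keeping in mind when comparing validation scores.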

Step 6: Visualize Training and Validation Loss

Plot the training and validation loss to see where the overfitting detector stops the training process.

Python
# Plot training and validation error
plt.plot(model.get_evals_result()['learn']['RMSE'], label='Training RMSE')
plt.plot(model.get_evals_result()['validation']['RMSE'], label='Validation RMSE')
plt.xlabel('Iterations')
plt.ylabel('RMSE')
plt.title('Training and Validation RMSE')
plt.legend()
plt.show()

Output:

[Figure: Training and Validation Loss (training and validation RMSE per iteration)]
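
The same curves can also be inspected numerically. The following sketch finds the iteration with the lowest validation error and the gap between validation and training error at that point and at the end of training. The helper describe_loss_curves and the RMSE values are hypothetical, chosen only to show a widening gap.

```python
def describe_loss_curves(train_rmse, val_rmse):
    """Locate the best validation iteration and the train/validation gaps."""
    best_iter = min(range(len(val_rmse)), key=val_rmse.__getitem__)
    gaps = [v - t for t, v in zip(train_rmse, val_rmse)]
    return {
        "best_iteration": best_iter,
        "gap_at_best": gaps[best_iter],
        "final_gap": gaps[-1],
    }


# Made-up curves: training error keeps falling while validation error turns upward.
train_rmse = [0.80, 0.60, 0.45, 0.35, 0.28]
val_rmse = [0.82, 0.66, 0.58, 0.59, 0.61]
summary = describe_loss_curves(train_rmse, val_rmse)
print(summary["best_iteration"], round(summary["gap_at_best"], 2), round(summary["final_gap"], 2))
```

A gap that keeps growing after the best validation iteration is the pattern the overfitting detector is meant to cut short.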

Conclusion

Monitoring the training progress and using the overfitting detector in CatBoost is crucial for developing robust machine learning models. By setting appropriate parameters, you can prevent overfitting and ensure that your model generalizes well to new data. The IncToDec and Iter overfitting detectors provide flexibility in handling different training scenarios, making CatBoost a powerful tool for both classification and regression tasks.
