Building a Stock Price Prediction Model with CatBoost

Stock prices may seem unpredictable, but historical data often contains patterns. By analyzing past prices and market indicators, we can build a machine learning model that attempts to forecast future movements. We will use CatBoost for stock prediction because:

It is a gradient boosting algorithm that can capture non-linear relationships in financial data.
It handles categorical variables natively, without heavy preprocessing.
Its built-in techniques, such as ordered boosting, help reduce overfitting, which is critical for volatile markets.
It trains quickly, which allows faster experimentation and model iteration.
It delivers accuracy that is competitive with other boosting algorithms.

Implementation

Step 1: Importing required Libraries

Here we import Pandas, NumPy, Matplotlib, scikit-learn and CatBoost.

Python

import pandas as pd
import numpy as np
from catboost import CatBoostRegressor
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

Step 2: Load the Dataset

Download the dataset from here

Python

data = pd.read_csv("Your dataset path")
data.head()

Output:

(Image: Tesla stock dataset)

Step 3: Clean the Dataset

The first row after the header is not a valid price record in this dataset, so we drop it and reset the index before further processing.

Python

data = data.iloc[1:].reset_index(drop=True)

data.head()
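
Dropping a non-data row can leave the price columns stored as text, because pandas infers each column's type from every value it reads. Below is a minimal sketch of checking and repairing this. It assumes a small made-up CSV whose first data row holds labels instead of prices; the layout of the real file may differ, and the values are illustrative only.

```python
import io
import pandas as pd

# Hypothetical CSV: the row right after the header holds labels, not prices.
raw = io.StringIO(
    "date,Open,High,Low,Close,Volume\n"
    ",TSLA,TSLA,TSLA,TSLA,TSLA\n"
    "2024-01-02,100.0,102.0,99.0,101.0,1000\n"
    "2024-01-03,101.0,103.0,100.0,102.5,1200\n"
)
sample = pd.read_csv(raw)

# Drop the stray first row, as in the step above.
sample = sample.iloc[1:].reset_index(drop=True)

# The label row forced these columns to be read as strings, so convert them.
numeric_cols = ["Open", "High", "Low", "Close", "Volume"]
sample[numeric_cols] = sample[numeric_cols].apply(pd.to_numeric, errors="coerce")

print(sample.dtypes)
```

With `errors="coerce"`, any value that cannot be parsed becomes NaN instead of raising an error, so it is worth checking for missing values afterwards.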

Step 4: Convert Date Column and Sort Data

Stock data must be in chronological order.
Converting the date column to datetime and sorting by it ensures the model trains on the past and is evaluated on the future.

Python

data['date'] = pd.to_datetime(data['date'])

data = data.sort_values('date')
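
A quick way to confirm that the sort worked is to check that the dates never decrease. The sketch below uses a few made-up rows in shuffled order; the `sample` frame is hypothetical and stands in for the real dataset.

```python
import pandas as pd

# Made-up rows, deliberately out of order.
sample = pd.DataFrame({
    "date": ["2024-01-03", "2024-01-02", "2024-01-04"],
    "Close": [102.0, 101.0, 103.0],
})

sample["date"] = pd.to_datetime(sample["date"])
sample = sample.sort_values("date").reset_index(drop=True)

# True only if every date is greater than or equal to the one before it.
print(sample["date"].is_monotonic_increasing)
```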

Step 5: Feature Engineering

Most machine learning models cannot use raw date values directly, so we extract numeric Year, Month and Day features and then drop the original date column.

Python

data['Year'] = data['date'].dt.year
data['Month'] = data['date'].dt.month
data['Day'] = data['date'].dt.day

data.drop(columns=['date'], inplace=True)

data.head()
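
To see what the model receives after this step, here is the same extraction applied to two made-up dates. The `sample` frame is hypothetical and only illustrates the shape of the result.

```python
import pandas as pd

sample = pd.DataFrame({"date": pd.to_datetime(["2023-12-29", "2024-01-02"])})

# Each date becomes three integer columns.
sample["Year"] = sample["date"].dt.year
sample["Month"] = sample["date"].dt.month
sample["Day"] = sample["date"].dt.day

# The original datetime column is no longer needed as a feature.
sample = sample.drop(columns=["date"])
print(sample)
```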

Step 6: Define Features and Target

Separate input variables and output variable.

X: Open, High, Low, Volume, Year, Month, Day
y: Close price (what we predict)

Python

X = data.drop(columns=['Close'])
y = data['Close']
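
The features above describe the same day as the target. If the aim is to forecast the following day's close instead, one possible variation is to shift the target back by one row. This is an assumption, not part of the original pipeline, and `NextClose` is a hypothetical column name. A minimal sketch on made-up data:

```python
import pandas as pd

# Made-up prices in chronological order.
sample = pd.DataFrame({
    "Open": [10.0, 11.0, 12.0, 13.0],
    "Close": [10.5, 11.5, 12.5, 13.5],
})

# Row i now carries the close of row i + 1 as its target.
sample["NextClose"] = sample["Close"].shift(-1)

# The last row has no following day, so its target is missing.
sample = sample.dropna(subset=["NextClose"])

X_next = sample.drop(columns=["NextClose"])
y_next = sample["NextClose"]
print(X_next.shape, y_next.tolist())
```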

Step 7: Time Based Train Test Split

We split the data sequentially, without shuffling. Preserving time order prevents future data from leaking into the training process.

Python

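# The first 80% of rows (the earliest dates) are used for training
split = int(len(data) * 0.8)

# .iloc makes the positional slicing explicit
X_train = X.iloc[:split]
X_test = X.iloc[split:]

y_train = y.iloc[:split]
y_test = y.iloc[split:]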
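
The same split can be wrapped in a small helper and checked for leakage. The sketch below is illustrative: `time_split` is a hypothetical function name, and the data is made up. It keeps a date column only so that the check can compare the two parts.

```python
import pandas as pd

def time_split(frame, train_fraction=0.8):
    """Split rows by position: the first part for training, the rest for testing."""
    cut = int(len(frame) * train_fraction)
    return frame.iloc[:cut], frame.iloc[cut:]

# Made-up daily data, already sorted by date.
sample = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=10, freq="D"),
    "Close": range(10),
})

train, test = time_split(sample)

# Every training date must come before every test date.
assert train["date"].max() < test["date"].min()
print(len(train), len(test))
```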

Step 8: Train CatBoost Model

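The model learns patterns from historical stock data. Because CatBoost builds an ensemble of decision trees, it can capture non-linear interactions between features without manually engineered interaction terms.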

Python

model = CatBoostRegressor(
    iterations=500,
    learning_rate=0.05,
    depth=6,
    verbose=0
)
model.fit(X_train, y_train)

Step 9: Make Predictions

Predict stock prices on unseen data.

Python

y_pred = model.predict(X_test)

Step 10: Evaluate Model Performance

Measure how accurate the predictions are:

RMSE (root mean squared error) summarizes the typical size of the prediction error, in the same units as the price.
Lower RMSE means better performance.

Python

rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print("RMSE:", rmse)

Output:

RMSE: 7.203
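
To make the metric concrete, here is RMSE computed by hand on four made-up prices. The arrays are hypothetical and unrelated to the result above.

```python
import numpy as np

actual = np.array([100.0, 102.0, 101.0, 105.0])
predicted = np.array([101.0, 101.0, 103.0, 104.0])

errors = actual - predicted          # [-1, 1, -2, 1]
mse = np.mean(errors ** 2)           # (1 + 1 + 4 + 1) / 4 = 1.75
rmse_demo = np.sqrt(mse)

print(round(float(rmse_demo), 4))    # 1.3229
```

Because the errors are squared before averaging, a single large miss raises RMSE more than several small ones.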

Step 11: Visualize Results

This helps visually compare:

Real stock movement
Model predictions

Python

plt.figure(figsize=(10, 5))
plt.plot(y_test.values, label="Actual Price")
plt.plot(y_pred, label="Predicted Price")
plt.xlabel("Test sample (chronological order)")
plt.ylabel("Close price")
plt.legend()
plt.title("Actual vs Predicted Close Price")
plt.show()

Output:

(Image: Actual vs predicted stock prices)

Download full code from here
