How to Use Different Algorithms with the caret Package in R

The caret (Classification And Regression Training) package in R provides a unified framework for training, tuning and evaluating a wide range of machine learning algorithms.

Installing and Loading the caret Package

Install caret once from CRAN, then load it in each session. The model-specific packages used later (randomForest, rpart and kernlab) are installed in their own sections.

install.packages("caret")
library(caret)

Preparing the Data

We will use iris, a dataset built into R. After setting a random seed for reproducibility, createDataPartition() splits the data into 80% training and 20% test rows, so that performance can be evaluated on observations the models have not seen.

set.seed(123)
data(iris)

idx <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_data <- iris[idx, ]
test_data  <- iris[-idx, ]
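Because Species is a factor, createDataPartition() draws its sample within each class, so the training and test sets keep the original class proportions. A minimal base-R sketch of that idea follows. It illustrates stratified sampling in general rather than caret's actual implementation, and the names rows_by_class, train_manual and test_manual are invented for this example.

```r
# Stratified 80/20 split by hand (illustrative only; not caret's code)
set.seed(123)
data(iris)

# Group row numbers by class, then sample 80% of the rows in each group
rows_by_class <- split(seq_len(nrow(iris)), iris$Species)
train_rows <- unlist(lapply(rows_by_class, function(rows) {
  sample(rows, size = round(0.8 * length(rows)))
}))

train_manual <- iris[train_rows, ]
test_manual  <- iris[-train_rows, ]

# Each species contributes the same share to both sets
table(train_manual$Species)
table(test_manual$Species)
```

With 50 flowers per species, each class contributes 40 rows to training and 10 to testing.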

Classification Algorithms

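caret supports many classification algorithms through the same train() interface. For each model below, we fit on the training data using 10-fold cross-validation (defined once with trainControl() and reused), then evaluate the predictions on the test data with confusionMatrix().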
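To make the cross-validation step concrete, the sketch below assigns rows to 10 folds in base R and loops over them. It is a simplified illustration under stated assumptions: it uses plain random folds, assumes a training set of 120 rows (40 per species after the 80% split), and leaves the model fitting as a comment. The names n_rows, fold_id, fit_rows and holdout_rows are hypothetical.

```r
# 10-fold cross-validation mechanics (simplified sketch, base R only)
set.seed(123)
n_rows <- 120   # assumed size of the training split
k      <- 10

# Give every row a fold label 1..k, shuffled so folds are random
fold_id <- sample(rep(seq_len(k), length.out = n_rows))

holdout_sizes <- integer(k)
for (i in seq_len(k)) {
  holdout_rows <- which(fold_id == i)   # scored in this round
  fit_rows     <- which(fold_id != i)   # used to fit in this round
  # A model would be fit on fit_rows and evaluated on holdout_rows here
  holdout_sizes[i] <- length(holdout_rows)
}

# Every row is held out exactly once across the k rounds
holdout_sizes
sum(holdout_sizes) == n_rows
```

The k per-fold scores are then averaged, which gives a performance estimate that depends less on any single split.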

1. Random Forest

Random Forest aggregates many decision trees to reduce overfitting and improve accuracy.

install.packages("randomForest")
library(randomForest)

# 10-fold cross-validation settings, reused by the classifiers below
ctrl <- trainControl(method = "cv", number = 10)

rf_model <- train(Species ~ ., data = train_data,
                  method = "rf",
                  trControl = ctrl)

rf_preds <- predict(rf_model, test_data)
confusionMatrix(rf_preds, test_data$Species)
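confusionMatrix() cross-tabulates predicted against actual classes and derives accuracy and per-class statistics from that table. The core calculation can be sketched in base R as follows. The six labels are made-up values for illustration, not results from the model above.

```r
# Confusion table and accuracy by hand (made-up labels, illustration only)
species_levels <- c("setosa", "versicolor", "virginica")

actual <- factor(
  c("setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"),
  levels = species_levels
)
predicted <- factor(
  c("setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"),
  levels = species_levels
)

# Rows are predictions, columns are the true classes
conf_table <- table(Predicted = predicted, Actual = actual)
conf_table

# Accuracy is the share of cases on the diagonal
accuracy <- sum(diag(conf_table)) / sum(conf_table)
accuracy
```

Five of the six labels match, so the accuracy is 5/6. The single off-diagonal count shows a versicolor flower predicted as virginica.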

2. CART (Decision Tree)

CART builds a single tree and prunes it based on the complexity parameter (cp).
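The effect of cp can be seen by growing a large tree with rpart directly and then pruning it. This is a rough sketch: the cp value of 0.05 and the minsplit setting are arbitrary choices for illustration, whereas caret's "rpart" method picks cp from candidate values using resampled performance.

```r
# How the complexity parameter (cp) controls tree size (illustrative sketch)
library(rpart)
data(iris)

# Grow a deliberately large tree by switching off the complexity penalty
full_tree <- rpart(Species ~ ., data = iris, method = "class",
                   control = rpart.control(cp = 0, minsplit = 2))

# Prune back: splits that improve the fit by less than cp are removed
pruned_tree <- prune(full_tree, cp = 0.05)

# Count terminal nodes (leaves) before and after pruning
leaves_full   <- sum(full_tree$frame$var == "<leaf>")
leaves_pruned <- sum(pruned_tree$frame$var == "<leaf>")
c(full = leaves_full, pruned = leaves_pruned)
```

A larger cp gives a smaller, simpler tree, and a cp near zero keeps almost every split.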

install.packages("rpart")
library(rpart)

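# Fit a CART model; caret tunes cp using the cross-validation in ctrl
cart_model <- train(Species ~ ., data = train_data,
                    method = "rpart",
                    trControl = ctrl)

# Predict on the held-out test set and compare with the true species
cart_preds <- predict(cart_model, test_data)
confusionMatrix(cart_preds, test_data$Species)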

3. k-Nearest Neighbors (k-NN)

k-NN classifies observations based on the majority vote of their k nearest neighbors.
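The voting rule itself is short enough to write out in base R. The function below is a minimal sketch that uses Euclidean distance and breaks ties by taking the first class. knn_vote() and the query measurements are hypothetical names and values for this example, and the function is not the implementation caret calls.

```r
# k-NN majority vote by hand (illustrative sketch, base R only)
data(iris)
features <- as.matrix(iris[, 1:4])
labels   <- iris$Species

knn_vote <- function(query, k = 5) {
  # Euclidean distance from the query to every stored observation
  distances <- sqrt(rowSums(sweep(features, 2, query)^2))
  # Indices of the k closest observations
  nearest <- order(distances)[seq_len(k)]
  # Most common class among those neighbours
  votes <- table(labels[nearest])
  names(votes)[which.max(votes)]
}

# Small petals: expected to land among the setosa flowers
knn_vote(c(5.0, 3.4, 1.5, 0.2))

# Very large petals: expected to land among the virginica flowers
knn_vote(c(7.7, 3.0, 6.5, 2.2))
```

Because the vote depends on raw distances, the choice of k and the scale of each predictor both affect the result, which is why k is tuned over a grid in the model below.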

knn_grid <- expand.grid(k = seq(3, 15, by = 2))

knn_model <- train(Species ~ ., data = train_data,
                   method = "knn",
                   trControl = ctrl,
                   tuneGrid = knn_grid)

knn_preds <- predict(knn_model, test_data)
confusionMatrix(knn_preds, test_data$Species)

4. Support Vector Machine (SVM)

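SVM separates classes by finding the hyperplane with the maximum margin. The radial kernel used here (method = "svmRadial") allows a non-linear decision boundary, and the predictors are centered and scaled first because the kernel depends on distances between observations.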

install.packages("kernlab")
library(kernlab)

svm_model <- train(Species ~ ., data = train_data,
                   method = "svmRadial",
                   trControl = ctrl,
                   preProcess = c("center", "scale"))

svm_preds <- predict(svm_model, test_data)
confusionMatrix(svm_preds, test_data$Species)
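The preProcess = c("center", "scale") argument standardises each predictor using statistics estimated from the training data, then applies the same shift and scale to new data at prediction time. A base-R sketch of that idea is shown below. The row ranges used for train_x and new_x are an arbitrary split chosen only for illustration.

```r
# Centering and scaling with training statistics (illustrative sketch)
data(iris)
train_x <- as.matrix(iris[1:100, 1:4])    # arbitrary "training" rows
new_x   <- as.matrix(iris[101:150, 1:4])  # arbitrary "new" rows

# Estimate the mean and standard deviation on the training rows only
train_means <- colMeans(train_x)
train_sds   <- apply(train_x, 2, sd)

# Apply the same transformation to both sets
train_scaled <- scale(train_x, center = train_means, scale = train_sds)
new_scaled   <- scale(new_x,   center = train_means, scale = train_sds)

# Training columns now have mean 0 and standard deviation 1
round(colMeans(train_scaled), 10)
round(apply(train_scaled, 2, sd), 10)
```

The new rows are not expected to have mean 0 after scaling, since they are transformed with the training statistics rather than their own.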

Regression Algorithms

caret also supports many regression algorithms. We will define cross-validation, summarise each fitted model and evaluate its test-set predictions using root mean squared error (RMSE).
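RMSE is the square root of the average squared difference between predicted and observed values, so it is expressed in the same units as the outcome. A small worked sketch with made-up numbers (not model output):

```r
# RMSE by hand (made-up values, illustration only)
observed  <- c(21.0, 22.8, 18.7)
predicted <- c(20.0, 24.8, 18.7)

errors <- predicted - observed       # -1, 2, 0
rmse_value <- sqrt(mean(errors^2))   # sqrt((1 + 4 + 0) / 3)
rmse_value
```

Squaring the errors before averaging means that a few large misses raise RMSE more than many small ones.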

Data Split for Regression

We’ll use the mtcars dataset to demonstrate regression examples.

set.seed(456)
data(mtcars)

idx2 <- createDataPartition(mtcars$mpg, p = 0.8, list = FALSE)
train_reg <- mtcars[idx2, ]
test_reg  <- mtcars[-idx2, ]

1. Linear Regression

Ordinary least squares provides a baseline for regression.
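As a quick illustration of what ordinary least squares computes, the coefficients can be obtained from the normal equations and compared with lm(). The two predictors wt and hp are an arbitrary choice for this sketch. The model below uses all predictors.

```r
# Ordinary least squares via the normal equations (illustrative sketch)
data(mtcars)

# Design matrix with an intercept column and two example predictors
X <- cbind(Intercept = 1, wt = mtcars$wt, hp = mtcars$hp)
y <- mtcars$mpg

# Solve (X'X) beta = X'y for the coefficient vector
beta_hat <- drop(solve(t(X) %*% X, t(X) %*% y))
beta_hat

# lm() gives the same estimates, using a more stable decomposition
lm_coefs <- coef(lm(mpg ~ wt + hp, data = mtcars))
lm_coefs
```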

ctrl_reg <- trainControl(method = "cv", number = 5)

lm_model <- train(mpg ~ ., data = train_reg,
                  method = "lm",
                  trControl = ctrl_reg)

print(lm_model)

lm_preds <- predict(lm_model, test_reg)
RMSE(lm_preds, test_reg$mpg)

2. Random Forest Regression

Random Forest Regression is an ensemble method that builds multiple decision trees and combines their results to improve accuracy and reduce overfitting.

rf_reg <- train(mpg ~ ., data = train_reg,
                method = "rf",
                trControl = ctrl_reg)
print(rf_reg)

rf_reg_preds <- predict(rf_reg, test_reg)
RMSE(rf_reg_preds, test_reg$mpg)

In this article, we demonstrated how to train and evaluate different classification and regression algorithms using the caret package in R, providing a consistent framework for model building, tuning and comparison.
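For side-by-side comparison, caret provides resamples(), which collects the cross-validation results of several fitted models. The sketch below is one possible way to use it, assuming caret and rpart are installed. It fits two small models on the full iris data with the same seed, so both are evaluated on identical folds. The object names ctrl_cmp, knn_fit, cart_fit and cmp are invented for this example.

```r
# Comparing two models on the same folds (a sketch, assuming caret + rpart)
library(caret)
data(iris)

ctrl_cmp <- trainControl(method = "cv", number = 5)

# Resetting the seed before each call gives both models the same folds
set.seed(1)
knn_fit <- train(Species ~ ., data = iris,
                 method = "knn", trControl = ctrl_cmp)

set.seed(1)
cart_fit <- train(Species ~ ., data = iris,
                  method = "rpart", trControl = ctrl_cmp)

# Collect the resampled accuracy and kappa of both models
cmp <- resamples(list(kNN = knn_fit, CART = cart_fit))
summary(cmp)
```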
