Exploring Pokemon data in R

Pokémon is more than a popular franchise. It is also a rich dataset full of patterns and insights. The data includes attributes such as HP, attack, defense, special stats, types, generation, and whether a Pokémon is legendary.

Dataset Overview

The dataset encompasses detailed attributes for each Pokémon, including:

Name: The Pokémon's name.
Type 1 & Type 2: Primary and secondary elemental types.
Total: Sum of all base stats.
HP: Hit Points, indicating health.
Attack & Defense: Physical attack and defense stats.
Special Attack & Special Defense: Stats for special moves.
Speed: Determines move order in battles.
Generation: The generation in which the Pokémon was introduced.
Legendary Status: Indicates whether a Pokémon is legendary.

Dataset Link: Pokemon dataset

Installing the Necessary Libraries

We install the packages used in this article (ggplot2, gridExtra and plotly) with install.packages() and then load them with library().

install.packages("ggplot2")
install.packages("gridExtra")
install.packages("plotly")

library(ggplot2)
library(gridExtra)
library(plotly)
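
Reinstalling packages on every run is unnecessary when some of them are already present. A minimal sketch, assuming a hypothetical helper named missing_packages() (not part of the original code), checks which packages are absent before installing anything:

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("ggplot2", "gridExtra", "plotly"))

# Install only what is missing, and only in an interactive session.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

The interactive() guard is a design choice for this sketch, so that a scripted run never triggers a download on its own.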

Exploring the Dataset

We will now load the dataset and explore its contents using several base R functions.

1. Loading the Data

We read the dataset from the Pokemon.csv file with read.csv() and display its first few rows with head().

pokemon_data <- read.csv("Pokemon.csv")

head(pokemon_data)

Output:

[Figure: first six rows of pokemon_data]
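
The later code refers to columns such as Type.1 and Sp..Atk. Assuming the CSV header uses labels like "Type 1" and "Sp. Atk" (an assumption, since the file itself is not shown here), those names come from read.csv(), which by default passes header labels through make.names() to turn them into syntactically valid R names. A small sketch of that conversion:

```r
# Assumed raw header labels; the actual CSV header is not shown in the article.
raw_headers <- c("Name", "Type 1", "Sp. Atk", "Sp. Def")

# read.csv() applies this conversion when check.names = TRUE (the default).
clean_headers <- make.names(raw_headers)
print(clean_headers)
```

Passing check.names = FALSE to read.csv() keeps the original labels, at the cost of needing backticks to refer to them.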

2. Using str() function

The str() function displays the structure of the pokemon_data data frame: the number of rows and columns, plus the name, data type (such as integer, numeric, character, logical or factor) and first few values of each column. This gives a quick overview of how the data is organised.

str(pokemon_data)

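When only the column types are of interest, sapply() with class() gives a more compact view than str(). A minimal sketch on a toy data frame with made-up values (the names and numbers below are illustrative, not taken from the dataset):

```r
# Toy data frame with made-up values, standing in for pokemon_data.
toy <- data.frame(
  Name = c("Alpha", "Beta", "Gamma"),
  HP = c(45L, 60L, 80L),
  Legendary = c(FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

str(toy)                            # full structure: dimensions, types, sample values
column_types <- sapply(toy, class)  # named vector with one class per column
print(column_types)
```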

3. Using summary() function

The summary() function summarizes each column. For numeric columns it reports the minimum, first quartile, median, mean, third quartile and maximum; for factor and logical columns it reports frequency counts. Character columns only show their length, class and mode.

summary(pokemon_data)

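Whether summary() returns frequency counts depends on the column's type. A minimal sketch with made-up type labels shows the difference between a character vector and the same values converted to a factor:

```r
# Made-up primary types, standing in for pokemon_data$Type.1.
types <- c("Grass", "Fire", "Water", "Fire")

print(summary(types))                  # character vector: length, class and mode only
type_counts <- summary(factor(types))  # factor: one count per level
print(type_counts)
```

Converting a text column with factor() before calling summary() is one way to get counts; table() is another.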

4. Explore specific columns

This line of code shows the names of the first few Pokémon together with their primary types.

head(pokemon_data[c('Name','Type.1')])

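The same bracket syntax can also restrict the rows. A minimal sketch on a toy data frame with made-up values selects two columns for the rows of one primary type:

```r
# Toy data frame with made-up values, standing in for pokemon_data.
toy <- data.frame(
  Name = c("Alpha", "Beta", "Gamma", "Delta"),
  Type.1 = c("Fire", "Water", "Fire", "Grass"),
  Attack = c(52L, 48L, 84L, 49L),
  stringsAsFactors = FALSE
)

# Rows where Type.1 is "Fire"; columns Name and Attack only.
fire_subset <- toy[toy$Type.1 == "Fire", c("Name", "Attack")]
print(fire_subset)
```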

Various Visualizations of Pokémon Data

We will now plot various visualizations to explore the dataset further.

1. Histogram

This line of code draws a histogram of the Attack column of the Pokémon data.

hist(pokemon_data$Attack)

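A histogram is a count of values per bin, and hist() can return those counts without drawing anything. A minimal sketch with made-up Attack values and explicitly chosen breaks (both are assumptions for illustration):

```r
# Made-up Attack values, standing in for pokemon_data$Attack.
attack <- c(49, 62, 82, 95, 130, 52, 64, 84, 145, 110)

# Compute the bins without drawing; breaks are set explicitly at 0, 50, ..., 200.
h <- hist(attack, breaks = seq(0, 200, by = 50), plot = FALSE)
print(h$breaks)
print(h$counts)
```

Passing the same breaks argument to the plotting call controls how coarse or fine the drawn histogram is.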

2. Bar Plot

This code draws a bar plot showing the number of Pokémon of each primary type (Type.1).

type_distribution <- table(pokemon_data$Type.1)

barplot(type_distribution, main = "Distribution of Pokemon Types", 
        xlab = "Type", ylab = "Count", col = rainbow(length(type_distribution)))

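By default table() orders its categories alphabetically, so the bars appear in that order. A minimal sketch with made-up type labels sorts the counts so that the most common type comes first:

```r
# Made-up primary types, standing in for pokemon_data$Type.1.
types <- c("Water", "Fire", "Water", "Grass", "Water", "Fire")

# Count each type, then order the counts from most to least common.
type_counts <- sort(table(types), decreasing = TRUE)
print(type_counts)
```

The sorted table can be passed to barplot() in the same way as an unsorted one.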

3. Scatter Plot

This code draws a scatter plot of Attack against Defense for a random sample of 200 Pokémon, with points colored by primary type (Type.1). Plotting a sample rather than every row reduces overplotting and keeps rendering fast. The sample size can be adjusted as needed.

# Draw 200 random rows from the dataset
selected_pokemon <- pokemon_data[sample(nrow(pokemon_data), 200), ]

scatter_plot <- ggplot(selected_pokemon, aes(x = Attack, y = Defense, color = Type.1)) +
    geom_point() +
    labs(title = "Scatter Plot: Attack vs Defense") +
    theme_minimal()

scatter_plot

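Because sample() is random, each run selects different rows and produces a slightly different plot. A minimal sketch of making the selection reproducible with set.seed(); the seed value 42 and the range 1:100 are arbitrary choices for illustration:

```r
set.seed(42)                      # any fixed integer works; 42 is arbitrary
first_draw <- sample(1:100, 5)

set.seed(42)                      # resetting the seed repeats the same draw
second_draw <- sample(1:100, 5)

print(identical(first_draw, second_draw))
```

Calling set.seed() once before the sampling line is enough to get the same rows on every run.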

4. Pie Chart

A pie chart gives a quick view of the ratio of legendary to non-legendary Pokémon.

legendary_distribution <- table(pokemon_data$Legendary)
pie(legendary_distribution, main = "Proportion of Legendary Pokemon", 
    labels = c("Non-Legendary", "Legendary"), col = c("skyblue", "lightcoral"))

Output:

[Figure: pie chart of the proportion of legendary Pokémon]
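
Pie slices are hard to compare by eye, so it can help to put the percentages in the labels. A minimal sketch, using made-up legendary flags, computes the shares with prop.table() and builds the label text:

```r
# Made-up flags, standing in for pokemon_data$Legendary.
legendary <- c(rep(FALSE, 9), TRUE)

# Share of each category, in the order FALSE then TRUE.
shares <- as.numeric(prop.table(table(legendary)))

pct_labels <- paste0(c("Non-Legendary", "Legendary"), " (", round(100 * shares), "%)")
print(pct_labels)
```

A vector like pct_labels can be supplied as the labels argument of pie().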

5. Box Plot

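We will create box plots of the special stats (Sp..Atk and Sp..Def) grouped by primary type. A box plot summarizes a numeric column within each category, so Type.1 goes on the x-axis and the stat on the y-axis.

box_plots <- list(
  # Special attack by primary type
  ggplot(pokemon_data, aes(x = Type.1, y = Sp..Atk, fill = Type.1)) +
    geom_boxplot(show.legend = FALSE) +
    labs(title = "Box Plot: Sp. Attack by Type") +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)),
  # Special defense by primary type
  ggplot(pokemon_data, aes(x = Type.1, y = Sp..Def, fill = Type.1)) +
    geom_boxplot(show.legend = FALSE) +
    labs(title = "Box Plot: Sp. Defense by Type") +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
)

# Stack the two plots vertically
grid.arrange(grobs = box_plots, ncol = 1)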

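The line in the middle of each box is the group median. A minimal sketch on a toy data frame with made-up values computes those medians directly with tapply():

```r
# Toy data frame with made-up values, standing in for pokemon_data.
toy <- data.frame(
  Type.1 = c("Fire", "Fire", "Fire", "Water", "Water", "Water"),
  Sp..Def = c(50, 65, 85, 64, 80, 105)
)

# Median special defense within each primary type.
medians <- tapply(toy$Sp..Def, toy$Type.1, median)
print(medians)
```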

6. Column Plot

We will create a column plot of Defense against Speed, filled by primary type (Type.1). Note that geom_col() stacks the bars of all Pokémon that share a Speed value, so the height of each column is the sum of their Defense values.

column_plots <- list(
  ggplot(pokemon_data, aes(x = Speed, y = Defense, fill = Type.1)) +
    geom_col() +
    labs(title = "Column Plot: Speed vs Defense") +
    theme_minimal()
)
grid.arrange(grobs = column_plots, ncol = 1)

Output:

[Figure: column plot of Speed vs Defense]
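
A column plot is often easier to read when each column shows one aggregated value per category. A minimal sketch on a toy data frame with made-up values, using mean Defense per primary type as an assumed choice of summary:

```r
library(ggplot2)

# Toy data frame with made-up values, standing in for pokemon_data.
toy <- data.frame(
  Type.1 = c("Fire", "Fire", "Water", "Water"),
  Defense = c(40, 60, 65, 85)
)

# One row per primary type, holding the mean Defense.
type_means <- aggregate(Defense ~ Type.1, data = toy, FUN = mean)

# Build the plot object; printing mean_plot would draw it.
mean_plot <- ggplot(type_means, aes(x = Type.1, y = Defense, fill = Type.1)) +
  geom_col() +
  labs(title = "Mean Defense by Primary Type") +
  theme_minimal()
```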

7. Step Plot

We randomly select 50 Pokémon from the dataset to simplify visualization. A step plot is then created using ggplot2 to show how Attack varies with HP, colored by the primary type (Type.1). The plot is displayed using grid.arrange(), allowing easy expansion for more plots later.

selected_pokemon <- pokemon_data[sample(1:nrow(pokemon_data), 50), ]

step_plots <- list(
  ggplot(selected_pokemon, aes(x = HP, y = Attack, color = Type.1)) +
    geom_step() +
    labs(title = "Step Plot: HP vs Attack") +
    theme_minimal()
)
grid.arrange(grobs = step_plots, ncol = 1)

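Keeping the plots in a list is what makes it easy to add more panels. A minimal sketch on a toy data frame with made-up values places a step plot next to a line plot of the same data; arrangeGrob() is used here only so that the layout is built without being drawn:

```r
library(ggplot2)
library(gridExtra)

# Toy data frame with made-up values, standing in for the sampled rows.
toy <- data.frame(HP = c(35, 45, 60, 80), Attack = c(55, 49, 62, 82))

plots <- list(
  ggplot(toy, aes(x = HP, y = Attack)) + geom_step() + labs(title = "Step"),
  ggplot(toy, aes(x = HP, y = Attack)) + geom_line() + labs(title = "Line")
)

# arrangeGrob() builds the two-column layout; grid.arrange() would also draw it.
combined <- arrangeGrob(grobs = plots, ncol = 2)
```

Calling grid.arrange(grobs = plots, ncol = 2) instead draws the same layout on the current graphics device.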

In this article, we explored the rich dataset of Pokémon attributes using R, uncovering patterns and insights through various visualizations and analyses. From examining basic distributions to creating advanced plots, we demonstrated how this data can be used to understand relationships between features. This sets the stage for further exploration, such as clustering similar Pokémon or building predictive models for battle outcomes.