Negative Binomial Distribution using rnbinom in R

This article covers the theory behind the Negative Binomial Distribution, how to use rnbinom() in R, and examples of generating random numbers, visualizing the distribution, and fitting a Negative Binomial model to overdispersed count data using the R programming language.

Negative Binomial Distribution

The Negative Binomial Distribution is a probability distribution used for modeling count data where the variance exceeds the mean, a property known as overdispersion. In its classical form, it describes the number of failures that occur before a specified number of successes is reached in a sequence of independent Bernoulli trials. In R, the function rnbinom() generates random numbers from the Negative Binomial Distribution.
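
As a quick sanity check on this definition, the sketch below compares R's built-in dnbinom() with the textbook probability mass function P(X = k) = choose(k + r - 1, k) * p^r * (1 - p)^k, where k is the number of failures. The values r = 5 and p = 0.3 are illustrative assumptions chosen to match the examples later in the article.

```r
# Illustrative parameters (assumed for this sketch)
r <- 5      # target number of successes
p <- 0.3    # probability of success on each trial
k <- 0:10   # numbers of failures to evaluate

# Textbook PMF: probability of exactly k failures before the r-th success
pmf_manual <- choose(k + r - 1, k) * p^r * (1 - p)^k

# Built-in density function for the same distribution
pmf_builtin <- dnbinom(k, size = r, prob = p)

# TRUE if the two agree up to floating-point tolerance
isTRUE(all.equal(pmf_manual, pmf_builtin))
```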

`rnbinom()` in R

The syntax of rnbinom() is as follows:

rnbinom(n, size, prob, mu)
Where,
n: Number of observations to generate.
size: The target number of successes (the parameter r). It must be strictly positive and need not be an integer.
prob: The probability of success in each trial (the parameter p).
mu: The mean of the distribution, an alternative to prob. Supply either prob or mu, not both.
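
The mean-based parameterization can be related to the probability-based one. According to R's documentation for the NegBinomial functions, prob = size / (size + mu), and the variance in terms of the mean is mu + mu^2 / size. The following sketch checks that relationship with dnbinom(); the values size = 2 and mu = 3 are arbitrary choices for illustration.

```r
# Arbitrary illustrative values (assumptions for this sketch)
size <- 2
mu <- 3

# Success probability implied by the mean parameterization
prob_equiv <- size / (size + mu)   # 2 / 5 = 0.4

# Evaluate the PMF under both parameterizations
k <- 0:20
d_mu <- dnbinom(k, size = size, mu = mu)
d_prob <- dnbinom(k, size = size, prob = prob_equiv)

# TRUE if both parameterizations describe the same distribution
isTRUE(all.equal(d_mu, d_prob))

# Variance implied by the mean parameterization: 3 + 9 / 2 = 7.5
var_theory <- mu + mu^2 / size
var_theory
```

Because var_theory is larger than mu for any finite size, this form makes the overdispersion explicit: smaller values of size mean more extra variance.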

Example 1: Generate Random Numbers Using `rnbinom()`

Let’s generate 1000 random numbers from a Negative Binomial Distribution with 5 successes and a success probability of 0.3 using rnbinom.

# Set seed for reproducibility
set.seed(123)

# Generate random numbers from Negative Binomial Distribution
neg_binom_data <- rnbinom(n = 1000, size = 5, prob = 0.3)

# Display the first few numbers
head(neg_binom_data)

Output:

[1] 11 19 16  8  6 22
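
For the size/prob parameterization, the theoretical mean is size * (1 - prob) / prob and the theoretical variance is size * (1 - prob) / prob^2. The sketch below regenerates the same sample and compares its moments with these values; it is only a rough check, since sample moments vary from draw to draw.

```r
# Regenerate the sample from Example 1
set.seed(123)
neg_binom_data <- rnbinom(n = 1000, size = 5, prob = 0.3)

size <- 5
prob <- 0.3

# Theoretical moments of the number of failures
theoretical_mean <- size * (1 - prob) / prob     # about 11.67
theoretical_var  <- size * (1 - prob) / prob^2   # about 38.89

# Side-by-side comparison with the sample moments
moments <- data.frame(
  quantity    = c("mean", "variance"),
  theoretical = c(theoretical_mean, theoretical_var),
  sample      = c(mean(neg_binom_data), var(neg_binom_data))
)
print(moments)
```

The variance is well above the mean, which is the overdispersion that distinguishes this distribution from the Poisson.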

Example 2: Visualizing the Negative Binomial Distribution

We can visualize the generated data using a histogram to see the shape of the distribution.

# Load necessary library
library(ggplot2)

# Create a histogram
ggplot(data = data.frame(x = neg_binom_data), aes(x = x)) +
  geom_histogram(binwidth = 1, fill = "blue", color = "black") +
  labs(title = "Histogram of Negative Binomial Distribution", 
       x = "Number of Failures", 
       y = "Frequency") +
  theme_minimal()

Output:

[Figure: Visualizing the Negative Binomial Distribution]

The histogram shows the distribution of the number of failures before achieving the specified number of successes. The distribution is skewed to the right, which is expected here: with a success probability of only 0.3, many failures typically accumulate before the fifth success, and occasionally the count is much larger than usual.
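
To judge how closely the simulated counts follow the theoretical distribution, the empirical relative frequencies can be compared with the probabilities from dnbinom(). The following is a minimal sketch using base R only; the variable names comparison and max_abs_diff are introduced here for illustration.

```r
# Regenerate the sample from Example 1
set.seed(123)
neg_binom_data <- rnbinom(n = 1000, size = 5, prob = 0.3)

# All counts from 0 up to the largest observed value
k <- 0:max(neg_binom_data)

# Empirical relative frequency of each count (0 where a count never occurred)
counts <- table(factor(neg_binom_data, levels = k))
empirical <- as.numeric(counts) / length(neg_binom_data)

# Theoretical probability of each count
theoretical <- dnbinom(k, size = 5, prob = 0.3)

comparison <- data.frame(failures = k,
                         empirical = empirical,
                         theoretical = theoretical)
head(comparison)

# Largest gap between empirical and theoretical probabilities
max_abs_diff <- max(abs(empirical - theoretical))
max_abs_diff
```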

Example 3: Fitting a Negative Binomial Model to Simulated Data using glm.nb

In real-world scenarios, the Negative Binomial Distribution is often used to model overdispersed count data. Let’s simulate some overdispersed data and fit a Negative Binomial model using the MASS package.

# Load the MASS package for the glm.nb function
library(MASS)

# Simulate overdispersed data
set.seed(456)
x <- rnorm(100)
y <- rnbinom(100, mu = exp(1 + 0.5 * x), size = 2)

# Fit a Negative Binomial model to the data
nb_model <- glm.nb(y ~ x)

# Summarize the model
summary(nb_model)

Output:

Call:
glm.nb(formula = y ~ x, init.theta = 2.653492838, link = log)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.01214    0.09002  11.244  < 2e-16 ***
x            0.48632    0.08750   5.558 2.73e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(2.6535) family taken to be 1)

    Null deviance: 144.79  on 99  degrees of freedom
Residual deviance: 111.86  on 98  degrees of freedom
AIC: 439.23

Number of Fisher Scoring iterations: 1


              Theta:  2.653 
          Std. Err.:  0.742 

 2 x log-likelihood:  -433.227

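The glm.nb() function fits a Negative Binomial regression model, here with a log link.
In this example, we simulate overdispersed count data using rnbinom() with mean exp(1 + 0.5 * x) and size = 2, so the true intercept is 1 and the true slope is 0.5. The fitted estimates (1.012 and 0.486) are close to these values.
The summary also reports the significance of each coefficient, the deviance and AIC of the fit, and Theta (2.653), which is the estimate of the size parameter used in the simulation.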
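
Beyond summary(), individual pieces of the fit can be extracted directly. The sketch below refits the same model and pulls out the coefficients, the estimated theta, and predicted mean counts. The new_x values are hypothetical inputs chosen only for illustration.

```r
library(MASS)

# Recreate the simulated data and the fitted model from this example
set.seed(456)
x <- rnorm(100)
y <- rnbinom(100, mu = exp(1 + 0.5 * x), size = 2)
nb_model <- glm.nb(y ~ x)

# Coefficients on the log scale (the simulation used 1 and 0.5)
coef(nb_model)

# Exponentiated coefficients: multiplicative effect on the expected count
exp(coef(nb_model))

# Estimated dispersion parameter (the simulation used size = 2) and its standard error
nb_model$theta
nb_model$SE.theta

# Predicted mean counts at a few hypothetical values of x
new_x <- data.frame(x = c(-1, 0, 1))
predicted <- predict(nb_model, newdata = new_x, type = "response")
predicted
```

Using type = "response" returns predictions on the count scale rather than the log scale of the linear predictor.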

Example 4: Comparing Poisson and Negative Binomial Models

In practice, you may want to compare the Poisson and Negative Binomial models to assess which fits better. One common way to do this is with the Akaike Information Criterion (AIC).

# Fit a Poisson model
poisson_model <- glm(y ~ x, family = "poisson")

# Compare AIC values
aic_values <- AIC(poisson_model, nb_model)
print(aic_values)

Output:

              df      AIC
poisson_model  2 480.6543
nb_model       3 439.2270

Poisson model: The Poisson model assumes that the mean equals the variance.
Negative Binomial model: The Negative Binomial model accounts for overdispersion.

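The AIC values help in model comparison, and the model with the lower AIC value is preferred. Here the Negative Binomial model (AIC 439.23) is preferred over the Poisson model (AIC 480.65), which is expected because the data were simulated with overdispersion.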
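
A common informal diagnostic for overdispersion, offered here as a rule of thumb rather than a formal test, is the Pearson dispersion statistic of the Poisson fit: the sum of squared Pearson residuals divided by the residual degrees of freedom. Values well above 1 suggest that the Poisson assumption of equal mean and variance does not hold. A minimal sketch on the same simulated data:

```r
# Recreate the simulated data and the Poisson fit from this example
set.seed(456)
x <- rnorm(100)
y <- rnbinom(100, mu = exp(1 + 0.5 * x), size = 2)
poisson_model <- glm(y ~ x, family = poisson)

# Sum of squared Pearson residuals
pearson_chisq <- sum(residuals(poisson_model, type = "pearson")^2)

# Dispersion statistic: roughly 1 if the Poisson variance assumption holds
dispersion <- pearson_chisq / df.residual(poisson_model)
dispersion
```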

Conclusion

The Negative Binomial Distribution is an important tool for modeling overdispersed count data, where the variance is larger than the mean. In R, you can use the rnbinom() function to generate random numbers from this distribution, and the glm.nb() function from the MASS package to fit models. Understanding when to use the Negative Binomial Distribution and how to implement it in R can greatly improve the analysis of count data in fields such as epidemiology, ecology, and social sciences.
