Family-Wise Error Rate

When many hypotheses are tested at once, the chance of making at least one Type I error (false positive) grows with the number of tests. The Family-Wise Error Rate (FWER) is the probability of making one or more false positives across the whole family of tests. Controlling it keeps the overall error rate of the analysis at a chosen level, rather than only the error rate of each individual test.

Let:

m: Total number of hypothesis tests.
H_0: Null hypothesis.
\alpha: Significance level for an individual test (commonly 0.05).

The FWER is formally defined as:

\text{FWER} = P(V \geq 1)

where V is the number of true null hypotheses that are rejected (false positives). If all m null hypotheses are true and the tests are independent, each run at level \alpha, the FWER is exactly:

\text{FWER} = 1 - (1 - \alpha)^m

This equation shows that the FWER rises quickly as the number of tests m increases. Even when each test uses a small 𝛼, a large 𝑚 can push the probability of at least one incorrect rejection far above the intended significance level.
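As a quick illustration of how fast the formula grows, the sketch below evaluates 1 - (1 - 𝛼)^m for a few values of m. The function name fwer_independent is a hypothetical helper introduced here, and the result only holds under the stated assumptions (independent tests, all null hypotheses true).

```python
def fwer_independent(alpha: float, m: int) -> float:
    """FWER for m independent tests at level alpha when every null is true."""
    return 1.0 - (1.0 - alpha) ** m


# Tabulate the FWER at alpha = 0.05 for an increasing number of tests.
for m in (1, 5, 10, 20, 50, 100):
    print(f"m = {m:>3}: FWER = {fwer_independent(0.05, m):.3f}")
```

With 𝛼 = 0.05 the value already exceeds 0.4 at m = 10 and is above 0.99 at m = 100.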

Importance of FWER in Multiple Hypothesis Testing

In multiple hypothesis testing, failing to control the FWER can lead to incorrect conclusions.

Clinical trials: Testing multiple treatments simultaneously increases the chance that at least one of them appears significant by chance alone.
Scientific research: Testing multiple hypotheses without controlling the FWER inflates the probability of spurious findings.

Controlling the FWER is necessary to prevent false discoveries and ensure statistical integrity.

Example Calculation: If we perform 20 independent hypothesis tests at 𝛼 = 0.05 and every null hypothesis is true:

\text{FWER} = 1 - (1 - 0.05)^{20}

\text{FWER} \approx 0.64

Thus, there is roughly a 64% chance of making at least one false positive.
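The 64% figure can be checked empirically. The following is a minimal Monte Carlo sketch, assuming independent tests whose 𝑝-values are uniform on [0, 1] under a true null hypothesis; simulate_fwer, its trial count, and its seed are illustrative choices, not part of any library.

```python
import random


def simulate_fwer(alpha=0.05, m=20, n_trials=50_000, seed=0):
    """Estimate the FWER by simulating families of m true-null tests."""
    rng = random.Random(seed)
    families_with_error = 0
    for _ in range(n_trials):
        # Under a true null, each p-value is uniform on [0, 1], so a test
        # is a false positive with probability alpha.
        if any(rng.random() < alpha for _ in range(m)):
            families_with_error += 1
    return families_with_error / n_trials


estimate = simulate_fwer()
print(f"Simulated FWER: {estimate:.3f}")
```

The estimate should land close to the analytical value of about 0.64, with some variation from the random draws.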

FWER Correction Methods

To control the FWER, several correction techniques are used:

1. Bonferroni Correction

The Bonferroni correction is the simplest of these methods and the most conservative. It does not require the tests to be independent. It adjusts the significance level for each test by dividing the desired overall significance level 𝛼 by the number of tests 𝑚:

\alpha_{\text{adjusted}} = \frac{\alpha}{m}

Example:

If we conduct 10 hypothesis tests with a significance level of 𝛼 = 0.05:

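\alpha_{\text{adjusted}} = \frac{0.05}{10} = 0.005

Each individual test is then declared significant only if its 𝑝-value is at most 0.005. The Bonferroni correction reduces the risk of false positives but can be overly conservative, reducing the power of the tests, especially when 𝑚 is large.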
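A minimal sketch of the Bonferroni rule in plain Python follows. The function bonferroni is a hypothetical helper written for illustration; the 𝑝-values are the same simulated ones used in this article's Python listing. It returns both the reject decisions and the adjusted 𝑝-values (each 𝑝-value multiplied by 𝑚 and capped at 1).

```python
def bonferroni(p_values, alpha=0.05):
    """Return (reject flags, adjusted p-values) under Bonferroni."""
    m = len(p_values)
    threshold = alpha / m  # per-test significance level
    reject = [p <= threshold for p in p_values]
    adjusted = [min(p * m, 1.0) for p in p_values]
    return reject, adjusted


p_values = [0.01, 0.03, 0.05, 0.07, 0.1, 0.001, 0.04, 0.02]
reject, adjusted = bonferroni(p_values)
print("Per-test threshold:", 0.05 / len(p_values))
print("Reject:", reject)
print("Adjusted p-values:", adjusted)
```

With eight tests the per-test threshold is 0.00625, so only the 𝑝-value 0.001 leads to a rejection.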

2. Holm-Bonferroni Method

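The Holm-Bonferroni method is a step-down procedure that still controls the FWER but is never less powerful than Bonferroni. It sorts the 𝑝-values in ascending order and compares the 𝑘-th smallest 𝑝-value with the threshold:

\alpha_{\text{adjusted}} = \frac{\alpha}{m - (k - 1)}

Where: 𝑘 is the rank of the hypothesis (𝑘 = 1 for the smallest 𝑝-value).

Steps:

Rank the 𝑝-values from smallest to largest.
Starting at 𝑘 = 1, reject the hypothesis if its 𝑝-value is at or below the threshold for rank 𝑘, then move to the next rank.
Stop at the first 𝑝-value that exceeds its threshold; that hypothesis and all remaining ones are not rejected.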
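The step-down procedure can be sketched in a few lines of plain Python. The function holm and the example 𝑝-values are hypothetical, chosen so that the result differs from a plain Bonferroni cut-off; this is an illustration of the rule, not a replacement for a tested library routine.

```python
def holm(p_values, alpha=0.05):
    """Return reject flags (in the original order) under Holm-Bonferroni."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha / (m - rank + 1):
            reject[idx] = True
        else:
            break  # first failure: stop, keep all remaining hypotheses
    return reject


example = [0.04, 0.001, 0.02, 0.015]  # made-up p-values for illustration
print("Holm:", holm(example))
print("Bonferroni:", [p <= 0.05 / len(example) for p in example])
```

For these four values, Bonferroni (threshold 0.0125) rejects only the hypothesis with 𝑝 = 0.001, while Holm rejects all four because each sorted 𝑝-value stays below its own progressively larger threshold.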

3. Šidák Correction

The Šidák correction assumes the tests are independent. It adjusts the significance level using the formula:

\alpha_{\text{adjusted}} = 1 - (1 - \alpha)^{1/m}

For small 𝛼 values, the Šidák correction is similar to the Bonferroni correction but slightly less conservative.
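To see how close the two thresholds are, the sketch below computes the Šidák per-test level next to the Bonferroni one. The helper sidak_alpha is a hypothetical name used here for illustration, and the comparison assumes independent tests.

```python
def sidak_alpha(alpha: float, m: int) -> float:
    """Per-test significance level under the Sidak correction."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


alpha, m = 0.05, 10
per_test = sidak_alpha(alpha, m)
print(f"Sidak threshold:      {per_test:.6f}")
print(f"Bonferroni threshold: {alpha / m:.6f}")
# Plugging the Sidak level back into 1 - (1 - a)^m recovers alpha.
print(f"Resulting FWER:       {1.0 - (1.0 - per_test) ** m:.6f}")
```

For 𝛼 = 0.05 and 𝑚 = 10 the Šidák threshold is about 0.00512, slightly above the Bonferroni value of 0.005, and it yields an FWER of 0.05 when the tests are independent.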

4. Benjamini-Hochberg (False Discovery Rate)

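The Benjamini-Hochberg (BH) method controls the False Discovery Rate (FDR), the expected proportion of false positives among the rejected hypotheses, rather than the FWER. It is listed here for comparison. It ranks the 𝑝-values in ascending order and compares the 𝑘-th smallest 𝑝-value to:

\frac{k}{m} \times \alpha

Where:

𝑘 is the rank.
𝑚 is the total number of tests.

BH is a step-up procedure: find the largest rank 𝑘 whose 𝑝-value is at or below its threshold, then reject that hypothesis and every hypothesis with a smaller 𝑝-value. The BH method is more powerful than FWER corrections, but it accepts that a small proportion of the rejections may be false positives.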
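A minimal sketch of the BH decision rule in plain Python is shown below. The function benjamini_hochberg is a hypothetical helper written for illustration, and the 𝑝-values are the same simulated ones used in this article's Python listing.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return reject flags (in the original order) under the BH procedure."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject the max_k hypotheses with the smallest p-values.
    reject = [False] * m
    for idx in order[:max_k]:
        reject[idx] = True
    return reject


p_values = [0.01, 0.03, 0.05, 0.07, 0.1, 0.001, 0.04, 0.02]
bh_reject = benjamini_hochberg(p_values)
print("BH reject:", bh_reject)
```

For these eight values BH rejects the hypotheses with 𝑝 = 0.001 and 𝑝 = 0.01, whereas a Bonferroni threshold of 0.05 / 8 = 0.00625 would reject only the first.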

Python Code with FWER Corrections

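The following Python code applies the Bonferroni, Holm-Bonferroni, Šidák, and Benjamini-Hochberg corrections using multipletests from statsmodels. Each call returns a tuple whose first element is an array of reject decisions and whose second element, index [1], holds the adjusted 𝑝-values: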

Python

import numpy as np
from statsmodels.stats.multitest import multipletests

# Simulated p-values from multiple hypothesis tests
p_values = np.array([0.01, 0.03, 0.05, 0.07, 0.1, 0.001, 0.04, 0.02])

# Bonferroni Correction
bonferroni_corrected = multipletests(p_values, alpha=0.05, method='bonferroni')

# Holm-Bonferroni Correction
holm_corrected = multipletests(p_values, alpha=0.05, method='holm')

# Šidák Correction
sidak_corrected = multipletests(p_values, alpha=0.05, method='sidak')

# Benjamini-Hochberg (FDR) Correction
bh_corrected = multipletests(p_values, alpha=0.05, method='fdr_bh')

# Display results
print("\nOriginal p-values:", p_values)
print("\nBonferroni Corrected p-values:", bonferroni_corrected[1])
print("Holm-Bonferroni Corrected p-values:", holm_corrected[1])
print("Šidák Corrected p-values:", sidak_corrected[1])
print("Benjamini-Hochberg Corrected p-values (FDR):", bh_corrected[1])

Output:

[Image: FWER Corrections Output]

Real-World Applications

1. Genomics: In gene expression studies, thousands of genes are tested simultaneously. FWER correction reduces the risk of false positives.
2. Clinical Trials: When testing multiple drugs or treatment combinations, controlling the FWER ensures valid conclusions.
3. Marketing Experiments: Running A/B tests on multiple product variations requires FWER control to avoid false claims.
