Kaplan-Meier Estimator (Survival Analysis)

Last Updated : 17 Jun, 2025

Survival analysis is a branch of statistics that deals with analyzing the expected duration until one or more events happen, such as death in biological organisms or failure in mechanical systems. An important step in survival analysis is estimating the survival function, which gives the probability of surviving past a certain point in time. One of the most widely used non-parametric methods for this purpose is the Kaplan-Meier estimator.

The Survival Function

Let T be a non-negative random variable representing the time until an event occurs. The survival function is:

S(t) = P(T > t)

where:

  • T: A non-negative random variable representing the time until an event occurs (e.g. death, failure).
  • t: A specific time point.
  • S(t)=P(T>t): The probability that the event has not occurred by time t; i.e., the subject survives beyond time t.
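When every event time is actually observed (no censoring), S(t) can be estimated directly as the fraction of subjects whose time exceeds t. A minimal sketch; the function name `empirical_survival` and the event times are illustrative assumptions, not part of any library:

```python
def empirical_survival(times, t):
    """Estimate S(t) = P(T > t) as the fraction of subjects whose time exceeds t.

    Valid only when no observation is censored.
    """
    return sum(1 for x in times if x > t) / len(times)

# Hypothetical, fully observed event times (no censoring)
times = [2, 3, 4, 5, 6]
for t in [0, 2, 4, 6]:
    print(t, empirical_survival(times, t))
```

This simple ratio breaks down once some times are censored, because a censored subject's true event time is unknown.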

The Problem of Censoring

In real-life data, we often encounter censoring, which means we don’t observe the exact survival time for all individuals. The most common type is right-censoring, where a subject has not yet experienced the event by the end of the study or is lost to follow-up.

For example, if a clinical trial ends after 2 years and a patient is still alive at that time, their survival time is said to be censored at 2 years.
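To make the clinical-trial example concrete, the sketch below shows how right-censoring turns true (unobservable) survival times into the (time, event) pairs that survival methods actually receive. The helper `censor_at_study_end` and the true times are hypothetical, and the sketch models only administrative censoring at the end of the study, not loss to follow-up:

```python
STUDY_END = 2.0  # follow-up stops after 2 years (illustrative value)

def censor_at_study_end(true_times, study_end=STUDY_END):
    """Convert true event times into (observed time, event indicator) pairs.

    Subjects whose event happens after study_end are right-censored:
    we only know they survived at least until study_end.
    """
    observed = []
    for t in true_times:
        if t <= study_end:
            observed.append((t, 1))          # event seen during the study
        else:
            observed.append((study_end, 0))  # still event-free when the study ended
    return observed

# Hypothetical true times in years (in practice they are unknown beyond study_end)
print(censor_at_study_end([0.5, 1.7, 2.4, 3.1]))
# [(0.5, 1), (1.7, 1), (2.0, 0), (2.0, 0)]
```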

The Kaplan-Meier Estimator

The Kaplan-Meier estimator is a non-parametric statistic used to estimate the survival function from censored data. It constructs a step function that drops at each observed event time, incorporating both complete and censored data.

Given the distinct observed event times t_1 < t_2 < \dots < t_k, with d_j events at time t_j and n_j individuals at risk just before t_j, the Kaplan-Meier estimate is:

\hat{S}(t) = \prod_{t_j \le t} \left(1 - \frac{d_j}{n_j} \right)
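The product in the formula translates almost line for line into code. A minimal pure-Python sketch, assuming times are numbers and events are 0/1 flags; the function name `kaplan_meier` is ours, not a library API:

```python
from collections import Counter

def kaplan_meier(times, events):
    """Product-limit estimate of S(t) from right-censored data.

    times  : observed time (event or censoring time) for each subject
    events : 1 if the event was observed, 0 if the subject was censored
    Returns a list of (t_j, S_hat(t_j)) at each distinct event time t_j.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)  # d_j per event time
    survival = 1.0
    curve = []
    for t_j in sorted(deaths):
        d_j = deaths[t_j]
        # At risk just before t_j: everyone whose observed time is >= t_j
        n_j = sum(1 for t in times if t >= t_j)
        survival *= 1 - d_j / n_j
        curve.append((t_j, survival))
    return curve

print(kaplan_meier([2, 3, 4, 5, 6], [1, 0, 1, 1, 0]))
```

Counting `t >= t_j` keeps a subject censored exactly at t_j in the risk set for that event time, which is the usual convention. Recounting n_j for every event time is quadratic, which is fine for a sketch but not for large datasets.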

The Kaplan-Meier estimator treats survival as a product of conditional survival probabilities. At each event time t_j, the factor 1 - d_j/n_j is the estimated probability of surviving past t_j given survival up to just before t_j, computed only among those still at risk.

The product of these conditional probabilities gives an overall estimate of the survival function, accounting for both events and censored observations. A censored subject counts in the risk set n_j at every event time up to and including their censoring time, is removed from the risk set afterwards, and never contributes to the event count d_j.
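This bookkeeping can be sketched as a running count over the distinct observed times. `risk_table` is a hypothetical helper, and the data below are made up to include a tie; counting a subject censored at the same time as an event in the risk set for that event follows the usual convention and is assumed here:

```python
from collections import defaultdict

def risk_table(times, events):
    """Walk through distinct observed times, tracking the risk set.

    Returns rows (t, n_at_risk, d_events, c_censored) for every distinct time.
    Censored subjects leave the risk set after their time but never add to d.
    """
    deaths = defaultdict(int)
    censored = defaultdict(int)
    for t, e in zip(times, events):
        if e == 1:
            deaths[t] += 1
        else:
            censored[t] += 1

    at_risk = len(times)
    rows = []
    for t in sorted(set(times)):
        rows.append((t, at_risk, deaths[t], censored[t]))
        # Events and censorings at t leave the risk set only *after* t,
        # so a subject censored exactly at t still counts in n at t.
        at_risk -= deaths[t] + censored[t]
    return rows

# Hypothetical data with a tie: one event and one censoring both at t = 3
for row in risk_table([1, 3, 3, 4, 7], [1, 1, 0, 0, 1]):
    print(row)
```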

Example Calculation

Suppose we observe 5 individuals with the following survival times (in months) and censoring indicators:

Subject   Time (months)   Event (1 = event, 0 = censored)
1         2               1
2         3               0
3         4               1
4         5               1
5         6               0

Ordered Event Times and Risk Sets

We observe events at the following ordered times:

t_1 = 2, \quad t_2 = 4, \quad t_3 = 5

At each event time t_j, the number at risk n_j and the number of observed events d_j are:

  • \text{At } t_1 = 2:\quad n_1 = 5,\quad d_1 = 1
  • \text{At } t_2 = 4:\quad n_2 = 3,\quad d_2 = 1
  • \text{At } t_3 = 5:\quad n_3 = 2,\quad d_3 = 1
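These counts can be checked mechanically. A small sketch that recounts n_j and d_j directly from the table; note that subject 2, censored at 3 months, is already out of the risk set at t_2 = 4:

```python
# Example data from the table: time in months and event indicator
T = [2, 3, 4, 5, 6]
E = [1, 0, 1, 1, 0]

event_times = sorted({t for t, e in zip(T, E) if e == 1})  # distinct event times
risk = []
for t_j in event_times:
    n_j = sum(1 for t in T if t >= t_j)                        # still observed just before t_j
    d_j = sum(1 for t, e in zip(T, E) if t == t_j and e == 1)  # events exactly at t_j
    risk.append((t_j, n_j, d_j))
    print(f"t = {t_j}: n = {n_j}, d = {d_j}")
```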

Kaplan-Meier Estimates

The Kaplan-Meier estimator \hat{S}(t) is computed recursively by:

\hat{S}(t_j) = \hat{S}(t_{j-1}) \cdot \left(1 - \frac{d_j}{n_j} \right), \quad \hat{S}(0) = 1

We now compute the values step-by-step:

  • \hat{S}(2) = 1 \cdot \left(1 - \frac{1}{5} \right) = 1 \cdot 0.8 = 0.8
  • \hat{S}(4) = 0.8 \cdot \left(1 - \frac{1}{3} \right) = 0.8 \cdot 0.6667 = 0.5333
  • \hat{S}(5) = 0.5333 \cdot \left(1 - \frac{1}{2} \right) = 0.5333 \cdot 0.5 = 0.2667
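Using exact fractions shows that the rounded decimals above are 4/5, 8/15 and 4/15. A minimal sketch of the recursion with Python's standard `fractions` module, taking the (n_j, d_j) pairs from the example:

```python
from fractions import Fraction

# (n_j, d_j) at the event times t = 2, 4, 5 from the example
risk_sets = {2: (5, 1), 4: (3, 1), 5: (2, 1)}

S = Fraction(1)  # S_hat(0) = 1
estimates = {}
for t_j, (n_j, d_j) in sorted(risk_sets.items()):
    S *= 1 - Fraction(d_j, n_j)  # S(t_j) = S(t_{j-1}) * (1 - d_j / n_j)
    estimates[t_j] = S
    print(t_j, S, round(float(S), 4))
```

Keeping exact values avoids the rounding drift that can accumulate when rounded intermediates (such as 0.5333) are fed into the next step.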

Survival Function \hat{S}(t)

The Kaplan-Meier estimator \hat{S}(t) is a step function defined as:

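\hat{S}(t) = \begin{cases} 1 & \text{if } t < 2 \\ 0.8 & \text{if } 2 \le t < 4 \\ 0.5333 & \text{if } 4 \le t < 5 \\ 0.2667 & \text{if } 5 \le t \le 6 \end{cases}

Because the last subject is censored at t = 6, the largest observed time, the estimate is usually left undefined beyond that point.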
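Because \hat{S}(t) is right-continuous and piecewise constant, evaluating it at any t reduces to a binary search over the event times. A sketch using `bisect` from the standard library, with the rounded values from the example hard-coded (the name `s_hat` is illustrative):

```python
from bisect import bisect_right

# Event times and the KM estimate from each one onward (from the example)
event_times = [2, 4, 5]
surv_values = [0.8, 0.5333, 0.2667]

def s_hat(t):
    """Step function: value at the last event time <= t, or 1 before the first event.

    No guard is applied for t beyond the largest observed time (6 here),
    where the estimate is usually left undefined.
    """
    i = bisect_right(event_times, t)  # number of event times <= t
    return 1.0 if i == 0 else surv_values[i - 1]

for t in [0, 1.9, 2, 3.5, 4, 5, 6]:
    print(t, s_hat(t))
```

Using `bisect_right` rather than `bisect_left` makes the function jump exactly at each event time, matching the "2 \le t" style boundaries of the step function.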

Assumptions of Kaplan-Meier Estimator

For the Kaplan-Meier estimator to be valid, the following assumptions must hold:

  1. Independent Censoring: Censored subjects have the same survival prospects as those who continue to be followed.
  2. Events Occur at Recorded Times: Exact times of events are known.
  3. Subjects Are Identically Distributed: The population is homogeneous with respect to survival distribution.
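A quick simulation illustrates why the independent-censoring assumption matters. The sketch below is illustrative only: the exponential distributions, the seed, the sample size, and the dropout mechanism are all assumptions. It draws event times with true S(t) = e^{-t} and compares the Kaplan-Meier estimate at t = 1 under independent censoring and under a hypothetical dependent mechanism in which subjects about to fail tend to drop out early:

```python
import math
import random

def km_at(times, events, t_star):
    """Kaplan-Meier estimate of S(t_star): product over event times <= t_star."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    s = 1.0
    i = 0
    while i < len(data) and data[i][0] <= t_star:
        t = data[i][0]
        d = c = 0
        # Group all subjects sharing this exact observed time
        while i < len(data) and data[i][0] == t:
            if data[i][1] == 1:
                d += 1
            else:
                c += 1
            i += 1
        if d:
            s *= 1 - d / n_at_risk
        n_at_risk -= d + c  # events and censorings leave the risk set after t
    return s

random.seed(0)
N = 5000
true_times = [random.expovariate(1.0) for _ in range(N)]  # true S(t) = exp(-t)

# 1) Independent censoring: censoring times drawn separately from event times
cens = [random.expovariate(0.5) for _ in range(N)]
obs_ind = [min(t, c) for t, c in zip(true_times, cens)]
ev_ind = [1 if t <= c else 0 for t, c in zip(true_times, cens)]

# 2) Dependent censoring (hypothetical): half of the subjects who will fail
#    before t = 1 drop out early, at half their true event time
obs_dep, ev_dep = [], []
for t in true_times:
    if t < 1 and random.random() < 0.5:
        obs_dep.append(t / 2)
        ev_dep.append(0)
    else:
        obs_dep.append(t)
        ev_dep.append(1)

true_s1 = math.exp(-1)
km_ind = km_at(obs_ind, ev_ind, 1.0)
km_dep = km_at(obs_dep, ev_dep, 1.0)
print(f"true S(1)                 = {true_s1:.3f}")
print(f"KM, independent censoring = {km_ind:.3f}")
print(f"KM, dependent censoring   = {km_dep:.3f}")
```

Under independent censoring the estimate should land close to e^{-1} \approx 0.368; under the dependent mechanism it overestimates survival, because the subjects removed from the risk set were precisely those at highest risk.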

Implementation in Python

Python
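# Install lifelines first if needed: pip install lifelines
from lifelines import KaplanMeierFitter
import matplotlib.pyplot as plt

# Sample data: the five subjects from the example above
T = [2, 3, 4, 5, 6]  # Time (months)
E = [1, 0, 1, 1, 0]  # 1 = event observed, 0 = censored

kmf = KaplanMeierFitter()
kmf.fit(T, event_observed=E)

# Print the estimated step function (should match the hand calculation)
print(kmf.survival_function_)

# Plot the survival curve (lifelines also draws a confidence band by default)
kmf.plot_survival_function()
plt.title("Kaplan-Meier Survival Curve")
plt.xlabel("Time (months)")
plt.ylabel("Survival Probability")
plt.show()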

Output:

[Figure: Kaplan-Meier Survival Curve]

Applications

The Kaplan-Meier estimator is widely used across disciplines:

  • Medicine: Estimating patient survival, time to recurrence, or drug efficacy.
  • Engineering: Estimating time-to-failure of machines or components.
  • Economics: Modeling unemployment durations.
  • Ecology: Studying animal survival or migration patterns.

Advantages

  • Handles censored data effectively.
  • Does not assume any underlying distribution.
  • Provides a simple visual representation of survival over time.

Limitations

  • Cannot incorporate covariates directly (unlike Cox models).
  • Assumes independence and identical distribution among subjects.
  • Estimates can become unstable in the tail (few individuals at risk).
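To quantify the tail-instability point, a common variance estimate for the Kaplan-Meier curve is Greenwood's formula, Var[\hat{S}(t)] \approx \hat{S}(t)^2 \sum_{t_j \le t} d_j / (n_j (n_j - d_j)). A sketch applying it to the worked example, using the hand-computed risk sets, shows the relative standard error growing as the risk set shrinks:

```python
import math

# (t_j, n_j, d_j) for the worked example
risk = [(2, 5, 1), (4, 3, 1), (5, 2, 1)]

s = 1.0
greenwood_sum = 0.0
rows = []
for t_j, n_j, d_j in risk:
    s *= 1 - d_j / n_j
    # Greenwood term d_j / (n_j (n_j - d_j)); undefined when n_j == d_j (S drops to 0)
    greenwood_sum += d_j / (n_j * (n_j - d_j))
    se = s * math.sqrt(greenwood_sum)  # standard error of S_hat(t_j)
    rows.append((t_j, n_j, s, se))
    print(f"t = {t_j}: n at risk = {n_j}, S = {s:.4f}, SE = {se:.4f}, SE/S = {se / s:.2f}")
```

With only two subjects left at t = 5, the standard error is a large fraction of the estimate itself, which is why the tail of a Kaplan-Meier curve should be read with caution.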