The Cox Proportional Hazards Model, often just called the Cox model, is a statistical technique used in survival analysis. It helps us understand how different factors affect the time it takes for an event to happen. The “event” could be anything: death, equipment failure, customer churn, etc.
Instead of focusing on whether or not something happens, the Cox model is more interested in "when" it happens.
What is Survival Analysis?
Before diving into the Cox model, let’s quickly understand survival analysis.
Survival analysis is used to study time-to-event data. For example:
- How long does a patient survive after treatment?
- How many months does a customer stay subscribed before leaving?
- How long until a machine breaks down?
In this context:
- Survival time means the time until the event occurs.
- Censoring happens when we don't see the event before the end of the study (e.g., the customer hasn’t left yet).
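To make censoring concrete, here is a minimal sketch in plain Python that turns raw subscription records into the two columns survival models work with: a duration and an event indicator. The record layout (`signup_month`, `churn_month`, with `None` meaning the customer has not churned yet) and the 12-month study end are hypothetical, chosen only for illustration.

```python
# Hypothetical raw records: churn_month is None if the customer was still
# subscribed when observation stopped (i.e., the observation is censored).
STUDY_END_MONTH = 12  # assumed end of the observation window

records = [
    {"customer": "a", "signup_month": 0, "churn_month": 5},
    {"customer": "b", "signup_month": 2, "churn_month": None},
    {"customer": "c", "signup_month": 3, "churn_month": 10},
    {"customer": "d", "signup_month": 6, "churn_month": None},
]


def to_duration_event(record, study_end=STUDY_END_MONTH):
    """Return (duration, event) for one record.

    event = 1 if the churn was observed, 0 if the record is censored.
    For censored records the duration runs to the end of the study,
    so we only know the true survival time is at least this long.
    """
    if record["churn_month"] is not None:
        return record["churn_month"] - record["signup_month"], 1
    return study_end - record["signup_month"], 0


rows = [to_duration_event(r) for r in records]
for r, (duration, event) in zip(records, rows):
    print(r["customer"], duration, event)
```

Customers b and d still contribute information: we know they stayed at least 10 and 6 months respectively, even though we never see them leave.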
What Does the Cox Model Do?
The Cox model answers this key question:
How does each variable (like age, gender, income, etc.) affect the risk of the event happening at any point in time?
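It leaves the baseline hazard (defined below) unspecified, so it does not assume any particular shape for the distribution of survival times, while the effects of the covariates are modeled parametrically. This combination makes it a semi-parametric model.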
The Cox Model
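The Cox model works with something called the hazard: the instantaneous rate at which the event occurs at time t, given that the person or thing has survived (not yet experienced the event) up to t. Informally, it is the risk of the event happening right now, given survival so far.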
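In discrete time, the hazard at time t can be estimated as the number of events at t divided by the number of subjects still at risk (not yet failed or censored) at t. The sketch below computes this for a small, made-up dataset of (duration, event) pairs; it only illustrates the definition of the hazard and is not how a Cox model is fitted.

```python
from collections import Counter

# Made-up (duration, event) pairs; event = 1 means the event occurred,
# event = 0 means the subject was censored at that time.
data = [(1, 1), (2, 0), (2, 1), (3, 1), (3, 1), (4, 0), (5, 1)]


def empirical_hazard(data):
    """Return {t: events_at_t / at_risk_at_t} for each time with an event."""
    events = Counter(t for t, e in data if e == 1)
    hazard = {}
    for t in sorted(events):
        # At risk at t: every subject whose duration is >= t. Subjects
        # censored at t are counted as at risk at t (the usual convention).
        at_risk = sum(1 for d, _ in data if d >= t)
        hazard[t] = events[t] / at_risk
    return hazard


print(empirical_hazard(data))
```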
Cox Model Formula
h(t \mid X) = h_0(t) \cdot e^{\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p}
Where:
- h(t \mid X): hazard function at time t given covariates X
- h_0(t): baseline hazard function (the hazard when all X_i = 0)
- \beta_1, \beta_2, \ldots, \beta_p: regression coefficients
- X_1, X_2, \ldots, X_p: covariates (e.g., age, treatment group, etc.)
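Because the baseline hazard h_0(t) multiplies every individual's hazard in the same way, the ratio of two individuals' hazards does not depend on t; this is the "proportional hazards" property the model is named after. The sketch below checks it numerically. The baseline hazard function, the coefficient values, and the covariate values are arbitrary assumptions made up for the illustration, not estimates from any data.

```python
import math

# Hypothetical coefficients for two covariates
# (e.g., age in decades and a 0/1 treatment flag).
betas = [0.3, -0.5]


def baseline_hazard(t):
    """An arbitrary, made-up baseline hazard h_0(t) that changes over time."""
    return 0.01 + 0.002 * t


def hazard(t, x):
    """Cox model hazard: h(t | x) = h_0(t) * exp(beta_1*x_1 + ... + beta_p*x_p)."""
    linear_predictor = sum(b * xi for b, xi in zip(betas, x))
    return baseline_hazard(t) * math.exp(linear_predictor)


person_a = [6.0, 1.0]  # hypothetical covariate values
person_b = [5.0, 0.0]

# The hazards themselves change with t, but their ratio does not.
for t in [1, 10, 50]:
    ratio = hazard(t, person_a) / hazard(t, person_b)
    print(t, round(hazard(t, person_a), 6), round(ratio, 6))

# The ratio equals exp(beta . (x_a - x_b)) at every t, because h_0(t) cancels.
expected = math.exp(sum(b * (xa - xb) for b, xa, xb in zip(betas, person_a, person_b)))
print(round(expected, 6))
```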
Interpretation
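For a covariate X_j with coefficient \beta_j (holding the other covariates fixed):
- If \beta_j > 0: as X_j increases (e.g., as age increases), the hazard (risk) increases.
- If \beta_j < 0: as X_j increases, the hazard (risk) decreases.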
Hazard Ratio:
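\text{Hazard Ratio} = e^{\beta}
This is the factor by which the hazard is multiplied when a covariate increases by one unit, with all other covariates held fixed: in the ratio h(t \mid X_j + 1) / h(t \mid X_j), the baseline hazard h_0(t) and all other terms cancel, leaving e^{\beta_j}.
- If e^{\beta} > 1: higher risk
- If e^{\beta} = 1: no effect
- If e^{\beta} < 1: lower risk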
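Converting coefficients to hazard ratios is a single exponentiation. The sketch below uses hypothetical coefficient values (not from any fitted model) and also shows that for a change of k units in a covariate, the hazard ratio is e^{k\beta}, i.e., the per-unit ratio raised to the k-th power.

```python
import math

# Hypothetical fitted coefficients (beta values), for illustration only.
coefficients = {"age": 0.05, "treatment_a": -0.22, "high_bp": 0.41}

for name, beta in coefficients.items():
    hr = math.exp(beta)  # hazard ratio for a one-unit increase
    if hr > 1:
        direction = "higher risk"
    elif hr < 1:
        direction = "lower risk"
    else:
        direction = "no effect"
    print(f"{name}: beta={beta:+.2f}, HR={hr:.3f} ({direction})")

# For a k-unit change, the hazard ratio is exp(k * beta) = exp(beta) ** k.
k = 10  # e.g., ten additional years of age
hr_k_units = math.exp(k * coefficients["age"])
print(f"HR for +{k} units of age: {hr_k_units:.3f}")
```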
Example Scenario
Let’s say we’re studying patients undergoing treatment for a disease. Our variables are:
- Age
- Treatment type
- Blood pressure
We use the Cox model to see which of these factors significantly affect survival time.
The output might show:
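- Age: hazard ratio = 1.2 → each one-unit increase in age multiplies the hazard by 1.2 (a 20% increase).
- Treatment A: hazard ratio = 0.8 → compared with the reference treatment, Treatment A lowers the hazard by 20%.
- High blood pressure: hazard ratio = 1.5 → a 50% higher hazard than normal blood pressure.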
So, we learn which factors matter and how they affect the timing of the event.
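Because the covariates enter the model through a sum inside the exponential, their hazard ratios multiply. Using the illustrative hazard ratios from the example output, a patient on Treatment A with high blood pressure (and the same age) would have a hazard 0.8 × 1.5 = 1.2 times that of a patient on the reference treatment with normal blood pressure. The short sketch below does this arithmetic; the numbers are the hypothetical ones from the scenario, not real results.

```python
import math

# Illustrative hazard ratios from the example scenario (not real estimates).
hazard_ratios = {"treatment_a": 0.8, "high_bp": 1.5}

# Hazard ratios multiply: exp(b1 + b2) = exp(b1) * exp(b2).
combined_hr = hazard_ratios["treatment_a"] * hazard_ratios["high_bp"]

# Equivalent route through the coefficients (beta = log(HR)).
betas = {name: math.log(hr) for name, hr in hazard_ratios.items()}
combined_via_betas = math.exp(sum(betas.values()))

print(round(combined_hr, 6), round(combined_via_betas, 6))
```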
Implementing Cox Proportional Hazards in Python
Install Required Library
pip install lifelines
Load Sample Dataset
We'll use the Rossi recidivism dataset, which is included in lifelines. It comes from a study of 432 prisoners who were followed for one year after their release from Maryland state prisons, and includes:
- week: survival time (in weeks)
- arrest: event indicator (1 if arrested, 0 if not)
- Covariates such as age, fin, race, wexp, etc.
import pandas as pd
from lifelines.datasets import load_rossi
# Load the sample dataset
df = load_rossi()
print(df.head())
Fit the Cox Proportional Hazards Model
from lifelines import CoxPHFitter
# Create and fit the model
cph = CoxPHFitter()
cph.fit(df, duration_col='week', event_col='arrest')
# Print the summary of the model
cph.print_summary()
In the printed summary, look for:
- coef: the β values (estimated coefficients).
- exp(coef): the hazard ratios (HR). If exp(coef) > 1, the risk increases; if exp(coef) < 1, the risk decreases.
- p: the p-values, showing the statistical significance of each variable.
Applications of the Cox Proportional Hazards Model
- Medical Research: Used to study how treatments, risk factors, or patient characteristics affect survival time (e.g., effect of a drug on cancer survival).
- Clinical Trials: Compares survival times between treatment and control groups while adjusting for other variables.
- Epidemiology: Assesses the impact of exposures (like smoking or diet) on time to disease occurrence.
- Insurance and Actuarial Science: Models policyholder survival time or time until a claim is filed.
- Engineering (Reliability Analysis): Evaluates the effect of stress, load, or environment on product failure times.
- Sociology and Economics: Studies time until an event like job change, divorce, or loan default.