Randomly Select Rows from Pandas DataFrame

If a DataFrame has multiple rows, you can randomly select a few of them instead of working with the whole dataset. For example, suppose you have a DataFrame with rows [A, B, C, D, E]. If you randomly pick 2 rows, one possible result is [C, E].

Here is the sample DataFrame used in this article:

Python

import pandas as pd

data = {'Employee': ['Emily', 'Emma', 'Jake', 'David', 'Eva'],
        'Department': ['HR', 'IT', 'Finance', 'Marketing', 'IT'],
        'Age': [28, 34, 25, 42, 30],
        'Salary': [50000, 60000, 45000, 70000, 52000]}
df = pd.DataFrame(data)
print(df)

Output

  Employee Department  Age  Salary
0    Emily         HR   28   50000
1     Emma         IT   34   60000
2     Jake    Finance   25   45000
3    David  Marketing   42   70000
4      Eva         IT   30   52000

Let’s explore different methods to randomly select rows from a Pandas DataFrame.

Using sample()

The sample() method lets you specify the number of rows (n), a fraction of rows (frac), whether to sample with replacement (replace), per-row weights (weights), and a seed for reproducibility (random_state).

Example: Below, we randomly select one row using sample().

Python

row = df.sample()
print(row)

Output

  Employee Department  Age  Salary
2     Jake    Finance   25   45000

Explanation:

df.sample() selects one random row by default.
The result is a DataFrame containing the sampled row, not a Series.
Each execution may return a different row unless random_state is set.

Using n parameter

The n parameter specifies the exact number of rows to select randomly.

Example: Here, we select three random rows from the DataFrame.

Python

rows = df.sample(n=3)
print(rows)

Output

  Employee Department  Age  Salary
2     Jake    Finance   25   45000
3    David  Marketing   42   70000
4      Eva         IT   30   52000

Explanation:

n=3 instructs Pandas to return 3 rows.
Rows are selected randomly without replacement by default.
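
One consequence of sampling without replacement is that n cannot exceed the number of rows. The sketch below rebuilds a trimmed copy of the sample DataFrame so it runs on its own, shows the ValueError that pandas raises in that case, and illustrates one possible guard. The names oversample_failed and safe_rows are illustrative only and do not come from the article.

```python
import pandas as pd

# Trimmed copy of the article's sample data, so the snippet is self-contained.
df = pd.DataFrame({
    'Employee': ['Emily', 'Emma', 'Jake', 'David', 'Eva'],
    'Salary': [50000, 60000, 45000, 70000, 52000],
})

try:
    df.sample(n=10)  # 10 rows requested, only 5 available
    oversample_failed = False
except ValueError as err:
    oversample_failed = True
    print("Sampling failed:", err)

# One possible guard: cap n at the number of rows.
safe_rows = df.sample(n=min(10, len(df)))
print(len(safe_rows))  # 5
```

If more rows than the DataFrame holds are genuinely needed, sampling with replace=True is the other option.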

Using frac Parameter

The frac parameter selects a fraction of rows instead of a fixed number.

Example: In this example, we select 50% of rows randomly from the DataFrame.

Python

sampled_df = df.sample(frac=0.5)
print(sampled_df)

Output

  Employee Department  Age  Salary
2     Jake    Finance   25   45000
3    David  Marketing   42   70000

Explanation:

frac=0.5 selects half of the DataFrame rows randomly. With 5 rows, half is 2.5, which pandas rounds to 2 rows.
Useful when you want a proportional random sample instead of a fixed number.
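
A common use of frac is shuffling: frac=1 returns every row exactly once, in random order. The sketch below rebuilds a trimmed copy of the sample DataFrame so it runs on its own. The variable name shuffled is illustrative, and resetting the index is an optional choice rather than something sample() requires.

```python
import pandas as pd

# Trimmed copy of the article's sample data, so the snippet is self-contained.
df = pd.DataFrame({
    'Employee': ['Emily', 'Emma', 'Jake', 'David', 'Eva'],
    'Salary': [50000, 60000, 45000, 70000, 52000],
})

# frac=1 samples 100% of the rows without replacement, i.e. a shuffle.
# reset_index(drop=True) discards the old labels and renumbers from 0.
shuffled = df.sample(frac=1).reset_index(drop=True)
print(shuffled)
```

Leave out reset_index if the original row labels should stay attached to their rows.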

Using replace=True

By default, sampling is without replacement. Setting replace=True allows the same row to be selected multiple times.

Example: This code selects 5 rows randomly, allowing duplicates.

Python

sampled_replace = df.sample(n=5, replace=True)
print(sampled_replace)

Output

  Employee Department  Age  Salary
1     Emma         IT   34   60000
2     Jake    Finance   25   45000
0    Emily         HR   28   50000
0    Emily         HR   28   50000
0    Emily         HR   28   50000

Explanation:

replace=True allows the same row to appear multiple times.
Useful for bootstrapping or resampling methods.
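
To make the bootstrapping use case concrete, here is a minimal sketch that resamples the data with replacement many times and looks at how the mean salary varies. The number of resamples (200), the use of the loop counter as random_state, and the names boot_means and boot are assumptions made for illustration, not part of the article.

```python
import pandas as pd

# Trimmed copy of the article's sample data, so the snippet is self-contained.
df = pd.DataFrame({
    'Employee': ['Emily', 'Emma', 'Jake', 'David', 'Eva'],
    'Salary': [50000, 60000, 45000, 70000, 52000],
})

# Each resample has the same size as the original data and may repeat rows.
boot_means = [
    df.sample(n=len(df), replace=True, random_state=seed)['Salary'].mean()
    for seed in range(200)
]

boot = pd.Series(boot_means)
# The 2.5% and 97.5% quantiles give a rough interval for the mean salary.
print(boot.quantile([0.025, 0.975]))
```

With only five rows the interval is very rough; the point is the resampling pattern, not the numbers.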

Using weights

The weights parameter assigns probabilities to rows so that some rows are more likely to be selected.

Example: This program selects 3 rows with weighted probabilities.

Python

weights = [0.1, 0.2, 0.3, 0.2, 0.2]
weighted_rows = df.sample(n=3, weights=weights)
print(weighted_rows)

Output

  Employee Department  Age  Salary
0    Emily         HR   28   50000
2     Jake    Finance   25   45000
1     Emma         IT   34   60000

Explanation:

weights is a list with one weight per row. If the weights do not sum to 1, pandas normalizes them so that they do.
Rows with higher weights have a higher chance of being selected.
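
Two further behaviors of weights may be worth seeing in code: when sampling rows, the weights can be given as a column name, and a row with weight 0 is never drawn. The sketch below rebuilds a trimmed copy of the sample DataFrame so it runs on its own. The names by_salary and last_three are illustrative.

```python
import pandas as pd

# Trimmed copy of the article's sample data, so the snippet is self-contained.
df = pd.DataFrame({
    'Employee': ['Emily', 'Emma', 'Jake', 'David', 'Eva'],
    'Salary': [50000, 60000, 45000, 70000, 52000],
})

# Use the Salary column as weights: higher salaries are more likely to be picked.
# The values do not sum to 1, so pandas rescales them internally.
by_salary = df.sample(n=2, weights='Salary')
print(by_salary)

# Rows with weight 0 are excluded, so only rows 2, 3 and 4 can appear here.
last_three = df.sample(n=3, weights=[0, 0, 1, 1, 1])
print(sorted(last_three.index))  # [2, 3, 4]
```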

Using axis Parameter

sample() can also sample columns instead of rows by setting axis=1.

Example: Here, we select 2 random columns from the DataFrame.

Python

col_sample = df.sample(n=2, axis=1)
print(col_sample)

Output

  Department  Salary
0         HR   50000
1         IT   60000
2    Finance   45000
3  Marketing   70000
4         IT   52000

Explanation:

axis=1 changes the sampling from rows to columns.
n=2 selects two columns randomly.

Using random_state for Reproducibility

random_state ensures the same rows are selected every time the code runs.

Example: In this example, we select 2 reproducible random rows.

Python

fixed_rows = df.sample(n=2, random_state=42)
print(fixed_rows)

Output

  Employee Department  Age  Salary
1     Emma         IT   34   60000
4      Eva         IT   30   52000

Explanation:

random_state seeds the random number generator.
Ensures the same random selection on each run.
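
The reproducibility claim can be checked directly by sampling twice with the same seed and comparing the results. The sketch below rebuilds a trimmed copy of the sample DataFrame so it runs on its own. The names first and second are illustrative, and 42 is an arbitrary seed.

```python
import pandas as pd

# Trimmed copy of the article's sample data, so the snippet is self-contained.
df = pd.DataFrame({
    'Employee': ['Emily', 'Emma', 'Jake', 'David', 'Eva'],
    'Salary': [50000, 60000, 45000, 70000, 52000],
})

first = df.sample(n=2, random_state=42)
second = df.sample(n=2, random_state=42)

# Same seed and same data give the same rows in the same order.
print(first.equals(second))  # True
```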

Using NumPy

NumPy offers an alternative: pick row labels at random with np.random.choice, then fetch those rows with loc.

Example: Here we select 3 random rows using NumPy.

Python

import numpy as np

indices = np.random.choice(df.index, size=3, replace=False)
np_rows = df.loc[indices]
print(np_rows)

Output

  Employee Department  Age  Salary
4      Eva         IT   30   52000
0    Emily         HR   28   50000
3    David  Marketing   42   70000

Explanation:

np.random.choice randomly selects row indices.
replace=False ensures no duplicates.
df.loc[indices] fetches the corresponding rows.
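
As a variation on the same idea, NumPy's newer Generator API (np.random.default_rng) can be used with row positions and iloc instead of labels and loc. This is a sketch of an alternative, not the article's method. The seed value 42 and the names rng, positions and np_rows_by_position are illustrative.

```python
import numpy as np
import pandas as pd

# Trimmed copy of the article's sample data, so the snippet is self-contained.
df = pd.DataFrame({
    'Employee': ['Emily', 'Emma', 'Jake', 'David', 'Eva'],
    'Salary': [50000, 60000, 45000, 70000, 52000],
})

# A seeded Generator makes the selection repeatable.
rng = np.random.default_rng(seed=42)

# Choose 3 distinct row positions from 0 .. len(df) - 1.
positions = rng.choice(len(df), size=3, replace=False)

# iloc selects by position, so this also works when index labels repeat.
np_rows_by_position = df.iloc[positions]
print(np_rows_by_position)
```

Selecting by position avoids surprises when the index contains duplicate labels, since loc would return every row that matches a chosen label.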
