Randomly Select Rows from Pandas DataFrame

Last Updated : 3 Oct, 2025

If a DataFrame has multiple rows, you can randomly select a few of them instead of working with the whole dataset. For example, suppose a DataFrame has the rows [A, B, C, D, E]. If you randomly pick 2 rows, one possible result is [C, E].

Here is the sample DataFrame used in this article:

Python
import pandas as pd

data = {'Employee': ['Emily', 'Emma', 'Jake', 'David', 'Eva'],
        'Department': ['HR', 'IT', 'Finance', 'Marketing', 'IT'],
        'Age': [28, 34, 25, 42, 30],
        'Salary': [50000, 60000, 45000, 70000, 52000]}
df = pd.DataFrame(data)
print(df)

Output

  Employee Department  Age  Salary
0    Emily         HR   28   50000
1     Emma         IT   34   60000
2     Jake    Finance   25   45000
3    David  Marketing   42   70000
4      Eva         IT   30   52000

Let’s explore different methods to randomly select rows from a Pandas DataFrame.

Using sample()

The sample() method allows specifying the number of rows, a fraction of rows, whether to sample with replacement, weights and reproducibility via random_state.

Example: Below, we randomly select one row using sample().

Python
row = df.sample()
print(row)

Output

  Employee Department  Age  Salary
2     Jake    Finance   25   45000

Explanation:

  • df.sample() selects one random row by default.
  • Returns a DataFrame with the sampled row.
  • Each execution may return a different row unless random_state is set.

Using n Parameter

The n parameter specifies the exact number of rows to select randomly.

Example: Here, we select three random rows from the DataFrame.

Python
rows = df.sample(n=3)
print(rows)

Output

  Employee Department  Age  Salary
2     Jake    Finance   25   45000
3    David  Marketing   42   70000
4      Eva         IT   30   52000

Explanation:

  • n=3 instructs Pandas to return 3 rows.
  • Rows are selected randomly without replacement by default.
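Because sampling is without replacement by default, n cannot exceed the number of rows. The sketch below illustrates this on a trimmed-down copy of the sample data; the error_raised flag is only an illustrative helper, not part of Pandas.

```python
import pandas as pd

df = pd.DataFrame({'Employee': ['Emily', 'Emma', 'Jake', 'David', 'Eva'],
                   'Age': [28, 34, 25, 42, 30]})

# Without replacement, every sampled row is distinct.
rows = df.sample(n=3)
print(rows.index.is_unique)

# Asking for more rows than exist raises ValueError unless replace=True is passed.
try:
    df.sample(n=10)
    error_raised = False
except ValueError as exc:
    error_raised = True
    print("ValueError:", exc)
```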

Using frac Parameter

The frac parameter selects a fraction of rows instead of a fixed number.

Example: In this example, we select 50% of rows randomly from the DataFrame.

Python
sampled_df = df.sample(frac=0.5)
print(sampled_df)

Output

  Employee Department  Age  Salary
2     Jake    Finance   25   45000
3    David  Marketing   42   70000

Explanation:

  • frac=0.5 selects half of the DataFrame rows randomly. With 5 rows, 0.5 × 5 = 2.5 is rounded to 2 rows.
  • Useful when you want a proportional random sample instead of a fixed number.
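A common use of frac is shuffling: frac=1 returns every row exactly once in random order. The following is a minimal sketch on a reduced copy of the sample data; the variable names shuffled and shuffled_clean are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Employee': ['Emily', 'Emma', 'Jake', 'David', 'Eva'],
                   'Salary': [50000, 60000, 45000, 70000, 52000]})

# frac=1 samples 100% of the rows, which amounts to a random shuffle.
shuffled = df.sample(frac=1)

# The original index labels travel with the rows. Reset them if a
# clean 0..n-1 index is wanted after shuffling.
shuffled_clean = shuffled.reset_index(drop=True)
print(shuffled_clean)
```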

Using replace=True

By default, sampling is without replacement. Setting replace=True allows the same row to be selected multiple times.

Example: This code selects 5 rows randomly, allowing duplicates.

Python
sampled_replace = df.sample(n=5, replace=True)
print(sampled_replace)

Output

  Employee Department  Age  Salary
1     Emma         IT   34   60000
2     Jake    Finance   25   45000
0    Emily         HR   28   50000
0    Emily         HR   28   50000
0    Emily         HR   28   50000

Explanation:

  • replace=True allows the same row to appear multiple times.
  • Useful for bootstrapping or resampling methods.
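To make the bootstrapping use case concrete, here is a rough sketch that resamples the Salary column with replacement and looks at the spread of the resampled means. The number of resamples (1000) and the percentile interval are illustrative choices, not something prescribed by Pandas, and five rows are far too few for a meaningful estimate in practice.

```python
import pandas as pd

df = pd.DataFrame({'Employee': ['Emily', 'Emma', 'Jake', 'David', 'Eva'],
                   'Salary': [50000, 60000, 45000, 70000, 52000]})

n_resamples = 1000  # illustrative choice

# Each resample has the same size as the data and is drawn with replacement.
# Using the loop counter as random_state keeps the sketch reproducible.
boot_means = pd.Series([
    df.sample(n=len(df), replace=True, random_state=seed)['Salary'].mean()
    for seed in range(n_resamples)
])

# Percentile interval of the resampled means.
lower, upper = boot_means.quantile([0.025, 0.975])
print(f"Approximate 95% interval for the mean salary: {lower:.0f} to {upper:.0f}")
```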

Using weights

The weights parameter assigns probabilities to rows so that some rows are more likely to be selected.

Example: This program selects 3 rows with weighted probabilities.

Python
weights = [0.1, 0.2, 0.3, 0.2, 0.2]
weighted_rows = df.sample(n=3, weights=weights)
print(weighted_rows)

Output

  Employee Department  Age  Salary
0    Emily         HR   28   50000
2     Jake    Finance   25   45000
1     Emma         IT   34   60000

Explanation:

  • weights holds one value per row. If the values do not sum to 1, Pandas normalizes them so that they do.
  • Rows with higher weights have a higher chance of being selected.
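Weights do not have to be a hand-written list. When sampling rows, sample() also accepts a column name, and a weight of 0 excludes a row entirely. The sketch below shows both on a reduced copy of the sample data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Employee': ['Emily', 'Emma', 'Jake', 'David', 'Eva'],
                   'Salary': [50000, 60000, 45000, 70000, 52000]})

# Use the Salary column as weights: higher-paid employees are more
# likely to be picked. The raw values are rescaled internally.
weighted_rows = df.sample(n=3, weights='Salary')
print(weighted_rows)

# The probabilities this corresponds to, written out explicitly.
probs = df['Salary'] / df['Salary'].sum()
print(probs)

# A weight of 0 means the row can never be selected, so only
# Jake, David and Eva (index 2, 3, 4) can appear here.
last_three = df.sample(n=3, weights=[0, 0, 1, 1, 1])
print(last_three)
```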

Using axis Parameter

sample() can also sample columns instead of rows by setting axis=1.

Example: Here, we select 2 random columns from the DataFrame.

Python
col_sample = df.sample(n=2, axis=1)
print(col_sample)

Output

  Department  Salary
0         HR   50000
1         IT   60000
2    Finance   45000
3  Marketing   70000
4         IT   52000

Explanation:

  • axis=1 changes the sampling from rows to columns.
  • n=2 selects two columns randomly.

Using random_state for Reproducibility

random_state ensures the same rows are selected every time the code runs.

Example: In this example, we select 2 reproducible random rows.

Python
fixed_rows = df.sample(n=2, random_state=42)
print(fixed_rows)

Output

  Employee Department  Age  Salary
1     Emma         IT   34   60000
4      Eva         IT   30   52000

Explanation:

  • random_state seeds the random number generator.
  • Ensures the same random selection on each run.
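One place where reproducibility matters is splitting data into two disjoint parts. The following sketch, with the hypothetical names train and holdout, samples 80% of the rows with a fixed seed and keeps the remaining rows as the second part.

```python
import pandas as pd

df = pd.DataFrame({'Employee': ['Emily', 'Emma', 'Jake', 'David', 'Eva'],
                   'Salary': [50000, 60000, 45000, 70000, 52000]})

# 80% of 5 rows gives 4 rows. The seed value itself is arbitrary.
train = df.sample(frac=0.8, random_state=42)

# Everything that was not sampled forms the second part.
holdout = df.drop(train.index)

# Repeating the call with the same seed yields the same selection.
again = df.sample(frac=0.8, random_state=42)
print(train.equals(again))
```

Dropping by index label assumes the index is unique, which holds for the default integer index used here.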

Using NumPy

NumPy offers an alternative: pick row index labels at random, then fetch those rows with loc.

Example: Here we select 3 random rows using NumPy.

Python
import numpy as np

indices = np.random.choice(df.index, size=3, replace=False)
np_rows = df.loc[indices]
print(np_rows)

Output

  Employee Department  Age  Salary
4      Eva         IT   30   52000
0    Emily         HR   28   50000
3    David  Marketing   42   70000

Explanation:

  • np.random.choice randomly selects row indices.
  • replace=False ensures no duplicates.
  • df.loc[indices] fetches the corresponding rows.
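As a variation on the approach above, NumPy's newer Generator interface (np.random.default_rng) can be used with integer positions and iloc, which also works when the index labels are not the default 0..n-1. This is a sketch only; the seed and the string index are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Same employees, but indexed by name instead of 0..4.
df = pd.DataFrame({'Department': ['HR', 'IT', 'Finance', 'Marketing', 'IT'],
                   'Salary': [50000, 60000, 45000, 70000, 52000]},
                  index=['Emily', 'Emma', 'Jake', 'David', 'Eva'])

# A seeded Generator makes the selection repeatable.
rng = np.random.default_rng(seed=0)

# Draw 3 distinct integer positions from 0..len(df)-1.
positions = rng.choice(len(df), size=3, replace=False)

# iloc selects by position, so the index labels do not matter.
np_rows = df.iloc[positions]
print(np_rows)
```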
