Different Ways to Iterate Over Rows in a Pandas DataFrame

Last Updated : 3 Oct, 2025

Iterating over rows means processing each row one by one to apply a calculation or condition. For example, given a DataFrame of students' marks with columns Math and Science, you might want to calculate each student's total score row by row.
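The student-marks scenario can be sketched as follows. This is only an illustration: the student names and scores are made up, and itertuples() (covered later in this article) is just one of several ways to write the loop.

```python
import pandas as pd

# Hypothetical marks table; names and scores are invented for illustration.
marks = pd.DataFrame({'Math': [80, 65, 90], 'Science': [70, 85, 95]},
                     index=['Asha', 'Ben', 'Chen'])

# Row-by-row: visit each student and add the two marks.
totals = {}
for row in marks.itertuples():
    totals[row.Index] = row.Math + row.Science
print(totals)

# The same totals without a Python-level loop.
marks['Total'] = marks['Math'] + marks['Science']
print(marks)
```

Both approaches give the same totals; the loop version simply makes the per-row processing explicit.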

Let’s consider this DataFrame:

Python
import pandas as pd
import numpy as np

df = pd.DataFrame({ 'A': [5, 7, 3, 9, 2],
                    'B': [10, 20, 30, 40, 50],
                    'C': ['X', 'Y', 'X', 'Z', 'Y'] })
print(df)

Output

   A   B  C
0  5  10  X
1  7  20  Y
2  3  30  X
3  9  40  Z
4  2  50  Y

Using Vectorization

Vectorized operations act on whole columns at once, with no Python-level loop. For column-wise transformations they are usually the fastest option and should be the first thing to try.

Example: In this example, compute Result = A*B when C == 'X', otherwise Result = A + B, using np.where.

Python
df1 = df.copy()
df1['Result'] = np.where(df1['C'] == 'X', df1['A'] * df1['B'], df1['A'] + df1['B'])
print(df1)

Output

   A   B  C  Result
0  5  10  X      50
1  7  20  Y      27
2  3  30  X      90
3  9  40  Z      49
4  2  50  Y      52

Explanation:

  • df1 = df.copy(): work on a copy so the original stays unchanged.
  • np.where(condition, true_val, false_val) evaluates the condition for all rows at once.
  • For rows with C == 'X' it picks A * B; otherwise A + B.
  • The assignment creates the Result column in a single, fast vectorized operation.
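As a possible alternative to np.where, the same rule can be written with a boolean mask and .loc. This is a sketch rather than a recommendation; the names df_mask and is_x are illustrative and not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 7, 3, 9, 2],
                   'B': [10, 20, 30, 40, 50],
                   'C': ['X', 'Y', 'X', 'Z', 'Y']})

df_mask = df.copy()
is_x = df_mask['C'] == 'X'                 # boolean Series, one flag per row

# Start with the default rule (A + B) for every row...
df_mask['Result'] = df_mask['A'] + df_mask['B']
# ...then overwrite only the rows where C == 'X' with A * B.
df_mask.loc[is_x, 'Result'] = df_mask.loc[is_x, 'A'] * df_mask.loc[is_x, 'B']

print(df_mask['Result'].tolist())  # [50, 27, 90, 49, 52]
```

The mask form can be easier to extend when several rules apply to different subsets of rows, since each rule becomes one more .loc assignment.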

Using itertuples()

itertuples() yields each row as a named tuple. It is faster and lighter than iterrows() and keeps each column's dtype, which makes it a good choice when you need Python-level row access but still care about performance.

Example: In this example, compute the same Result using itertuples() and collect results in a list.

Python
df2 = df.copy()
res = []
for r in df2.itertuples(index=False):
    res.append(r.A * r.B if r.C == 'X' else r.A + r.B)
df2['Result'] = res
print(df2)

Output

   A   B  C  Result
0  5  10  X      50
1  7  20  Y      27
2  3  30  X      90
3  9  40  Z      49
4  2  50  Y      52

Explanation:

  • df2 = df.copy(): isolate changes from the original.
  • for r in df2.itertuples(index=False): iterate over rows as named tuples with fields r.A, r.B and r.C; index=False leaves the index out of each tuple.
  • Compute the conditional expression per tuple and append to res.
  • Assign res to df2['Result'] after the loop.
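When the row label is needed as well, itertuples() can be called with index=True (the default), which exposes the label as the Index field. A small sketch, assuming the same DataFrame as above; the tuple name 'Row' and the variable x_labels are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 7, 3, 9, 2],
                   'B': [10, 20, 30, 40, 50],
                   'C': ['X', 'Y', 'X', 'Z', 'Y']})

# Collect the index labels of the rows where C == 'X'.
x_labels = []
for r in df.itertuples(index=True, name='Row'):
    if r.C == 'X':
        x_labels.append(r.Index)   # r.Index holds the row's index label

print(x_labels)            # [0, 2]
print(type(r).__name__)    # Row
```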

Using apply()

.apply() runs a function on each row (axis=1) or each column (axis=0, the default). It is readable and works well for more complex row-level logic when vectorization is difficult. It is usually slower than itertuples(), but complex rules are often easier to express.

Example: In this example, use apply() with a small function returning the same Result.

Python
df3 = df.copy()
def f(r):
    return r['A'] * r['B'] if r['C'] == 'X' else r['A'] + r['B']
df3['Result'] = df3.apply(f, axis=1)
print(df3)

Output

   A   B  C  Result
0  5  10  X      50
1  7  20  Y      27
2  3  30  X      90
3  9  40  Z      49
4  2  50  Y      52

Explanation:

  • df3 = df.copy(): avoid mutating the original.
  • Define f(r), which accepts a Series (one row) and returns the computed value.
  • df3.apply(f, axis=1) calls f for each row and builds a result Series.
  • Assigning that Series to df3['Result'] stores the per-row outputs. Use this approach when the row logic is non-trivial.
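If the row function needs a parameter, apply() forwards extra keyword arguments to it. The sketch below assumes a hypothetical function score with a special parameter (neither is part of the original example) to show how the same rule can be reused with a different key.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 7, 3, 9, 2],
                   'B': [10, 20, 30, 40, 50],
                   'C': ['X', 'Y', 'X', 'Z', 'Y']})

def score(row, special='X'):
    """Multiply A and B when C equals `special`; otherwise add them."""
    if row['C'] == special:
        return row['A'] * row['B']
    return row['A'] + row['B']

res_x = df.apply(score, axis=1)                # uses the default special='X'
res_y = df.apply(score, axis=1, special='Y')   # keyword is passed on to score

print(res_x.tolist())  # [50, 27, 90, 49, 52]
print(res_y.tolist())  # [15, 140, 33, 49, 100]
```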

Using iterrows()

iterrows() yields (index, Series) pairs, one per row. It is easy to use but slow, and because each row is packed into a single Series, values may be upcast to a common dtype; avoid it for large data.
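The dtype change can be seen on a small frame that mixes an integer and a float column. This is an illustrative sketch; the DataFrame nums and its column names are made up for the demonstration.

```python
import pandas as pd

# Hypothetical frame: one integer column and one float column.
nums = pd.DataFrame({'qty': [1, 2], 'price': [1.5, 2.5]})

# iterrows() packs the row into one Series, so the integer is upcast to float.
_, row = next(nums.iterrows())
print(row.dtype)    # float64
print(row['qty'])   # 1.0

# itertuples() keeps each value as it is stored in its own column.
tup = next(nums.itertuples(index=False))
print(tup.qty)      # 1
```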

Example: In this example, compute Result with iterrows(), collect the values in a list and assign them back as a new column.

Python
df4 = df.copy()
res = []
for i, row in df4.iterrows():
    res.append(row['A'] * row['B'] if row['C'] == 'X' else row['A'] + row['B'])
df4['Result'] = res
print(df4)

Output

   A   B  C  Result
0  5  10  X      50
1  7  20  Y      27
2  3  30  X      90
3  9  40  Z      49
4  2  50  Y      52

Explanation:

  • df4.iterrows() yields an (index, Series) pair for each row.
  • Access values via row['A'] etc.; building a new Series for every row is what makes this method expensive.
  • Collect results in a list and assign the list to the Result column.
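To compare the four approaches on your own machine, a rough timing sketch along these lines may help. The frame size, random seed and helper function names are assumptions chosen for illustration, and the measured times will vary by hardware and pandas version, so no expected timings are shown.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test frame; size and seed are arbitrary.
rng = np.random.default_rng(0)
n = 2_000
big = pd.DataFrame({'A': rng.integers(1, 10, n),
                    'B': rng.integers(10, 60, n),
                    'C': rng.choice(['X', 'Y', 'Z'], n)})

def vectorized(d):
    return np.where(d['C'] == 'X', d['A'] * d['B'], d['A'] + d['B']).tolist()

def with_itertuples(d):
    return [r.A * r.B if r.C == 'X' else r.A + r.B
            for r in d.itertuples(index=False)]

def with_apply(d):
    rule = lambda r: r['A'] * r['B'] if r['C'] == 'X' else r['A'] + r['B']
    return d.apply(rule, axis=1).tolist()

def with_iterrows(d):
    return [row['A'] * row['B'] if row['C'] == 'X' else row['A'] + row['B']
            for _, row in d.iterrows()]

# Time three runs of each approach on the same data.
for fn in (vectorized, with_itertuples, with_apply, with_iterrows):
    seconds = timeit.timeit(lambda: fn(big), number=3)
    print(f"{fn.__name__:>16}: {seconds:.4f} s for 3 runs")
```

All four functions return the same values; only the time taken differs.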