Iterating over rows means processing each row one by one to apply some calculation or condition. For example, given a DataFrame of students' marks with columns Math and Science, you might want to calculate each student's total score row by row.
Let’s consider this DataFrame:
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [5, 7, 3, 9, 2],
                   'B': [10, 20, 30, 40, 50],
                   'C': ['X', 'Y', 'X', 'Z', 'Y']})
print(df)
Output
   A   B  C
0  5  10  X
1  7  20  Y
2  3  30  X
3  9  40  Z
4  2  50  Y
Using Vectorization
Vectorized operations work on whole columns at once (no Python-level loop). They are usually the fastest and most memory-efficient option for column-wise transformations.
Example: In this example, compute Result = A*B when C == 'X', otherwise Result = A + B, using np.where.
df1 = df.copy()
df1['Result'] = np.where(df1['C'] == 'X', df1['A'] * df1['B'], df1['A'] + df1['B'])
print(df1)
Output
   A   B  C  Result
0  5  10  X      50
1  7  20  Y      27
2  3  30  X      90
3  9  40  Z      49
4  2  50  Y      52
Explanation:
- df1 = df.copy(): works on a copy so the original stays unchanged.
- np.where(condition, true_val, false_val) evaluates the condition for all rows at once.
- For rows with C == 'X' it assigns A * B; otherwise A + B.
- Assignment updates the Result column in a single, fast vectorized operation.
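The student-marks scenario from the introduction is another natural fit for vectorization. A minimal sketch, assuming a hypothetical `marks` DataFrame (the names and marks are made up for illustration and are not part of the examples above):

```python
import pandas as pd

# Hypothetical data: names and marks are invented for this sketch.
marks = pd.DataFrame({
    'Name': ['Asha', 'Ben', 'Chen'],
    'Math': [80, 65, 92],
    'Science': [75, 70, 88],
})

# Add the two columns element-wise; no Python-level loop is needed.
marks['Total'] = marks['Math'] + marks['Science']
print(marks)
```

Because the total is a plain column-wise sum, there is no need to visit the rows individually here.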
Using itertuples()
itertuples() yields each row as a namedtuple. It’s faster and lighter than iterrows() and preserves dtypes, which makes it a good choice when you need Python-level row access but care about performance.
Example: In this example, compute the same Result using itertuples() and collect results in a list.
df2 = df.copy()
res = []
# Each r is a namedtuple with fields A, B and C.
for r in df2.itertuples(index=False):
    res.append(r.A * r.B if r.C == 'X' else r.A + r.B)
df2['Result'] = res
print(df2)
Output
   A   B  C  Result
0  5  10  X      50
1  7  20  Y      27
2  3  30  X      90
3  9  40  Z      49
4  2  50  Y      52
Explanation:
- df2 = df.copy(): isolates changes from the original.
- for r in df2.itertuples(index=False): iterates over rows as tuples (r.A, r.B, r.C).
- Compute the conditional expression per tuple and append to res.
- Assign res to df2['Result'] after the loop.
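The dtype claim can be checked with a small experiment. The sketch below assumes a hypothetical `mixed` DataFrame with one integer and one float column (not part of the examples above) and compares the type of the same value as seen through iterrows() and itertuples():

```python
import pandas as pd

# Hypothetical frame mixing an integer column with a float column.
mixed = pd.DataFrame({'A': [1, 2], 'B': [1.5, 2.5]})

# iterrows() packs each row into one Series, so the row needs a single dtype
# and the integer in column A is upcast to float.
_, first_row = next(mixed.iterrows())
print(type(first_row['A']))

# itertuples() takes each value from its own column, so A stays an integer.
first_tuple = next(mixed.itertuples(index=False))
print(type(first_tuple.A))
```

The exact type names printed may differ between pandas versions, but the first should be a float type and the second an integer type.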
Using apply()
.apply() runs a function on each row (or column). It’s readable and good for more complex row-level logic when vectorization is difficult. It’s usually slower than itertuples() but makes complex rules easier to express.
Example: In this example, use apply() with a small function returning the same Result.
df3 = df.copy()
def f(r):
    # r is one row, passed in as a Series
    return r['A'] * r['B'] if r['C'] == 'X' else r['A'] + r['B']
df3['Result'] = df3.apply(f, axis=1)
print(df3)
Output
   A   B  C  Result
0  5  10  X      50
1  7  20  Y      27
2  3  30  X      90
3  9  40  Z      49
4  2  50  Y      52
Explanation:
- df3 = df.copy(): avoids mutating the original.
- Define f(r) that accepts a Series (a row) and returns the computed value.
- df3.apply(f, axis=1) calls f for each row and builds a result Series.
- Assigning that Series to df3['Result'] stores the per-row outputs. Use this approach when the logic is non-trivial.
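When the row function needs a setting of its own, apply() forwards extra keyword arguments to it. A minimal sketch, assuming a hypothetical function `combine` with a hypothetical parameter `special` that names the category whose rows are multiplied instead of added:

```python
import pandas as pd

# Same data as the example DataFrame above, rebuilt so the block runs alone.
df = pd.DataFrame({'A': [5, 7, 3, 9, 2],
                   'B': [10, 20, 30, 40, 50],
                   'C': ['X', 'Y', 'X', 'Z', 'Y']})

def combine(row, special):
    # `special` is a hypothetical parameter: the category that multiplies.
    if row['C'] == special:
        return row['A'] * row['B']
    return row['A'] + row['B']

# Keyword arguments that apply() does not recognise are passed on to combine.
result = df.apply(combine, axis=1, special='Y')
print(result.tolist())
```

Keeping the rule in a named function with parameters makes it easy to reuse with a different category without rewriting the loop.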
Using iterrows()
iterrows() yields each row as an (index, Series) pair. It’s easy to use but slow, and because each row is packed into a single Series it may change dtypes; avoid it for large data.
Example: In this example, compute Result with iterrows() and collect the results in a list.
df4 = df.copy()
res = []
# i is the index label, row is a Series holding that row's values.
for i, row in df4.iterrows():
    res.append(row['A'] * row['B'] if row['C'] == 'X' else row['A'] + row['B'])
df4['Result'] = res
print(df4)
Output
   A   B  C  Result
0  5  10  X      50
1  7  20  Y      27
2  3  30  X      90
3  9  40  Z      49
4  2  50  Y      52
Explanation:
- df4.iterrows() returns (index, Series) per row.
- Access values via row['A'] etc.; building a new Series for every row is what makes this approach expensive.
- Collect results in a list and assign back to Result.
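The relative speed of the four approaches can be measured rather than taken on trust. The sketch below is one rough way to do it, assuming a hypothetical randomly generated DataFrame `big` (the row count, seed, and helper function names are arbitrary choices for illustration). Timings depend on the machine and the pandas version, so no expected numbers are shown:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame; the size and seed are arbitrary.
n = 2_000
rng = np.random.default_rng(0)
big = pd.DataFrame({
    'A': rng.integers(1, 10, size=n),
    'B': rng.integers(10, 60, size=n),
    'C': rng.choice(['X', 'Y', 'Z'], size=n),
})

def vectorized():
    return np.where(big['C'] == 'X', big['A'] * big['B'], big['A'] + big['B']).tolist()

def with_itertuples():
    return [r.A * r.B if r.C == 'X' else r.A + r.B
            for r in big.itertuples(index=False)]

def with_apply():
    return big.apply(
        lambda r: r['A'] * r['B'] if r['C'] == 'X' else r['A'] + r['B'], axis=1
    ).tolist()

def with_iterrows():
    return [row['A'] * row['B'] if row['C'] == 'X' else row['A'] + row['B']
            for _, row in big.iterrows()]

# Time three runs of each approach and report the total per approach.
timings = {}
for fn in (vectorized, with_itertuples, with_apply, with_iterrows):
    timings[fn.__name__] = timeit.timeit(fn, number=3)
    print(f"{fn.__name__:>16}: {timings[fn.__name__]:.4f} s for 3 runs")
```

All four functions compute the same values, so any difference in the printed times comes from how the rows are visited.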