Sorting a Pandas DataFrame means reordering its rows according to the values in one or more columns. Pandas provides the DataFrame.sort_values() method for this.
In this example, we sort movies by their release year in ascending order.
import pandas as pd
# Create a simple DataFrame
movies = pd.DataFrame({ 'Movie': ['The Godfather', 'Inception', 'Titanic'],
'Year': [1972, 2010, 1997] })
result = movies.sort_values(by='Year')
print(result)
Output
           Movie  Year
0  The Godfather  1972
2        Titanic  1997
1      Inception  2010
Explanation:
- sort_values(by='Year') sorts rows using the values in the Year column.
- Movies are arranged from the oldest (1972) to the most recent (2010). Each row keeps its original index label, which is why the index reads 0, 2, 1.
Syntax
DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, na_position='last')
Parameters:
- by: column label(s) to sort by.
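- axis: 0 (default) sorts the rows using values in columns; 1 sorts the columns using values in rows, in which case by refers to index labels.
- ascending: True (default) puts the smallest values first, False puts the largest first. Pass a list of booleans to set the order separately for each column in by.
- inplace: if True, the DataFrame itself is reordered instead of a new one being returned (default False).
- na_position: 'first' puts NaN values at the beginning, 'last' (default) puts them at the end.
Return Value: A new DataFrame sorted by the given column(s), or None when inplace=True.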
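The difference between the default return value and inplace=True can be checked directly. This is a small sketch that reuses the movies data from the opening example. It also passes ascending=False, so the first result is ordered newest first.

```python
import pandas as pd

# Same movies data as the opening example
movies = pd.DataFrame({
    'Movie': ['The Godfather', 'Inception', 'Titanic'],
    'Year': [1972, 2010, 1997],
})

# Default (inplace=False): a new sorted DataFrame is returned,
# and the original keeps its row order
newest_first = movies.sort_values(by='Year', ascending=False)
print(newest_first['Movie'].tolist())  # ['Inception', 'Titanic', 'The Godfather']
print(movies['Movie'].tolist())        # ['The Godfather', 'Inception', 'Titanic']

# inplace=True: the DataFrame itself is reordered and the call returns None
returned = movies.sort_values(by='Year', inplace=True)
print(returned)                 # None
print(movies['Year'].tolist())  # [1972, 1997, 2010]
```

A common mistake is writing movies = movies.sort_values(by='Year', inplace=True). That assignment replaces the DataFrame with None.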
Examples
Example 1: In this example, we sort student scores in Science from highest to lowest.
import pandas as pd
df = pd.DataFrame({ 'Name': ['Simon', 'Marsh', 'Alex', 'Selena'],
'Science': [7, 9, 4, 7] })
result = df.sort_values(by='Science', ascending=False)
print(result)
Output
     Name  Science
1   Marsh        9
0   Simon        7
3  Selena        7
2    Alex        4
Explanation:
- ascending=False sorts Science marks from highest to lowest.
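- Marsh (9) comes first and Alex (4) comes last. Simon and Selena tie at 7. They keep their original order here, but the default sorting algorithm (kind='quicksort') does not guarantee that for tied values.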
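If tied rows must keep their original relative order, you can request a stable algorithm. This sketch passes kind='stable'; in pandas, 'stable' and 'mergesort' are the stable choices for the kind parameter. It uses the same data as Example 1.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Simon', 'Marsh', 'Alex', 'Selena'],
    'Science': [7, 9, 4, 7],
})

# kind='stable' guarantees that tied rows (Simon and Selena, both 7)
# stay in their original relative order
result = df.sort_values(by='Science', ascending=False, kind='stable')
print(result['Name'].tolist())  # ['Marsh', 'Simon', 'Selena', 'Alex']
```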
Example 2: This code sorts students first by Maths, then by English, both in ascending order.
import pandas as pd
df = pd.DataFrame({ 'Name': ['Simon', 'Marsh', 'Gaurav', 'Alex'],
'Maths': [8, 5, 6, 9],
'English': [7, 4, 7, 6] })
result = df.sort_values(by=['Maths', 'English'])
print(result)
Output
     Name  Maths  English
1   Marsh      5        4
2  Gaurav      6        7
0   Simon      8        7
3    Alex      9        6
Explanation:
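- Rows are ordered by Maths first.
- English only breaks ties: when two rows share a Maths score, the row with the lower English score comes first. Every Maths score in this data is different, so English does not change the result here.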
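To see the tie-breaking column take effect, the Maths scores need a tie. The sketch below uses hypothetical data in which Simon and Alex both score 8 in Maths. It also passes a list to ascending so that each column gets its own sort direction.

```python
import pandas as pd

# Hypothetical data: Simon and Alex tie on Maths (8)
df = pd.DataFrame({
    'Name': ['Simon', 'Marsh', 'Gaurav', 'Alex'],
    'Maths': [8, 5, 6, 8],
    'English': [7, 4, 7, 6],
})

# Both columns ascending: Alex (English 6) comes before Simon (English 7)
both_asc = df.sort_values(by=['Maths', 'English'])
print(both_asc['Name'].tolist())  # ['Marsh', 'Gaurav', 'Alex', 'Simon']

# One flag per column: Maths ascending, English descending
mixed = df.sort_values(by=['Maths', 'English'], ascending=[True, False])
print(mixed['Name'].tolist())  # ['Marsh', 'Gaurav', 'Simon', 'Alex']
```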
Example 3: In this example, we sort by Science but display missing values (NaN) first.
import pandas as pd
df = pd.DataFrame({ 'Name': ['Simon', 'Marsh', 'Alex', 'Selena'],
'Science': [7, None, 4, 9] })
result = df.sort_values(by='Science', na_position='first')
print(result)
Output
     Name  Science
1   Marsh      NaN
2    Alex      4.0
0   Simon      7.0
3  Selena      9.0
Explanation:
- na_position='first' places NaN values before numeric values.
- Marsh with missing Science score comes first.
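Missing values are placed according to na_position alone, regardless of the ascending setting. This sketch reuses the Example 3 data. With the default na_position='last', Marsh's NaN ends up at the bottom even in a descending sort. The second call also passes ignore_index=True, which relabels the result 0, 1, 2, ... instead of keeping the original index labels.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Simon', 'Marsh', 'Alex', 'Selena'],
    'Science': [7, None, 4, 9],
})

# Default na_position='last': NaN goes to the end,
# even when sorting in descending order
desc = df.sort_values(by='Science', ascending=False)
print(desc['Name'].tolist())  # ['Selena', 'Simon', 'Alex', 'Marsh']

# ignore_index=True relabels the sorted rows 0, 1, 2, ...
renumbered = df.sort_values(by='Science', na_position='first', ignore_index=True)
print(renumbered.index.tolist())    # [0, 1, 2, 3]
print(renumbered['Name'].tolist())  # ['Marsh', 'Alex', 'Simon', 'Selena']
```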