The index in a Pandas DataFrame represents the labels assigned to each row. It helps in identifying and accessing data efficiently and can be either default numeric values or custom-defined labels.
Accessing and Modifying the Index
Accessing the index shows how rows are labeled, and modifying it lets you customize those labels. You can view the existing index with the .index attribute and later update it as needed.
import pandas as pd
data = {'Name': ['Jake', 'Eve', 'Charlie'],
        'Age': [22, 35, 28],
        'Gender': ['Male', 'Female', 'Male'],
        'Salary': [40000, 70000, 48000]}
df = pd.DataFrame(data)
print(df.index)
Output
RangeIndex(start=0, stop=3, step=1)
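To update the index rather than just read it, one option is to assign a new list of labels to .index; another is rename(), which changes only the labels you name. The sketch below assumes a smaller version of the DataFrame above, and the labels 'a', 'b', 'c' and 'first' are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jake', 'Eve', 'Charlie'],
                   'Age': [22, 35, 28]})

# Assign new labels directly; the list length must match the number of rows.
df.index = ['a', 'b', 'c']  # hypothetical labels
print(df.index.tolist())    # ['a', 'b', 'c']

# rename() returns a new DataFrame with only the mapped labels changed.
renamed = df.rename(index={'a': 'first'})
print(renamed.index.tolist())  # ['first', 'b', 'c']
```

Direct assignment changes df itself, while rename() leaves df untouched and returns a copy with the new labels.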
Setting a Custom Index
The set_index() method returns a new DataFrame in which one or more columns become the index. The original DataFrame is left unchanged unless inplace=True is passed.
import pandas as pd
data = {'Name': ['Jake', 'Mike'],
        'Age': [25, 30],
        'Salary': [50000, 55000]}
df = pd.DataFrame(data)
res = df.set_index('Name')
print(res)
Output
      Age  Salary
Name
Jake   25   50000
Mike   30   55000
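Since set_index() accepts more than one column, passing a list of column names produces a multi-level index. The following is a minimal sketch; the 'Dept' column and its values are assumptions added for illustration and do not appear in the example above.

```python
import pandas as pd

df = pd.DataFrame({'Dept': ['HR', 'HR', 'IT'],   # hypothetical column
                   'Name': ['Jake', 'Mike', 'Sam'],
                   'Salary': [50000, 55000, 40000]})

# A list of column names builds a MultiIndex with one level per column.
res = df.set_index(['Dept', 'Name'])
print(res.index.nlevels)                  # 2

# Rows are then addressed with a tuple of labels, one per level.
print(res.loc[('HR', 'Mike'), 'Salary'])  # 55000
```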
Resetting the Index
To return to the default integer index, use the reset_index() method. By default it moves the current index into a regular column and creates a new default index. Passing drop=True, as in the example below, discards the old index instead of keeping it as a column.
import pandas as pd
data = {'Name': ['Jake', 'Maria', 'Sam'],
        'Age': [25, 30, 22]}
df = pd.DataFrame(data)
res = df.reset_index(drop=True)
print(res)
Output
    Name  Age
0   Jake   25
1  Maria   30
2    Sam   22
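The effect of drop is easier to see when the DataFrame starts with a custom index. This sketch assumes the same data with 'Name' first set as the index, then compares reset_index() with and without drop=True.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jake', 'Maria', 'Sam'],
                   'Age': [25, 30, 22]}).set_index('Name')

# Default behaviour: the old index comes back as a regular column.
kept = df.reset_index()
print(kept.columns.tolist())     # ['Name', 'Age']

# drop=True: the old index labels are discarded entirely.
dropped = df.reset_index(drop=True)
print(dropped.columns.tolist())  # ['Age']
```

In both cases the result has a fresh integer index starting at 0.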
Indexing with loc
The loc[] indexer in pandas lets you access rows and columns of a DataFrame by their labels, making it easy to retrieve specific data points.
import pandas as pd
data = {'age': [25, 30], 'city': ['NY', 'LA']}
df = pd.DataFrame(data, index=['Alice', 'Bob'])
row = df.loc['Alice']
print(row)
Output
age     25
city    NY
Name: Alice, dtype: object
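loc[] also takes a column label after the row label, and it accepts label slices. The sketch below extends the example with a third, made-up row ('Cara') so that the slice has something to exclude.

```python
import pandas as pd

df = pd.DataFrame({'age': [25, 30, 41], 'city': ['NY', 'LA', 'SF']},
                  index=['Alice', 'Bob', 'Cara'])  # 'Cara' is hypothetical

# Row label plus column label returns a single value.
print(df.loc['Bob', 'city'])  # LA

# Label slices include both endpoints, unlike positional slices.
subset = df.loc['Alice':'Bob', ['age']]
print(subset.index.tolist())  # ['Alice', 'Bob']
```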
Changing the Index
Any column can serve as the index, not only one that holds names. Here set_index() makes the Age column the new index, so each row is labeled by its age value.
import pandas as pd
data = {'Name': ['Jake', 'Mike', 'Sam'],
        'Age': [25, 30, 22],
        'Salary': [50000, 55000, 40000]}
df = pd.DataFrame(data)
res = df.set_index('Age')
print(res)
Output
     Name  Salary
Age
25   Jake   50000
30   Mike   55000
22    Sam   40000
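Once Age is the index, loc[] looks rows up by age value rather than by position. A short sketch, assuming the same data as above:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jake', 'Mike', 'Sam'],
                   'Age': [25, 30, 22],
                   'Salary': [50000, 55000, 40000]})
res = df.set_index('Age')

# 30 is treated as a label here, not as a row position.
print(res.loc[30, 'Name'])  # Mike

# Index labels need not be unique, so it can be worth checking.
print(res.index.is_unique)  # True
```

If two people shared an age, res.loc[30] would return every matching row, which is one reason to prefer a column with unique values as the index.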