In this article, we explore several ways to access columns of a pandas DataFrame, each with a concise explanation and a practical example.
Method 1: Accessing a Single Column Using Bracket Notation
Bracket notation is the most straightforward way to access a column. The syntax df['column_name'] returns the column as a pandas Series. It is quick and intuitive, and it works with any column label, including names that contain spaces or clash with DataFrame methods.
import pandas as pd
data = {
    'Name': ['John', 'Alice', 'Bob', 'Eve'],
    'Age': [25, 30, 22, 35],
    'Salary': [50000, 55000, 40000, 70000]
}
df = pd.DataFrame(data)
# Access the 'Salary' column using bracket notation
salary_column = df['Salary']
print(salary_column)
Output
0    50000
1    55000
2    40000
3    70000
Name: Salary, dtype: int64
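Because the label is passed as a string, bracket notation also handles column names that are not valid Python identifiers. The sketch below assumes a hypothetical 'Annual Salary' column whose name contains a space:

```python
import pandas as pd

# Hypothetical column name containing a space (not a valid Python identifier)
df = pd.DataFrame({
    'Name': ['John', 'Alice'],
    'Annual Salary': [50000, 55000],
})

# Bracket notation accepts any column label given as a string
annual = df['Annual Salary']
print(annual)

# Writing df.Annual Salary would be a syntax error, so dot notation cannot be used here
```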
Method 2: Accessing a Single Column Using Dot Notation
Columns can also be accessed with dot notation (df.column_name). It is more concise, but it only works when the column name is a valid Python identifier (no spaces, not starting with a digit) and does not clash with an existing DataFrame attribute or method such as count or sum.
import pandas as pd
data = {
    'Name': ['Michael', 'Sarah', 'David', 'Emma'],
    'Age': [40, 28, 33, 25],
    'Salary': [60000, 62000, 45000, 75000]
}
df = pd.DataFrame(data)
# Accessing the 'Name' column using dot notation
name_column = df.Name
print(name_column)
Output
0    Michael
1      Sarah
2      David
3       Emma
Name: Name, dtype: object
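Dot notation can silently return something other than the column when the name collides with a DataFrame attribute. As a minimal sketch, assume a hypothetical column named 'count', which shares its name with the DataFrame.count() method:

```python
import pandas as pd

# Hypothetical DataFrame whose only column is named like a DataFrame method
df = pd.DataFrame({'count': [3, 1, 2]})

# Dot notation finds the DataFrame.count method before it looks at columns
print(type(df.count))  # <class 'method'>

# Bracket notation always returns the column itself
counts = df['count']
print(counts.tolist())  # [3, 1, 2]
```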
Method 3: Accessing Multiple Columns Using Bracket Notation
You can access multiple columns by passing a list of column names inside the brackets. This returns a new DataFrame containing only the specified columns.
import pandas as pd
data = {
    'Name': ['Michael', 'Sarah', 'David', 'Emma'],
    'Age': [40, 28, 33, 25],
    'Salary': [60000, 62000, 45000, 75000]
}
df = pd.DataFrame(data)
# Accessing 'Name' and 'Salary' columns using bracket notation
subset_columns = df[['Name', 'Salary']]
print(subset_columns)
Output
      Name  Salary
0  Michael   60000
1    Sarah   62000
2    David   45000
3     Emma   75000
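The number of brackets determines the return type: a single column name in single brackets gives a Series, while a list holding one name gives a one-column DataFrame. A small sketch using a trimmed version of the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Michael', 'Sarah'],
    'Salary': [60000, 62000],
})

as_series = df['Salary']   # single brackets -> Series
as_frame = df[['Salary']]  # list with one name -> one-column DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (2, 1)
```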
Method 4: Accessing Columns by Index Using iloc
If you know a column's position rather than its name, you can use the iloc indexer to access it by integer position (counting from 0). This is useful when columns must be selected programmatically, for example the first or last column regardless of its name.
import pandas as pd
data = {
    'Name': ['Michael', 'Sarah', 'David', 'Emma'],
    'Age': [40, 28, 33, 25],
    'Salary': [60000, 62000, 45000, 75000]
}
df = pd.DataFrame(data)
# Accessing the second column (Age) using iloc
age_column = df.iloc[:, 1]
print(age_column)
Output
0    40
1    28
2    33
3    25
Name: Age, dtype: int64
For a detailed explanation, refer to the article Extracting rows using Pandas .iloc[] in Python.
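iloc also accepts lists of positions, negative positions counted from the end, and slices. The sketch below reuses the same sample data; note that an iloc slice excludes its end position, just like a Python list slice:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Michael', 'Sarah', 'David', 'Emma'],
    'Age': [40, 28, 33, 25],
    'Salary': [60000, 62000, 45000, 75000],
})

first_and_last = df.iloc[:, [0, -1]]  # positions 0 and -1 -> Name and Salary
first_two = df.iloc[:, 0:2]           # positions 0 and 1; end position 2 is excluded
last_column = df.iloc[:, -1]          # a single position returns a Series

print(first_and_last.columns.tolist())  # ['Name', 'Salary']
print(first_two.columns.tolist())       # ['Name', 'Age']
print(last_column.name)                 # Salary
```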
Method 5: Accessing Columns by Condition Using Boolean Indexing
Boolean indexing selects data based on a condition. Passing a boolean Series such as df['Salary'] >= 60000 inside the brackets keeps only the rows where the condition is True, together with all of their columns.
import pandas as pd
data = {
    'Name': ['Michael', 'Sarah', 'David', 'Emma'],
    'Age': [40, 28, 33, 25],
    'Salary': [60000, 62000, 45000, 75000]
}
df = pd.DataFrame(data)
# Accessing rows where Salary is greater than or equal to 60000
high_salary = df[df['Salary'] >= 60000]
print(high_salary)
Output
      Name  Age  Salary
0  Michael   40   60000
1    Sarah   28   62000
3     Emma   25   75000
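A row mask can be combined with a column selection so that only the columns you need are returned, and a boolean mask can also be applied to the column labels themselves. A minimal sketch with the same data, where the 'S' prefix test is only an illustrative condition:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Michael', 'Sarah', 'David', 'Emma'],
    'Age': [40, 28, 33, 25],
    'Salary': [60000, 62000, 45000, 75000],
})

# Row condition: one True/False value per row
mask = df['Salary'] >= 60000

# Keep the matching rows, but return only the 'Name' column (a Series)
high_earner_names = df.loc[mask, 'Name']
print(high_earner_names.tolist())  # ['Michael', 'Sarah', 'Emma']

# Column condition: one True/False value per column label
# (the 'S' prefix is just an illustrative rule)
s_columns = df.loc[:, df.columns.str.startswith('S')]
print(s_columns.columns.tolist())  # ['Salary']
```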
Method 6: Accessing Columns Using loc for Label-Based Indexing
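The loc indexer accesses rows and columns by label. Whereas iloc works with integer positions, loc accepts row labels, column names, and boolean masks, so a row condition and a column selection can be combined in one call as df.loc[rows, columns].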
import pandas as pd
data = {
    'Name': ['Michael', 'Sarah', 'David', 'Emma'],
    'Age': [40, 28, 33, 25],
    'Salary': [60000, 62000, 45000, 75000]
}
df = pd.DataFrame(data)
# Accessing rows where 'Age' is greater than 30 using loc
age_above_30 = df.loc[df['Age'] > 30]
print(age_above_30)
Output
      Name  Age  Salary
0  Michael   40   60000
2    David   33   45000
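The example above filters rows only; loc can also choose columns by label. The sketch below, using the same data, shows a label slice over columns (which, unlike an iloc slice, includes its end label), a row condition combined with a column list, and a single-cell lookup:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Michael', 'Sarah', 'David', 'Emma'],
    'Age': [40, 28, 33, 25],
    'Salary': [60000, 62000, 45000, 75000],
})

# All rows; columns from 'Name' through 'Age', end label included
name_to_age = df.loc[:, 'Name':'Age']
print(name_to_age.columns.tolist())  # ['Name', 'Age']

# Row condition plus an explicit list of columns
older = df.loc[df['Age'] > 30, ['Name', 'Salary']]
print(older)

# Single value by row label and column name
print(df.loc[2, 'Salary'])  # 45000
```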
Method 7: Accessing Columns Dynamically
Sometimes, you may need to access columns dynamically based on variables or user input. You can achieve this by using a variable that stores the column name and accessing it using bracket notation.
import pandas as pd
data = {
    'Name': ['Michael', 'Sarah', 'David', 'Emma'],
    'Age': [40, 28, 33, 25],
    'Salary': [60000, 62000, 45000, 75000]
}
df = pd.DataFrame(data)
# Access column name dynamically
column_name = 'Salary'
salary_column = df[column_name]
print(salary_column)
Output
0    60000
1    62000
2    45000
3    75000
Name: Salary, dtype: int64
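When the column name comes from user input, it may not exist, and df[column_name] then raises a KeyError. The sketch below checks membership first; get_column is a hypothetical helper written only for illustration, while DataFrame.get is a real pandas method that returns a default instead of raising:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Michael', 'Sarah', 'David', 'Emma'],
    'Age': [40, 28, 33, 25],
    'Salary': [60000, 62000, 45000, 75000],
})


def get_column(frame: pd.DataFrame, column_name: str) -> pd.Series:
    """Hypothetical helper: return a column or raise a descriptive KeyError."""
    if column_name not in frame.columns:
        raise KeyError(
            f"Unknown column {column_name!r}; available: {list(frame.columns)}"
        )
    return frame[column_name]


column_name = 'Age'  # e.g. a value read from user input
print(get_column(df, column_name).tolist())  # [40, 28, 33, 25]

# DataFrame.get returns the default (None unless given) for a missing label
print(df.get('Bonus'))  # None
```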
In summary, bracket or dot notation works best for simple column access. To select columns by position, use iloc; to select them by label, or to combine a row condition with a column selection, use loc or boolean indexing. Experiment with these techniques to find the best approach for your data manipulation tasks.