Combining columns in a pandas DataFrame is a common data manipulation step that makes data easier to analyze and visualize. For instance, if a DataFrame has separate columns for first and last names, you can combine them into a single "Full Name" column. pandas offers several ways to do this, such as the + operator, str.cat(), agg(), apply(), and map().
Method 1: Concatenating Columns (String Columns)
import pandas as pd
df = pd.DataFrame({'FirstName': ['John', 'Jane'], 'LastName': ['Smith', 'root']})
# Combine columns into a new column
df['FullName'] = df['FirstName'] + ' ' + df['LastName']
print(df)
Output
  FirstName LastName    FullName
0      John    Smith  John Smith
1      Jane     root   Jane root
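The introduction also mentions str.cat(), which is not shown above. The following is a minimal sketch of how it could be used for the same task. The third row with a missing last name is a hypothetical addition, included only to show how the sep and na_rep parameters behave.

```python
import pandas as pd

df = pd.DataFrame({
    'FirstName': ['John', 'Jane', 'Alex'],
    'LastName': ['Smith', 'root', None],  # hypothetical row with a missing last name
})

# str.cat() joins two string columns element-wise, inserting sep between them.
# Without na_rep, a missing value on either side makes the result missing too.
df['FullName'] = df['FirstName'].str.cat(df['LastName'], sep=' ')

# na_rep substitutes a placeholder for missing values; strip() removes the
# trailing space left behind when the last name is empty.
df['FullNameFilled'] = (
    df['FirstName'].str.cat(df['LastName'], sep=' ', na_rep='').str.strip()
)

print(df)
```

Compared with the + operator, str.cat() makes the handling of missing values explicit, which may matter when the name columns are incomplete.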
Method 2: Combining Numeric Columns (Mathematical Operations)
You can also combine numeric columns with arithmetic operations such as addition, subtraction, and multiplication. For instance, you may want to calculate an employee's total compensation by adding the Salary column and a Bonus column (if present).
import pandas as pd
df = pd.DataFrame({
'Salary': [50000, 60000, 70000],
'Bonus': [5000, 6000, 7000]
})
# Combine columns by performing arithmetic operations
df['Total Compensation'] = df['Salary'] + df['Bonus']
df['Salary After Tax'] = df['Salary'] - df['Salary'] * 0.2
df['Salary Times Bonus'] = df['Salary'] * df['Bonus']
print(df)
Output
   Salary  Bonus  Total Compensation  Salary After Tax  Salary Times Bonus
0   50000   5000               55000           40000.0           250000000
1   60000   6000               66000           480...
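When some values in a numeric column are missing, the + operator propagates NaN into the result. The sketch below, which assumes a hypothetical employee with no recorded bonus, contrasts plain addition with Series.add() and its fill_value parameter.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Salary': [50000, 60000, 70000],
    'Bonus': [5000, np.nan, 7000],  # hypothetical: no bonus recorded for row 1
})

# Plain + propagates NaN, so the total for row 1 is missing
df['Total Plain'] = df['Salary'] + df['Bonus']

# Series.add() with fill_value=0 treats the missing bonus as zero
df['Total Compensation'] = df['Salary'].add(df['Bonus'], fill_value=0)

print(df)
```

Whether a missing bonus should count as zero is a business decision; fill_value=0 is only one possible choice.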
Method 3: Using agg() function
The agg() function can also be used to combine multiple columns into one. Called with axis=1, it applies the given function to each row, so passing ' '.join joins the selected string columns with a space.
import pandas as pd
df = pd.DataFrame({'FirstName': ['John', 'Jane'], 'LastName': ['Doe', 'Smith'], 'Age': [28, 34]})
# Join the two name columns row by row using agg() with ' '.join
df['FullName'] = df[['FirstName', 'LastName']].agg(' '.join, axis=1)
print(df)
Output
  FirstName LastName  Age    FullName
0      John      Doe   28    John Doe
1      Jane    Smith   34  Jane Smith
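Because ' '.join accepts only strings, including a numeric column such as Age in the selection would raise a TypeError. One way around this, sketched below, is to cast the selected columns with astype(str) first. The Summary column name is an illustrative assumption.

```python
import pandas as pd

df = pd.DataFrame({'FirstName': ['John', 'Jane'], 'LastName': ['Doe', 'Smith'], 'Age': [28, 34]})

# Cast every selected column to strings so that ' '.join can handle Age too
df['Summary'] = (
    df[['FirstName', 'LastName', 'Age']].astype(str).agg(' '.join, axis=1)
)

print(df)
```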
Method 4: Using apply() with Lambda Functions
The apply() function can be used with a lambda function to combine columns. With axis=1, the lambda receives one row at a time, which makes this method particularly useful for more complex combinations or when several columns are involved:
import pandas as pd
df = pd.DataFrame({
'First Name': ['John', 'Jane'],
'Last Name': ['Doe', 'Smith'],
'Age': [28, 34]
})
# First part: Combine 'First Name' and 'Last Name'
df['Message'] = df.apply(lambda row: f"{row['First Name']} {row['Last Name']}", axis=1)
# Second part: Add age-related information
df['Message'] = df['Message'] + df.apply(lambda row: f" is {row['Age']} years old.", axis=1)
print(df)
Output
  First Name Last Name  Age                      Message
0       John       Doe   28    John Doe is 28 years old.
1       Jane     Smith   34  Jane Smith is 34 years old.
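For combinations that involve conditional logic, a named function is often easier to read than a lambda. The sketch below builds the whole message in a single apply() call. The describe helper and the 30-year cutoff are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'First Name': ['John', 'Jane'],
    'Last Name': ['Doe', 'Smith'],
    'Age': [28, 34]
})

def describe(row):
    """Build one message per row (the 30-year cutoff is an arbitrary example)."""
    group = 'under 30' if row['Age'] < 30 else '30 or older'
    return f"{row['First Name']} {row['Last Name']} is {row['Age']} years old ({group})."

# axis=1 passes each row to describe() as a Series
df['Message'] = df.apply(describe, axis=1)

print(df['Message'].tolist())
```

Row-wise apply() runs a Python function once per row, so on large DataFrames the vectorized + operator is usually faster when the logic allows it.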
Method 5: Using map()
The map() function applies a function to each element of a single column, so it cannot read two columns at once. To combine columns with it, transform each column element-wise (here, converting to strings with map(str)) and then concatenate the results. For instance:
import pandas as pd
df = pd.DataFrame({'FirstName': ['John', 'Jane'], 'LastName': ['Doe', 'Smith']})
# Convert each column to strings with map(), then concatenate them
df['FullName'] = df['FirstName'].map(str) + ' ' + df['LastName'].map(str)
print(df)
Output
  FirstName LastName    FullName
0      John      Doe    John Doe
1      Jane    Smith  Jane Smith
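Python's built-in map() is a separate function from pandas' Series.map(), and it can also combine columns when paired with zip(). The following is a minimal sketch of that alternative, not the method used above.

```python
import pandas as pd

df = pd.DataFrame({'FirstName': ['John', 'Jane'], 'LastName': ['Doe', 'Smith']})

# zip() pairs the two columns row by row; the built-in map() then applies
# ' '.join to each (first, last) pair. list() materializes the result.
df['FullName'] = list(map(' '.join, zip(df['FirstName'], df['LastName'])))

print(df)
```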