In this article, we’ll explore different ways to create a new column in a Pandas DataFrame based on existing columns. This is a common task in data analysis when you need to transform or categorize your data.
Sample DataFrame:
import pandas as pd
df = pd.DataFrame({'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
                   'Event': ['Music', 'Poetry', 'Theatre', 'Comedy'],
                   'Cost': [10000, 5000, 15000, 2000]})
print(df)
Output
        Date    Event   Cost
0  10/2/2011    Music  10000
1  11/2/2011   Poetry   5000
2  12/2/2011  Theatre  15000
3  13/2/2011   Comedy   2000
1. Using the apply() Function
The apply() function applies a custom function along one axis of a DataFrame: to each row (axis=1) or to each column (axis=0). Here, we create a new column, Discounted_Price, by applying a 10% discount to the Cost column.
df['Discounted_Price'] = df.apply(lambda row: row.Cost - (row.Cost * 0.1), axis=1)
print(df)
Output
        Date    Event   Cost  Discounted_Price
0  10/2/2011    Music  10000            9000.0
1  11/2/2011   Poetry   5000            4500.0
2  12/2/2011  Theatre  15000           13500.0
3  13/2/2011   Comedy   2000            1800.0
Explanation:
- apply() with axis=1: calls the function once per row.
- lambda row: an anonymous function that receives each row as a Series.
- row.Cost: gets value from Cost column.
- row.Cost * 0.1: calculates 10% discount.
- row.Cost - (row.Cost * 0.1): subtracts discount.
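Row-wise apply() is most useful when the new value depends on more than one column. The sketch below assumes a hypothetical pricing rule (20% off Theatre events, 10% off everything else) that is not part of the original example; the function name discounted_price and the column name Event_Discounted_Price are likewise illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
    'Event': ['Music', 'Poetry', 'Theatre', 'Comedy'],
    'Cost': [10000, 5000, 15000, 2000],
})

def discounted_price(row):
    # Hypothetical rule: Theatre gets 20% off, every other event 10% off
    rate = 0.2 if row['Event'] == 'Theatre' else 0.1
    return row['Cost'] * (1 - rate)

# axis=1 passes each row to the function as a Series
df['Event_Discounted_Price'] = df.apply(discounted_price, axis=1)
print(df[['Event', 'Cost', 'Event_Discounted_Price']])
```

A named function is easier to read and test than a long lambda once the logic involves a condition.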
2. Element-wise Operation on Columns
A simpler approach is to perform an element-wise (vectorized) operation directly on an existing column. Here, we apply the discount calculation to the whole Cost column at once, which avoids calling a Python function for every row and is typically faster than apply().
import pandas as pd
df['Discounted_Price'] = df['Cost'] - (0.1 * df['Cost'])
print(df)
Output
        Date    Event   Cost  Discounted_Price
0  10/2/2011    Music  10000            9000.0
1  11/2/2011   Poetry   5000            4500.0
2  12/2/2011  Theatre  15000           13500.0
3  13/2/2011   Comedy   2000            1800.0
Explanation:
- df['Cost']: selects the Cost column.
- 0.1 * df['Cost']: calculates 10% of each cost (the discount).
- df['Cost'] - (0.1 * df['Cost']): subtracts the discount from the original cost.
- df['Discounted_Price'] = ... creates a new column called Discounted_Price with the discounted values.
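The same element-wise calculation can also be written without modifying df in place. This is a minimal sketch using DataFrame.assign(), which returns a new DataFrame with the extra column; the variable name discounted is illustrative, and the factor 0.9 is simply the 10% discount expressed as a single multiplication.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
    'Event': ['Music', 'Poetry', 'Theatre', 'Comedy'],
    'Cost': [10000, 5000, 15000, 2000],
})

# assign() returns a copy with the new column; df itself is left unchanged
discounted = df.assign(Discounted_Price=df['Cost'] * 0.9)

print(discounted)
print('Discounted_Price' in df.columns)  # False: the original has no such column
```

This style can be convenient when chaining several transformations, since each step returns a DataFrame.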
3. Using the map() Function
The map() function is useful when you want to map one set of values to another. In this example, we create a new column called Cost_Category based on the Cost column by using a mapping function.
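def cost_category(value):
    # Bucket a single cost value into a label
    if value < 5000:
        return "Low"
    elif 5000 <= value < 12000:
        return "Medium"
    else:
        return "High"

# Create a new column using map()
df['Cost_Category'] = df['Cost'].map(cost_category)
# Show only the columns relevant to this example
print(df[['Date', 'Event', 'Cost', 'Cost_Category']])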
Output
        Date    Event   Cost  Cost_Category
0  10/2/2011    Music  10000         Medium
1  11/2/2011   Poetry   5000         Medium
2  12/2/2011  Theatre  15000           High
3  13/2/2011   Comedy   2000            Low
Explanation:
- df['Cost']: selects the Cost column.
- .map(cost_category): applies the cost_category function to each value in the Cost column.
- df['Cost_Category'] = ... stores the mapped results in a new column called Cost_Category.
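map() also accepts a dictionary, which is the more literal sense of mapping one set of values to another. The sketch below assumes a hypothetical lookup table from event name to venue; the venue names and the Venue column are illustrative and not part of the original data. Values that have no key in the dictionary come back as NaN.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
    'Event': ['Music', 'Poetry', 'Theatre', 'Comedy'],
    'Cost': [10000, 5000, 15000, 2000],
})

# Hypothetical lookup table; 'Comedy' is deliberately left out
venue_by_event = {
    'Music': 'Main Hall',
    'Poetry': 'Library',
    'Theatre': 'Auditorium',
}

# Each Event value is replaced by its dictionary entry, or NaN if absent
df['Venue'] = df['Event'].map(venue_by_event)
print(df[['Event', 'Venue']])
```

A dictionary works well for a fixed set of labels, while a function such as cost_category is a better fit for ranges or other computed rules.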